qwen3_next add triton ops : fused_qkvzba_split_reshape (#4788)

### What this PR does / why we need it?
Add the Triton op `fused_qkvzba_split_reshape_cat` for the Qwen3Next GatedDeltaNet layer.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit tests (UT).
- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

---------

Signed-off-by: ZT-AIA <1028681969@qq.com>
Signed-off-by: ZT-AIA <63220130+ZT-AIA@users.noreply.github.com>
This commit is contained in:
ZT-AIA
2025-12-18 11:31:04 +08:00
committed by GitHub
parent 07014e2101
commit 39fb9e7c83
4 changed files with 237 additions and 1 deletions

@@ -272,4 +272,16 @@
# 1. make these functions as class func of RejectionSampler, create AscendRejectionSampler
# to override them, then delete the patch file `worker/patch_rejection_sampler.py`.
# 2. make these functions as custom op, then remove AscendRejectionSampler
#
#
# ** 14. File: worker/patch_qwen3_next.py**
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# 1. `vllm.model_executor.models.qwen3_next.Qwen3NextGatedDeltaNet.forward`
# Why:
# The Qwen3Next GatedDeltaNet forward cannot directly add custom operators.
# How:
# Add a branch in Qwen3NextGatedDeltaNet.forward to adapt to fused_qkvzba_split_reshape_cat.
# Related PR (if no, explain why):
# https://github.com/vllm-project/vllm/pull/30863
# Future Plan:
# Remove this patch when vLLM support these operators.
#
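
The "How" note above describes adding a branch to `Qwen3NextGatedDeltaNet.forward`. A minimal sketch of that patching pattern follows, assuming a toy class and a placeholder fused function. `GatedDeltaNetStub`, `fused_op_stub`, and `make_patched_forward` are hypothetical names for illustration only and are not taken from vLLM or from this PR.

```python
# Hypothetical stand-ins: none of these names come from vLLM or vllm-ascend.
class GatedDeltaNetStub:
    """Toy layer standing in for Qwen3NextGatedDeltaNet."""

    def forward(self, x):
        # Original path: separate split, reshape, and concatenate steps.
        return ("unfused", x)


def fused_op_stub(x):
    """Placeholder for a fused kernel such as fused_qkvzba_split_reshape_cat."""
    return ("fused", x)


def make_patched_forward(original_forward, fused_available):
    """Wrap a forward method with a branch that prefers the fused path."""

    def patched_forward(self, x):
        if fused_available:
            return fused_op_stub(x)
        # Fall back to the untouched upstream implementation.
        return original_forward(self, x)

    return patched_forward


# Apply the patch by rebinding the class attribute, as a patch module
# would typically do at import time.
GatedDeltaNetStub.forward = make_patched_forward(
    GatedDeltaNetStub.forward, fused_available=True
)
```

Keeping a reference to the original method inside the wrapper preserves the upstream behaviour as a fallback, which makes the patch easy to delete once the upstream project supports the operator directly.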