qwen3_next add triton ops : fused_qkvzba_split_reshape (#4788)
### What this PR does / why we need it?
add triton ops fused_qkvzba_split_reshape_cat for qwen3_next
GatedDeltaNet
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
UT
- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c
---------
Signed-off-by: ZT-AIA <1028681969@qq.com>
Signed-off-by: ZT-AIA <63220130+ZT-AIA@users.noreply.github.com>
This commit is contained in:
@@ -272,4 +272,16 @@
|
||||
# 1. make these functions as class func of RejectionSampler, create AscendRejectionSampler
|
||||
# to override them, then delete the patch file `worker/patch_rejection_sampler.py`.
|
||||
# 2. make these functions as costom op, then remove AscendRejectionSampler
|
||||
#
|
||||
#
|
||||
# ** 14.File: worker/patch_qwen3_next.py**
|
||||
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
# 1. `vllm.model_executor.models.qwen3_next.Qwen3NextGatedDeltaNet.forward`
|
||||
# Why:
|
||||
# The Qwen3Next GatedDeltaNet forward cannot directly add custom operators.
|
||||
# How:
|
||||
# Add a branch in Qwen3NextGatedDeltaNet.forward to adapt to fused_qkvzba_split_reshape_cat.
|
||||
# Related PR (if no, explain why):
|
||||
# https://github.com/vllm-project/vllm/pull/30863
|
||||
# Future Plan:
|
||||
# Remove this patch when vLLM support these operators.
|
||||
#
|
||||
|
||||
Reference in New Issue
Block a user