[perf] qwen3_next: add Triton op fused_sigmoid_gating_delta_rule_update (#4818)
### What this PR does / why we need it?
Add the `fused_sigmoid_gating_delta_rule_update` op for qwen3_next, which fuses `fused_gdn_gating` and `fused_recurrent_gated_delta_rule` into a single op.
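For reference, the computation the fused op combines can be sketched in pure Python for a single head. This is a hedged, non-authoritative sketch. It assumes the usual Gated DeltaNet gating (g = -exp(A_log) * softplus(a + dt_bias), beta = sigmoid(b)) and the standard recurrent gated delta-rule update. It omits batching, variable-length sequences, state indexing, and q/k L2 normalization. All function names below are hypothetical and are not the vLLM or vllm-ascend API. The unfused path materializes g and beta for every token before the recurrent scan. The fused path computes them inline inside the scan, which is the kind of intermediate storage and extra launch a fused kernel is meant to avoid.

```python
import math


def softplus(x, beta=1.0, threshold=20.0):
    # Reverts to the identity when beta * x exceeds threshold, for numerical stability.
    bx = beta * x
    return x if bx > threshold else math.log1p(math.exp(bx)) / beta


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gdn_gating(A_log, a, dt_bias, b):
    # Per-token decay g (<= 0) and write strength beta in (0, 1).
    g = -math.exp(A_log) * softplus(a + dt_bias)
    return g, sigmoid(b)


def delta_rule_step(h, q, k, v, g, beta, scale):
    # One recurrent gated delta-rule update of the K x V state h (in place).
    K, V = len(k), len(v)
    decay = math.exp(g)
    for i in range(K):
        for j in range(V):
            h[i][j] *= decay
    # Delta rule: correct v by what the decayed state already predicts for k.
    v_new = [beta * (v[j] - sum(h[i][j] * k[i] for i in range(K))) for j in range(V)]
    # Rank-1 write: h += k v_new^T.
    for i in range(K):
        for j in range(V):
            h[i][j] += k[i] * v_new[j]
    # Readout: o = scale * h^T q.
    return [scale * sum(h[i][j] * q[i] for i in range(K)) for j in range(V)]


def run_unfused(qs, ks, vs, a_seq, b_seq, A_log, dt_bias, scale):
    # Op 1: materialize g and beta for every token.
    gates = [gdn_gating(A_log, a, dt_bias, b) for a, b in zip(a_seq, b_seq)]
    # Op 2: recurrent scan that reads the stored gates back.
    h = [[0.0] * len(vs[0]) for _ in range(len(ks[0]))]
    outs = [delta_rule_step(h, q, k, v, g, beta, scale)
            for q, k, v, (g, beta) in zip(qs, ks, vs, gates)]
    return outs, h


def run_fused(qs, ks, vs, a_seq, b_seq, A_log, dt_bias, scale):
    # Single pass: gating is computed inline, so g and beta are never stored.
    h = [[0.0] * len(vs[0]) for _ in range(len(ks[0]))]
    outs = []
    for q, k, v, a, b in zip(qs, ks, vs, a_seq, b_seq):
        g, beta = gdn_gating(A_log, a, dt_bias, b)
        outs.append(delta_rule_step(h, q, k, v, g, beta, scale))
    return outs, h


# Tiny example: 3 tokens, K = 2, V = 3.
qs = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
ks = [[0.6, 0.8], [1.0, 0.0], [0.0, 1.0]]
vs = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0], [2.0, 0.0, 1.0]]
a_seq, b_seq = [0.1, -0.3, 0.5], [0.0, 1.0, -1.0]
o_unfused, h_unfused = run_unfused(qs, ks, vs, a_seq, b_seq, A_log=0.0, dt_bias=0.2, scale=0.5)
o_fused, h_fused = run_fused(qs, ks, vs, a_seq, b_seq, A_log=0.0, dt_bias=0.2, scale=0.5)
```

Both paths produce the same outputs and final state; only the scheduling of the gating work differs, which is the property a fused kernel has to preserve.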
- vLLM version: v0.12.0
- vLLM main: ad32e3e19c
Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
@@ -285,3 +285,15 @@
# Future Plan:
# Remove this patch when vLLM supports these operators.
#
# ** 15. File: worker/patch_qwen3_next.py**
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# 1. `vllm.model_executor.models.qwen3_next.Qwen3NextGatedDeltaNet._forward_core`
# Why:
# The Triton ops fused_recurrent_gated_delta_rule and fused_gdn_gating in vLLM perform poorly on NPU.
# How:
# Add a new fused Triton op (fused_sigmoid_gating_delta_rule_update) in vLLM, with an Ascend implementation.
# Related PR (if no, explain why):
# https://github.com/vllm-project/vllm/pull/30860
# Future Plan:
# Remove this patch when vLLM supports these operators.
#
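The patch entry overrides `Qwen3NextGatedDeltaNet._forward_core`. Assuming the override is applied by reassigning the method on the class (a common monkey-patching pattern; the patch file itself is not shown here), the mechanism and its planned removal can be sketched with a stand-in class. `GatedDeltaNetStub`, `_ascend_forward_core`, `apply_patch`, and `remove_patch` are hypothetical names, and the op names are only recorded as strings rather than called.

```python
class GatedDeltaNetStub:
    """Hypothetical stand-in for vLLM's Qwen3NextGatedDeltaNet (not the real class)."""

    def __init__(self):
        self.launched_ops = []

    def _forward_core(self, x):
        # Upstream path: gating and the recurrent update run as two separate ops.
        self.launched_ops += ["fused_gdn_gating", "fused_recurrent_gated_delta_rule"]
        return x


def _ascend_forward_core(self, x):
    # Patched path: one fused op replaces the two upstream ops.
    self.launched_ops.append("fused_sigmoid_gating_delta_rule_update")
    return x


# Keep a handle on the original so the patch can be removed later.
_original_forward_core = GatedDeltaNetStub._forward_core


def apply_patch():
    GatedDeltaNetStub._forward_core = _ascend_forward_core


def remove_patch():
    # "Future Plan": drop the override once upstream vLLM supports the op.
    GatedDeltaNetStub._forward_core = _original_forward_core


apply_patch()
layer = GatedDeltaNetStub()
layer._forward_core(None)
print(layer.launched_ops)  # ['fused_sigmoid_gating_delta_rule_update']
```

Because the method is replaced on the class, instances created before or after `apply_patch()` both pick up the override at call time; keeping the original reference is what makes the removal step a one-line change.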