[perf] qwen3_next: add Triton op fused_sigmoid_gating_delta_rule_update (#4818)

### What this PR does / why we need it?

Adds the `fused_sigmoid_gating_delta_rule_update` op for qwen3_next. It fuses `fused_gdn_gating` and `fused_recurrent_gated_delta_rule` into a single op.

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
2025-12-19 16:34:11 +08:00
parent 118b0ed346
commit 0cc3fc357f
5 changed files with 539 additions and 1 deletions
--- a/vllm_ascend/patch/__init__.py
+++ b/vllm_ascend/patch/__init__.py
@@ -285,3 +285,15 @@
 #    Future Plan:
 #       Remove this patch when vLLM support these operators.
 #
+# ** 15. File: worker/patch_qwen3_next.py**
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+#   1. `vllm.model_executor.models.qwen3_next.Qwen3NextGatedDeltaNet._forward_core`
+#    Why:
+#       The Triton ops fused_recurrent_gated_delta_rule and fused_gdn_gating in vLLM perform poorly on NPU.
+#    How:
+#       Add a new fused Triton op to vLLM with an Ascend implementation.
+#    Related PR (if no, explain why):
+#       https://github.com/vllm-project/vllm/pull/30860
+#    Future Plan:
+#       Remove this patch when vLLM supports these operators.
+#