Add patch_qwen3_5 for triton ops fused_recurrent_gated_delta_rule (#7109)

### What this PR does / why we need it?

The op `torch_npu.npu_recurrent_gated_delta_rule` currently does not
support `ssm_state` inputs in float32 format, so we temporarily retain
the Triton-based `_forward_core` implementation for Qwen3_5.
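The patch follows the usual monkeypatch pattern: replace `_forward_core` on the subclass only, so Qwen3Next keeps its existing path. A minimal sketch of that pattern, with stand-in classes and return values rather than vLLM's real symbols or kernels:

```python
# Stand-ins for the real vLLM classes; the names mirror the PR description
# but the bodies are illustrative only.

class Qwen3NextGatedDeltaNet:
    def _forward_core(self, ssm_state):
        # Placeholder for the AscendC kernel path, which (per this PR)
        # rejects float32 ssm_state inputs.
        return "ascendc"


class Qwen3_5GatedDeltaNet(Qwen3NextGatedDeltaNet):
    # Inherits _forward_core from Qwen3NextGatedDeltaNet.
    pass


def _forward_core_triton(self, ssm_state):
    # Placeholder for the Triton path (e.g. a call to
    # fused_recurrent_gated_delta_rule), which handles float32 ssm_state.
    return "triton"


# Patch only the subclass; the parent keeps the AscendC implementation.
Qwen3_5GatedDeltaNet._forward_core = _forward_core_triton
```

With the patch applied, `Qwen3NextGatedDeltaNet()._forward_core(...)` still takes the original path while `Qwen3_5GatedDeltaNet()._forward_core(...)` takes the Triton one.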

---------

Signed-off-by: pppeng <zepengliu912@qq.com>
Signed-off-by: pppeng <60355449+ppppeng@users.noreply.github.com>
Author: pppeng
Date: 2026-03-10 23:28:58 +08:00
Committed by: GitHub
Parent: a78a00e0b1
Commit: 0f289fa2a8
4 changed files with 275 additions and 0 deletions


@@ -332,3 +332,14 @@
# https://github.com/vllm-project/vllm/pull/36225
# Future Plan:
# Remove this patch when vLLM merges the PR.
#
# ** 17. File: worker/patch_qwen3_5.py**
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# 1. `vllm.model_executor.models.qwen3_5.Qwen3_5GatedDeltaNet._forward_core`
# Why:
# The class Qwen3_5GatedDeltaNet reuses the `_forward_core` method of Qwen3NextGatedDeltaNet,
# but the AscendC ops used by Qwen3NextGatedDeltaNet do not support `ssm_state` in float32 format.
# How:
# Patch Qwen3_5GatedDeltaNet._forward_core to use Triton ops such as `fused_recurrent_gated_delta_rule`.
# Future Plan:
# Remove this patch when all ops in _forward_core support both Qwen3_5 and Qwen3Next.