[v0.18.0][Triton][Qwen3.5] delete expr for kernels args (#7646)

### What this PR does / why we need it?

Some parameters of the Triton kernels are unnecessarily declared as `tl.constexpr`. Triton compiles a separate kernel binary for every distinct value of a constexpr parameter, so each time one of these parameters changes, a recompilation is triggered, which significantly degrades model performance. This PR turns these parameters into regular runtime arguments.

backport: https://github.com/vllm-project/vllm-ascend/pull/7482

Signed-off-by: w30012745 <wangxiaoshuai2@h-partners.com>
Co-authored-by: w30012745 <wangxiaoshuai2@h-partners.com>
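
The recompilation cost described above can be illustrated with a toy model of a JIT compile cache. This is a minimal sketch, not Triton's actual implementation: the `ToyJit` class, its cache-key layout, and the shape values are hypothetical and chosen only for illustration. It assumes that constexpr arguments enter the cache key by value, that ordinary integer arguments enter it only through coarse properties (equal to 1, divisible by 16), and that arguments listed in `do_not_specialize` do not enter it at all.

```python
class ToyJit:
    """Toy model of a JIT compile cache (hypothetical, NOT Triton's real code)."""

    def __init__(self, constexpr=(), do_not_specialize=()):
        self.constexpr = set(constexpr)
        self.do_not_specialize = set(do_not_specialize)
        self.cache = {}
        self.compile_count = 0

    def launch(self, **kwargs):
        key = []
        for name, value in sorted(kwargs.items()):
            if name in self.constexpr:
                # constexpr: the exact value is baked into the compiled binary
                key.append((name, value))
            elif name not in self.do_not_specialize:
                # specialized runtime int: only coarse properties matter
                key.append((name, value == 1, value % 16 == 0))
            # do_not_specialize: the argument does not affect the key
        key = tuple(key)
        if key not in self.cache:
            self.compile_count += 1  # cache miss -> "compile" a new binary
            self.cache[key] = f"binary#{self.compile_count}"
        return self.cache[key]


# Signature before the change: H, Hg, K, V are constexpr.
before = ToyJit(
    constexpr=["H", "Hg", "K", "V", "BT", "BK", "BV"],
    do_not_specialize=["T"],
)
# Signature after the change: H, Hg, K, V are plain runtime arguments.
after = ToyJit(
    constexpr=["BT", "BK", "BV"],
    do_not_specialize=["T", "H", "Hg", "K", "V"],
)

# Hypothetical launch shapes (T, H, Hg, K, V); the last repeats the first.
shapes = [
    (1024, 16, 4, 128, 128),
    (2048, 32, 8, 128, 128),
    (512, 16, 4, 64, 64),
    (1024, 16, 4, 128, 128),
]
for T, H, Hg, K, V in shapes:
    args = dict(T=T, H=H, Hg=Hg, K=K, V=V, BT=64, BK=64, BV=64)
    before.launch(**args)
    after.launch(**args)

print("compiles before:", before.compile_count)
print("compiles after:", after.compile_count)
```

In this model the old signature compiles once per distinct `(H, Hg, K, V)` combination, while the new one reuses a single binary as long as the block sizes `BT`, `BK`, `BV` stay fixed.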
2026-03-25 23:31:27 +08:00
parent dd55736ee4
commit dba34d4915
4 changed files with 13 additions and 13 deletions
--- a/vllm_ascend/ops/triton/fla/wy_fast.py
+++ b/vllm_ascend/ops/triton/fla/wy_fast.py
@@ -17,7 +17,7 @@ from .utils import prepare_chunk_indices


@triton.heuristics({"IS_VARLEN": lambda args: args["cu_seqlens"] is not None})
-@triton.jit(do_not_specialize=["T"])
+@triton.jit(do_not_specialize=["T", "H", "Hg", "K", "V"])
 def recompute_w_u_fwd_kernel(
    k,
    v,
@@ -29,10 +29,10 @@ def recompute_w_u_fwd_kernel(
    cu_seqlens,
    chunk_indices,
    T,
-    H: tl.constexpr,
-    Hg: tl.constexpr,
-    K: tl.constexpr,
-    V: tl.constexpr,
+    H,
+    Hg,
+    K,
+    V,
    BT: tl.constexpr,
    BK: tl.constexpr,
    BV: tl.constexpr,