[v0.18.0][kernel] Recompilation optimization triggered by triton function parameter optimization (#7647)
### What this PR does / why we need it?
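Several Triton kernel parameters were declared `tl.constexpr` without
needing to be. Triton compiles a separate kernel variant for each distinct
constexpr value, so every change to one of these parameters triggers a
recompilation, which significantly degrades model performance. This PR
makes `H`, `Hg`, `K`, and `V` runtime arguments of
`chunk_gated_delta_rule_fwd_kernel_h_blockdim64` (adding them to
`do_not_specialize`) and removes the constexpr parameter `B` from
`chunk_local_cumsum_scalar_kernel`.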
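
The effect described above can be illustrated with a toy model of a JIT compile cache. This is a minimal sketch in plain Python, not Triton's actual caching code: the class `ToyJitCache`, its methods, and the example shapes are all hypothetical. It assumes only the documented behaviour that constexpr values are baked into the compiled kernel, that integer arguments are by default specialized on properties such as equality to 1 and divisibility by 16, and that `do_not_specialize` turns that specialization off.

```python
from typing import Dict, FrozenSet, Tuple


class ToyJitCache:
    """Toy stand-in for a JIT compile cache (hypothetical, not Triton's code)."""

    def __init__(
        self,
        constexpr: FrozenSet[str],
        do_not_specialize: FrozenSet[str] = frozenset(),
    ) -> None:
        self.constexpr = constexpr
        self.do_not_specialize = do_not_specialize
        self.compiled: Dict[Tuple, str] = {}
        self.compile_count = 0

    def _key(self, args: Dict[str, int]) -> Tuple:
        parts = []
        for name, value in sorted(args.items()):
            if name in self.constexpr:
                # constexpr: the value itself becomes part of the cache key.
                parts.append((name, "constexpr", value))
            elif name in self.do_not_specialize:
                # Plain runtime argument: only its type enters the key.
                parts.append((name, "int"))
            else:
                # Default integer specialization on value properties.
                parts.append((name, "int", value == 1, value % 16 == 0))
        return tuple(parts)

    def launch(self, **args: int) -> None:
        key = self._key(args)
        if key not in self.compiled:
            # Cache miss: this models an expensive recompilation.
            self.compile_count += 1
            self.compiled[key] = f"binary#{self.compile_count}"


# Hypothetical shapes seen across three launches.
shapes = [
    dict(T=64, H=8, K=128),
    dict(T=100, H=16, K=128),
    dict(T=7, H=32, K=64),
]

# Before: H and K are constexpr, only T is exempt from specialization.
before = ToyJitCache(constexpr=frozenset({"H", "K"}),
                     do_not_specialize=frozenset({"T"}))
# After: H and K are runtime arguments listed in do_not_specialize.
after = ToyJitCache(constexpr=frozenset(),
                    do_not_specialize=frozenset({"T", "H", "K"}))

for shape in shapes:
    before.launch(**shape)
    after.launch(**shape)

print(before.compile_count, after.compile_count)  # prints "3 1"
```

In this model the "before" configuration compiles once per distinct `(H, K)` pair, while the "after" configuration reuses a single compiled variant for all three launches.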
- vLLM version: v0.17.0
- vLLM main: 8b6325758c
Signed-off-by: HarpSealCC <844291270@qq.com>
Signed-off-by: l30072083 <liuchengzhuo1@h-partners.com>
Co-authored-by: l30072083 <liuchengzhuo1@h-partners.com>
@@ -26,7 +26,7 @@ _CONDITIONS = ("seq7168",)
         "IS_VARLEN": lambda args: args["cu_seqlens"] is not None,
     }
 )
-@triton.jit(do_not_specialize=["T"])
+@triton.jit(do_not_specialize=["T", "H", "Hg", "K", "V"])
 def chunk_gated_delta_rule_fwd_kernel_h_blockdim64(
     k,
     v,
@@ -40,10 +40,10 @@ def chunk_gated_delta_rule_fwd_kernel_h_blockdim64(
     chunk_offsets,
     h_update,
     T,
-    H: tl.constexpr,
-    Hg: tl.constexpr,
-    K: tl.constexpr,
-    V: tl.constexpr,
+    H,
+    Hg,
+    K,
+    V,
     BT: tl.constexpr,
     USE_G: tl.constexpr,
     USE_INITIAL_STATE: tl.constexpr,
@@ -26,7 +26,6 @@ def chunk_local_cumsum_scalar_kernel(
     cu_seqlens,
     chunk_indices,
     T,
-    B: tl.constexpr,
     H: tl.constexpr,
     BLOCK_T: tl.constexpr,
     REVERSE: tl.constexpr,
@@ -103,7 +102,6 @@ def chunk_local_cumsum_scalar(
         cu_seqlens=cu_seqlens,
         chunk_indices=block_indices,
         T=T,
-        B=B,
         H=H,
         BLOCK_T=OPTIM_BLOCK_SIZE,
         CHUNK_SIZE=chunk_size,