[Fusion] [Graph] Add qknorm rope fusion operator (#4711)
### What this PR does / why we need it?
This PR adds the `qkv_rmsnorm_rope` operator and introduces a graph fusion
pass for `qknorm_rope` operations. The implementation includes a new
configuration flag, a pattern-matching pass built on
`torch._inductor.pattern_matcher`, and a custom Triton kernel for the
fused operation.
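
For orientation, the unfused computation that such a fusion pass would typically replace can be sketched in plain Python. This is only an illustrative reference, not the PR's Triton kernel: the function names (`rms_norm`, `rope_neox`, `qk_norm_rope_reference`), the `eps` and `base` defaults, and the choice of the rotate-half (NeoX-style) RoPE layout are all assumptions made for the example.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over one head vector: x / sqrt(mean(x^2) + eps) * weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def rope_neox(x, position, base=10000.0):
    """Rotate-half RoPE: element i is paired with element i + d/2."""
    dim = len(x)
    half = dim // 2
    out = [0.0] * dim
    for i in range(half):
        # Frequency for pair i, then the rotation angle at this position.
        angle = position * base ** (-2.0 * i / dim)
        c, s = math.cos(angle), math.sin(angle)
        out[i] = x[i] * c - x[i + half] * s
        out[i + half] = x[i + half] * c + x[i] * s
    return out


def qk_norm_rope_reference(q, k, q_weight, k_weight, position):
    """Unfused reference: normalize q and k per head, then rotate both."""
    q_out = rope_neox(rms_norm(q, q_weight), position)
    k_out = rope_neox(rms_norm(k, k_weight), position)
    return q_out, k_out
```

A fused operator would perform the normalization and the rotation in a single kernel launch, avoiding the intermediate write of the normalized `q` and `k` back to memory between the two steps.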
Co-authored-by: Angazenn <supperccell@163.com>
### Does this PR introduce _any_ user-facing change?
Yes, this PR adds a new `additional_config` option.
- vLLM version: v0.12.0
- vLLM main: ad32e3e19c
---------
Signed-off-by: wxsIcey <1790571317@qq.com>
@@ -210,4 +210,4 @@ def test_aclgraph_enable():
     # after check_and_update_config, mode should be VLLM_COMPILE and piecewise cudagraph
     NPUPlatform.check_and_update_config(VllmConfig)
     assert VllmConfig.compilation_config.mode == CompilationMode.VLLM_COMPILE
-    assert VllmConfig.compilation_config.cudagraph_mode == CUDAGraphMode.PIECEWISE
+    assert VllmConfig.compilation_config.cudagraph_mode == CUDAGraphMode.PIECEWISE