[Graph][Fusion] Add MatmulAllReduceAddRMSNorm graph fusion for npugraph_ex. (#6006)

### What this PR does / why we need it?
This PR builds on
https://github.com/vllm-project/vllm-ascend/pull/5011 and further
extends the npu_graph_ex_passes module. The earlier work added graph
optimization support for the add_rms_quant fused operator when a bias
term is present, ensuring that the fusion pattern is correctly
registered and matched in the computation graph.

This PR adds the MatmulAllReduceAddRMSNorm operator fusion, together
with corresponding ST test cases for regression monitoring.
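
For orientation, the unfused sequence that a MatmulAllReduceAddRMSNorm fusion is meant to replace can be sketched in plain Python. This is only a semantic reference under stated assumptions: the all-reduce is simulated by summing per-rank partial results, and the helper names, the epsilon placement, and the `(normed, added)` return layout are hypothetical, not the signature of the fused NPU operator.

```python
import math

def matmul(a, b):
    """Plain matrix product of two nested lists (a: MxK, b: KxN)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def all_reduce_sum(partials):
    """Simulated all-reduce: element-wise sum of every rank's partial result."""
    rows, cols = len(partials[0]), len(partials[0][0])
    return [[sum(p[i][j] for p in partials) for j in range(cols)] for i in range(rows)]

def add_rms_norm(x, residual, weight, eps=1e-6):
    """Add the residual, then RMS-normalize each row and scale by weight."""
    added = [[a + r for a, r in zip(x_row, r_row)] for x_row, r_row in zip(x, residual)]
    normed = []
    for row in added:
        rms = math.sqrt(sum(v * v for v in row) / len(row) + eps)
        normed.append([v / rms * w for v, w in zip(row, weight)])
    # Assumption: return both the normalized output and the updated residual.
    return normed, added

def unfused_reference(x_shards, w_shards, residual, weight, eps=1e-6):
    """Matmul per rank -> all-reduce -> residual add -> RMSNorm, as separate steps."""
    partials = [matmul(x, w) for x, w in zip(x_shards, w_shards)]
    reduced = all_reduce_sum(partials)
    return add_rms_norm(reduced, residual, weight, eps)
```

A fused kernel would be expected to match this reference up to numerical tolerance, which is the kind of comparison a regression test can make.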
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main: 2c24bc6996

---------

Signed-off-by: cjian <2318164299@qq.com>
CodeCat
2026-01-27 16:41:48 +08:00
committed by GitHub
parent 21b6779a33
commit 54e8389f8e
5 changed files with 189 additions and 4 deletions

@@ -48,4 +48,18 @@ class NpuGraphEXPassManager:
    def configure(self, config: VllmConfig):
        # By default, we enable the graph fusion and quantization fusion pass.
        self.ascend_compilation_config: dict = config.additional_config.get("ascend_compilation_config", {})
        self.npugraph_ex_config: dict = config.additional_config.get("npugraph_ex_config", {})
        if self.npugraph_ex_config.get("fuse_norm_quant", True):
            from .npugraph_ex_passes.graphex_norm_quant_fusion_pass import GraphEXAddRMSNormFusionPass
            self.passes.append(GraphEXAddRMSNormFusionPass(config))
        if self.npugraph_ex_config.get("fuse_qknorm_rope", True):
            from .npugraph_ex_passes.graphex_qknorm_rope_fusion_pass import GraphEXQKNormRopeFusionPass
            self.passes.append(GraphEXQKNormRopeFusionPass(config))
        if self.npugraph_ex_config.get("fuse_allreduce_rms", True):
            from .npugraph_ex_passes.graphex_allreduce_rmsnorm_fusion_pass import GraphEXMatmulAllReduceAddRMSNormPass
            self.passes.append(GraphEXMatmulAllReduceAddRMSNormPass(config))
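
The gating logic in `configure()` can be illustrated with a small standalone sketch. It mirrors the three flags read from `npugraph_ex_config` in this hunk, each defaulting to `True`. The `PASS_FLAGS` table and the `enabled_passes` helper are hypothetical and exist only for illustration; the real code imports and instantiates the pass classes rather than returning their names.

```python
# Flag name -> pass class name, taken from the configure() hunk in this commit.
PASS_FLAGS = {
    "fuse_norm_quant": "GraphEXAddRMSNormFusionPass",
    "fuse_qknorm_rope": "GraphEXQKNormRopeFusionPass",
    "fuse_allreduce_rms": "GraphEXMatmulAllReduceAddRMSNormPass",
}

def enabled_passes(additional_config):
    """Hypothetical helper: names of the passes configure() would append."""
    npugraph_ex_config = additional_config.get("npugraph_ex_config", {})
    # A missing flag counts as enabled, matching the .get(flag, True) calls.
    return [name for flag, name in PASS_FLAGS.items() if npugraph_ex_config.get(flag, True)]

# With no overrides, all three fusion passes are registered.
all_on = enabled_passes({})

# Turning off only the new fusion leaves the other two in place.
without_allreduce = enabled_passes({"npugraph_ex_config": {"fuse_allreduce_rms": False}})
```

Because every flag defaults to `True`, the new fusion is active unless `fuse_allreduce_rms` is explicitly set to `False` in `additional_config`.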