[Graph][Fusion] Add MatmulAllReduceAddRMSNorm graph fusion for npugraph_ex. (#6006)
### What this PR does / why we need it?
This PR follows up on PR
https://github.com/vllm-project/vllm-ascend/pull/5011 and further enhances
the npu_graph_ex_passes module. Building on that work, we have added graph
optimization support for the add_rms_quant fused operator when a bias term
is present, ensuring that the fusion pattern is correctly registered and
matched in the computation graph.
This time, we add the MatmulAllReduceAddRMSNorm operator fusion, together
with corresponding ST test cases for regression monitoring. The fusion is
gated by the new `fuse_allreduce_rms` option in NpugraphExConfig, which
defaults to False.
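To make the fused pattern concrete, here is a pure-Python reference of the unfused computation that MatmulAllReduceAddRMSNorm is meant to replace: a per-rank matmul, an all-reduce, a residual add, and an RMSNorm. It is a sketch under stated assumptions, not the NPU kernel: the all-reduce is assumed to be a sum across tensor-parallel ranks, each rank is assumed to hold a row shard of the weight and the matching slice of the input, and RMSNorm is assumed to scale by a per-channel `gamma` with `eps=1e-6`. All function names are hypothetical, and the real operator may differ in details such as bias handling or which outputs it returns.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Row-by-column product of an (m x k) and a (k x n) matrix."""
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def all_reduce_sum(partials: List[Matrix]) -> Matrix:
    """Simulate a sum all-reduce: every rank receives the elementwise sum."""
    rows, cols = len(partials[0]), len(partials[0][0])
    return [[sum(p[i][j] for p in partials) for j in range(cols)] for i in range(rows)]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Elementwise residual add."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def rms_norm(h: Matrix, gamma: List[float], eps: float = 1e-6) -> Matrix:
    """Divide each row by its root mean square, then scale by gamma (assumed form)."""
    out = []
    for row in h:
        rms = math.sqrt(sum(v * v for v in row) / len(row) + eps)
        out.append([v / rms * g for v, g in zip(row, gamma)])
    return out


def matmul_allreduce_add_rmsnorm(
    x_shards: List[Matrix], w_shards: List[Matrix], residual: Matrix, gamma: List[float]
) -> Tuple[Matrix, Matrix]:
    """Hypothetical unfused reference: the four steps a fused kernel would replace."""
    partials = [matmul(x, w) for x, w in zip(x_shards, w_shards)]  # 1. per-rank matmul
    reduced = all_reduce_sum(partials)                              # 2. all-reduce
    hidden = add(reduced, residual)                                 # 3. residual add
    return rms_norm(hidden, gamma), hidden                          # 4. RMSNorm


# Two simulated ranks: x is split along the hidden dimension, W along its rows.
x = [[1.0, 2.0, 3.0, 4.0]]
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
x_shards = [[row[:2] for row in x], [row[2:] for row in x]]
w_shards = [w[:2], w[2:]]
residual = [[0.0, 2.0]]
gamma = [1.0, 1.0]

normed, hidden = matmul_allreduce_add_rmsnorm(x_shards, w_shards, residual, gamma)
print(hidden)  # [[12.0, 3.0]]
```

Summing the per-rank partial products reproduces the full `x @ W`, so every rank holds the same tensor after step 2; the residual add and RMSNorm then consume that result directly, which is the producer-consumer chain a single fused operator can cover.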
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: 2c24bc6996
---------
Signed-off-by: cjian <2318164299@qq.com>
@@ -235,7 +235,15 @@ class NpugraphExConfig:
     These configurations can directly impact the performance and behavior of models deployed on Ascend platforms.
     """
 
-    def __init__(self, enable: bool = False, enable_static_kernel: bool = False, **kwargs):
+    def __init__(
+        self,
+        enable: bool = False,
+        enable_static_kernel: bool = False,
+        fuse_norm_quant: bool = True,
+        fuse_qknorm_rope: bool = True,
+        fuse_allreduce_rms: bool = False,
+        **kwargs,
+    ):
         """
         Initialize the configuration.
 
@@ -251,10 +259,20 @@ class NpugraphExConfig:
                 binary files with the corresponding shapes based on the current batch_size,
                 which usually takes some time.
                 Default: False
+            fuse_norm_quant (bool): Whether to enable norm and quant fusion optimization.
+                When set to True, the system will optimize norm and quant operations.
+                Default: True
+            fuse_qknorm_rope (bool): Whether to enable qknorm and rope fusion optimization.
+                Default: True
+            fuse_allreduce_rms (bool): Whether to enable allreduce and addrmsnorm fusion optimization.
+                Default: False
             **kwargs: Additional optional parameters for forward compatibility and configuration extension.
         """
         self.enable = enable
         self.enable_static_kernel = enable_static_kernel
+        self.fuse_norm_quant = fuse_norm_quant
+        self.fuse_qknorm_rope = fuse_qknorm_rope
+        self.fuse_allreduce_rms = fuse_allreduce_rms
 
 
 class XliteGraphConfig:
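Unlike `fuse_norm_quant` and `fuse_qknorm_rope`, the new `fuse_allreduce_rms` flag is opt-in (default False). The sketch below pairs a trimmed copy of the constructor from the diff with a hypothetical toy pass that rewrites a flat list of op names only when the flag is set. The pass, `PATTERN`, `FUSED`, and the op names are illustrative assumptions; the real npu_graph_ex pass matches patterns in the captured computation graph, not in a list of strings.

```python
from typing import List


class NpugraphExConfig:
    """Trimmed copy of the constructor shown in the diff (docstring omitted)."""

    def __init__(
        self,
        enable: bool = False,
        enable_static_kernel: bool = False,
        fuse_norm_quant: bool = True,
        fuse_qknorm_rope: bool = True,
        fuse_allreduce_rms: bool = False,
        **kwargs,  # accepted for forward compatibility and otherwise ignored
    ):
        self.enable = enable
        self.enable_static_kernel = enable_static_kernel
        self.fuse_norm_quant = fuse_norm_quant
        self.fuse_qknorm_rope = fuse_qknorm_rope
        self.fuse_allreduce_rms = fuse_allreduce_rms


# Hypothetical toy pattern: the real pass works on graph nodes, not op-name strings.
PATTERN = ["matmul", "all_reduce", "add", "rms_norm"]
FUSED = "matmul_all_reduce_add_rms_norm"


def apply_allreduce_rms_fusion(ops: List[str], config: NpugraphExConfig) -> List[str]:
    """Replace each occurrence of PATTERN with FUSED when the flag is enabled."""
    if not config.fuse_allreduce_rms:
        return list(ops)  # opt-in: leave the sequence untouched by default
    out, i = [], 0
    while i < len(ops):
        if ops[i:i + len(PATTERN)] == PATTERN:
            out.append(FUSED)  # collapse the four ops into one fused op
            i += len(PATTERN)
        else:
            out.append(ops[i])
            i += 1
    return out


graph = ["embedding", "matmul", "all_reduce", "add", "rms_norm", "attention"]
default_cfg = NpugraphExConfig(enable=True)
fused_cfg = NpugraphExConfig(enable=True, fuse_allreduce_rms=True)
print(apply_allreduce_rms_fusion(graph, default_cfg))  # unchanged
print(apply_allreduce_rms_fusion(graph, fused_cfg))
# ['embedding', 'matmul_all_reduce_add_rms_norm', 'attention']
```

Keeping the default at False means existing deployments see no graph change unless they explicitly enable the fusion.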