iiiklw 87a0b7b7c7 [bugfix] adapt bugfix for norm_quant_fusion_pass to npugraph_ex (#6726)
### What this PR does / why we need it?

This PR adapts bugfixes from `norm_quant_fusion_pass` to
`graphex_norm_quant_fusion_pass` for the `npugraph_ex` backend.

The main changes are:
- Replaced `torch.ops.npu.npu_add_rms_norm` with `torch.ops._C_ascend.npu_add_rms_norm_bias`.
- For patterns without bias, `None` is passed as the bias argument.
- For patterns with bias, the separate `add` operation for the bias is removed and the bias is passed directly to `npu_add_rms_norm_bias`, so the bias addition becomes part of the fused op rather than a standalone node.
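
To illustrate why the standalone `add` can be dropped, here is a minimal pure-Python reference sketch. It is not the NPU kernel: the function names `add_rms_norm_bias_ref` and `unfused_with_bias` are hypothetical, and the assumed semantics (bias added to the normalized output, `None` meaning no bias) are inferred from the description above rather than taken from the operator's source.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def add_rms_norm_bias_ref(
    x: Vector,
    residual: Vector,
    weight: Vector,
    bias: Optional[Vector],
    eps: float = 1e-6,
) -> Tuple[Vector, Vector]:
    """Hypothetical reference for a fused add + RMSNorm + optional bias.

    Assumption: the bias is added to the normalized output, and
    bias=None skips that step entirely.
    """
    # Residual add, which is also returned as the updated residual.
    summed = [a + b for a, b in zip(x, residual)]
    # RMS over the hidden dimension.
    rms = math.sqrt(sum(v * v for v in summed) / len(summed) + eps)
    normed = [v / rms * w for v, w in zip(summed, weight)]
    if bias is not None:
        normed = [n + b for n, b in zip(normed, bias)]
    return normed, summed


def unfused_with_bias(
    x: Vector, residual: Vector, weight: Vector, bias: Vector, eps: float = 1e-6
) -> Tuple[Vector, Vector]:
    """Old-style pattern: normalize without bias, then a separate add."""
    normed, summed = add_rms_norm_bias_ref(x, residual, weight, None, eps)
    return [n + b for n, b in zip(normed, bias)], summed


x = [1.0, -2.0, 3.0, 0.5]
residual = [0.5, 0.5, -1.0, 2.0]
weight = [1.0, 0.9, 1.1, 1.0]
bias = [0.1, -0.2, 0.0, 0.3]

fused_out, fused_res = add_rms_norm_bias_ref(x, residual, weight, bias)
unfused_out, unfused_res = unfused_with_bias(x, residual, weight, bias)
```

Under these assumptions both paths produce the same normalized output and the same updated residual, which is why the pattern with bias can hand the bias straight to the fused call.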

These changes ensure consistency and correctness for RMSNorm and
quantization fusion patterns when using `npugraph_ex`.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main: 9562912cea

Signed-off-by: huyuanquan1 <huyuanquan1@huawei.com>
Co-authored-by: huyuanquan1 <huyuanquan1@huawei.com>
2026-02-13 10:10:39 +08:00