xc-llm-ascend/vllm_ascend at 87a0b7b7c7630ae112c8384ad1af11ed29d55801

EngineX/xc-llm-ascend

iiiklw 87a0b7b7c7 [bugfix] adapt bugfix for norm_quant_fusion_pass to npugraph_ex (#6726)

### What this PR does / why we need it?

This PR adapts bugfixes from `norm_quant_fusion_pass` to
`graphex_norm_quant_fusion_pass` for the `npugraph_ex` backend.

The main changes are:
- Replaced `torch.ops.npu.npu_add_rms_norm` with
`torch.ops._C_ascend.npu_add_rms_norm_bias`.
- For patterns without bias, `None` is passed as the bias argument.
- For patterns with bias, the separate `add` operation for bias is
removed and the bias is passed directly to `npu_add_rms_norm_bias`, so
the bias addition becomes part of the fused op instead of a separate step.

These changes ensure consistency and correctness for RMSNorm and
quantization fusion patterns when using `npugraph_ex`.
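
To illustrate the intent of the change, here is a minimal pure-Python sketch of the math the two pattern variants are expected to compute. It is not the Ascend kernel and does not call `torch.ops._C_ascend.npu_add_rms_norm_bias`. The helper names (`add_rms_norm_ref`, `add_rms_norm_bias_ref`), the argument order, the two-value return, and the assumption that the bias is added after the weight scaling are all illustrative and are not confirmed by this PR description.

```python
import math


def add_rms_norm_ref(x, residual, weight, eps=1e-6):
    """Hypothetical reference: add the residual, then RMS-normalize and scale."""
    summed = [a + b for a, b in zip(x, residual)]
    rms = math.sqrt(sum(v * v for v in summed) / len(summed) + eps)
    normed = [v / rms * w for v, w in zip(summed, weight)]
    return normed, summed


def add_rms_norm_bias_ref(x, residual, weight, bias=None, eps=1e-6):
    """Hypothetical reference for the bias-aware variant.

    Assumption: the bias is applied after the weight scaling, and
    bias=None means "no bias", matching the no-bias pattern.
    """
    normed, summed = add_rms_norm_ref(x, residual, weight, eps)
    if bias is not None:
        normed = [v + b for v, b in zip(normed, bias)]
    return normed, summed


x = [1.0, 2.0, 3.0, 4.0]
residual = [0.5, -0.5, 0.5, -0.5]
weight = [1.0, 1.0, 2.0, 2.0]
bias = [0.1, 0.2, 0.3, 0.4]

# Pattern without bias: passing None should match the plain add + RMSNorm.
old_no_bias, _ = add_rms_norm_ref(x, residual, weight)
new_no_bias, _ = add_rms_norm_bias_ref(x, residual, weight, None)

# Pattern with bias: the old form used a separate add after the norm.
old_with_bias = [v + b for v, b in zip(old_no_bias, bias)]
new_with_bias, new_residual = add_rms_norm_bias_ref(x, residual, weight, bias)
```

Under these assumptions both variants produce the same values as the older two-step form, which is the equivalence a fusion pattern has to preserve when it swaps one op for the other.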

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main: 9562912cea

Signed-off-by: huyuanquan1 <huyuanquan1@huawei.com>
Co-authored-by: huyuanquan1 <huyuanquan1@huawei.com>

2026-02-13 10:10:39 +08:00

[Feat] 310p support MoE W8A8 quantization (#6641)

2026-02-10 17:17:44 +08:00

_cann_ops_custom

[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804)

2025-11-28 18:06:39 +08:00

[P/D][PCP] mooncake layerwise support pcp function (#6627)

2026-02-12 11:02:25 +08:00

[bugfix] adapt bugfix for norm_quant_fusion_pass to npugraph_ex (#6726)

2026-02-13 10:10:39 +08:00

[P/D] layerwise connector support recompute scheduler (#5900)

2026-02-07 15:24:42 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2) (#5977)

2026-01-19 08:59:46 +08:00

device_allocator

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2) (#5977)

2026-01-19 08:59:46 +08:00

[P/D][PCP] mooncake layerwise support pcp function (#6627)

2026-02-12 11:02:25 +08:00

[EPLB] Avoiding eplb's dependency on a specified model (#6528)

2026-02-10 15:58:44 +08:00

[main2main] upgrade vllm main 0202 (#6560)

2026-02-05 19:31:17 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #5) (#5996)

2026-01-24 22:45:38 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #6) (#6001)

2026-01-24 22:08:33 +08:00

[Attention] add gpt-oss support (#5901)

2026-02-12 10:55:34 +08:00

[Feature] adapt to uva buffer and main2main (#6657)

2026-02-12 10:36:31 +08:00

[Model] GLM5 adaptation (#6642)

2026-02-11 22:22:22 +08:00

[Bugfix] Update target probs to target logits in rejection sample (#6685)

2026-02-11 21:31:40 +08:00

[Model] GLM5 adaptation (#6642)

2026-02-11 22:22:22 +08:00

[Feature] adapt to uva buffer and main2main (#6657)

2026-02-12 10:36:31 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10) (#6173)

2026-02-06 15:35:06 +08:00

__init__.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

ascend_config.py

[npugraph_ex]enable npugraph_ex by default (#6664)

2026-02-12 08:44:06 +08:00

ascend_forward_context.py

[Main][Ops] Make triton rope support index_selecting from cos_sin_cache (#5450)

2026-02-11 21:20:53 +08:00

batch_invariant.py

implement batch invariant with ascendc (#6590)

2026-02-10 14:15:26 +08:00

cpu_binding.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

envs.py

[MISC] Clean up useless env USE_OPTIMIZED_MODEL (#6618)

2026-02-09 15:38:58 +08:00

flash_common3_context.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

meta_registration.py

[Ops][Refactor] Remove custom rotary_embedding operator (#6523)

2026-02-07 09:24:05 +08:00

platform.py

[Feature] adapt to uva buffer and main2main (#6657)

2026-02-12 10:36:31 +08:00

profiling_config.py

[Core][Misc] Clean up ProfileExecuteDuration (#6461)

2026-02-01 20:06:01 +08:00

utils.py

[Feat.]: 310p support MOE models (#6530)

2026-02-06 10:30:56 +08:00