xc-llm-ascend

Files

huangdong2022 23db56a340 [Feat]Qwen3 Moe supports npu_add_rms_norm_quant op by default, update op with norm bias (#3205 )

### What this PR does / why we need it?
1. qwen3 moe uses add_rms_norm_quant op instead of 'add_rms_norm op and
quant op' during quantization scene.
2. torch_npu.add_rms_norm_quant op fixed accuracy while model weights is
quantized by anti_method m4, m4 quantization is asymmetric outlier
suppression method, it will generate none-zero norm bias,
add_rms_norm_quant op updated to add this parameter to calculate.

### Does this PR introduce _any_ user-facing change?
please use a torch_npu version >= torch_npu-2.7.1.dev20250919

### How was this patch tested?
1. no special parameters to set, no new envs to set.
2. use qwen3 moe quantization model to test ,such as
Qwen3-235B-A22B-W8A8, Qwen3-30B-A3B-W8A8,
Qwen3-235B-A22B-Instruct-2507-m4 (anti_method m4)

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: huangdong2022 <huangdong51@huawei.com>
Signed-off-by: h30027576 <huangdong51@huawei.com>

2025-10-09 20:18:10 +08:00

e2e

[Misc] Clean up useless patch (#3320 )

2025-10-09 14:07:26 +08:00

[Feat]Qwen3 Moe supports npu_add_rms_norm_quant op by default, update op with norm bias (#3205 )

2025-10-09 20:18:10 +08:00

__init__.py

[SpecDecode] Add spec decode support (#500 )

2025-04-17 20:16:32 +08:00