xc-llm-ascend

Files

22dimensions f5a97e8fa5 [Quantization] register AscendQuantRMSNorm for quantization (#2856)

### What this PR does / why we need it?

modelslim generates `self.bias` for RMSNorm layers during quantization. Since
`RMSNorm` in vLLM has no such parameter, it is necessary to create an
`AscendQuantRMSNorm` that holds it.
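To illustrate the mismatch, here is a minimal pure-Python sketch. It is not the vLLM or vllm-ascend code: the class names `PlainRMSNorm` and `RMSNormWithBias`, the strict `load_state_dict` helper, and the formula `y = x / rms(x) * weight + bias` (bias added after normalization and scaling) are all assumptions made for illustration. It shows that a norm layer without a `bias` parameter has nowhere to put a bias tensor from a quantized checkpoint, while a variant that declares one can load and apply it.

```python
import math
from typing import Dict, List


class PlainRMSNorm:
    """Hypothetical RMSNorm without a bias: y = x / rms(x) * weight."""

    def __init__(self, hidden_size: int, eps: float = 1e-6) -> None:
        self.eps = eps
        self.params: Dict[str, List[float]] = {"weight": [1.0] * hidden_size}

    def load_state_dict(self, state: Dict[str, List[float]]) -> None:
        # Assumption: strict loading, so any checkpoint tensor without a
        # matching parameter is an error and nothing is loaded.
        unexpected = sorted(set(state) - set(self.params))
        if unexpected:
            raise KeyError(f"unexpected parameters: {unexpected}")
        for name, value in state.items():
            self.params[name] = list(value)

    def forward(self, x: List[float]) -> List[float]:
        # Root mean square over the hidden dimension, stabilized by eps.
        rms = math.sqrt(sum(v * v for v in x) / len(x) + self.eps)
        return [v / rms * w for v, w in zip(x, self.params["weight"])]


class RMSNormWithBias(PlainRMSNorm):
    """Hypothetical variant that also owns the bias a quantizer emits."""

    def __init__(self, hidden_size: int, eps: float = 1e-6) -> None:
        super().__init__(hidden_size, eps)
        self.params["bias"] = [0.0] * hidden_size

    def forward(self, x: List[float]) -> List[float]:
        # Assumption: the bias is added after normalization and scaling.
        normed = super().forward(x)
        return [v + b for v, b in zip(normed, self.params["bias"])]


if __name__ == "__main__":
    # Hypothetical checkpoint entry for one norm layer, including a bias.
    ckpt = {"weight": [1.0, 1.0], "bias": [0.5, -0.5]}
    try:
        PlainRMSNorm(2).load_state_dict(ckpt)
    except KeyError as err:
        print("plain RMSNorm rejects checkpoint:", err)
    norm = RMSNormWithBias(2)
    norm.load_state_dict(ckpt)
    print(norm.forward([3.0, 4.0]))
```

Plain lists are used only so the sketch runs without dependencies; a real layer would operate on tensors and would typically use a fused kernel.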
### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Tested with deepseek-v3.1-w8a8.

<img width="2496" height="592" alt="image"
src="https://github.com/user-attachments/assets/004c6e76-3d7a-4a1f-b59f-a14304012663"
/>


- vLLM version: main
- vLLM main: d6249d0699

Signed-off-by: 22dimensions <waitingwind@foxmail.com>

2025-09-11 23:14:02 +08:00

__init__.py

[Core] Cherry pick from 0.7.1 to keep the main code newest (#127)

2025-02-21 17:07:37 +08:00

quant_config.py

support qwen25 vl w8a8 quantization (#2778)

2025-09-11 16:40:51 +08:00

utils.py

[Quantization] register AscendQuantRMSNorm for quantization (#2856)

2025-09-11 23:14:02 +08:00

w4a8_dynamic.py

[main] [refactor] refactor common_fused_moe.py (#2706)