xc-llm-ascend

Author SHA1 Message Date


22dimensions

f5a97e8fa5

[Quantization] register AscendQuantRMSNorm for quantization (#2856)

### What this PR does / why we need it?

modelslim generates a `self.bias` parameter for RMSNorm during quantization.
Because RMSNorm in vLLM has no such parameter, it is necessary
to create an AscendQuantRMSNorm.
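
To illustrate why an extra parameter forces a separate layer, here is a minimal pure-Python sketch of an RMS normalization that optionally adds a bias. It is not the AscendQuantRMSNorm code from this PR: the function name `rms_norm`, its signature, and the choice to add the bias after scaling are all assumptions made for illustration.

```python
import math


def rms_norm(x, weight, bias=None, eps=1e-6):
    """Hypothetical helper: RMS-normalize x, scale by weight, optionally add bias."""
    # Mean of squares over the hidden dimension.
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    # Plain RMSNorm stops here: normalize, then scale element-wise.
    out = [v * inv_rms * w for v, w in zip(x, weight)]
    # Assumed handling of the extra bias that a quantized checkpoint may carry.
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out


plain = rms_norm([3.0, 4.0], [1.0, 1.0])
shifted = rms_norm([3.0, 4.0], [1.0, 1.0], bias=[0.5, -0.5])
```

A layer without a `bias` attribute has nowhere to load that tensor from the checkpoint, which is the gap the commit message describes.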
### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Tested with deepseek-v3.1-w8a8.

<img width="2496" height="592" alt="image"
src="https://github.com/user-attachments/assets/004c6e76-3d7a-4a1f-b59f-a14304012663"
/>


- vLLM version: main
- vLLM main: `d6249d0699`

Signed-off-by: 22dimensions <waitingwind@foxmail.com>

2025-09-11 23:14:02 +08:00

22dimensions

d51694a77b

[2/N][Refactor][Quantization] clean quantization patch (#2785)

### What this PR does / why we need it?
The quantization patch is unused code, so it is cleaned up.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tested by CI.

- vLLM version: v0.10.1.1
- vLLM main: `f4962a6d55`

Signed-off-by: 22dimensions <waitingwind@foxmail.com>

2025-09-08 17:31:53 +08:00

22dimensions

37f5a29cd4

[1/N][Refactor][Quantization] remove redundant quantizer class (#2680)

### What this PR does / why we need it?

The AscendQuantizer/LLMQuantizer classes are used to select the quant method
based on the quant config and some other arguments,
but replacing these classes with a map is simpler and cleaner, so I
removed them.
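
The general idea of replacing a selector class with a map can be sketched as below. This is an illustration only, not the code from this PR: the class names, the map keys, and the `get_quant_method` helper are all hypothetical.

```python
class W8A8LinearMethod:
    """Hypothetical stand-in for a static W8A8 linear quant method."""


class W8A8DynamicLinearMethod:
    """Hypothetical stand-in for a dynamic W8A8 linear quant method."""


# A plain dict takes over the job of the selector classes:
# (quant type, layer type) -> quant method class. Keys are assumptions.
QUANT_METHOD_MAP = {
    ("W8A8", "linear"): W8A8LinearMethod,
    ("W8A8_DYNAMIC", "linear"): W8A8DynamicLinearMethod,
}


def get_quant_method(quant_type, layer_type):
    """Look up and instantiate the quant method for a layer."""
    try:
        method_cls = QUANT_METHOD_MAP[(quant_type, layer_type)]
    except KeyError:
        raise NotImplementedError(
            f"no quant method for {quant_type!r} on {layer_type!r} layers"
        ) from None
    return method_cls()


method = get_quant_method("W8A8", "linear")
```

With this shape, supporting a new quantization type means adding one dictionary entry rather than another branch in a quantizer class.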

### Does this PR introduce _any_ user-facing change?
No 

### How was this patch tested?

Unit tests (UT) and e2e tests.


- vLLM version: v0.10.1.1
- vLLM main: `6997a25ac6`

Signed-off-by: 22dimensions <waitingwind@foxmail.com>

2025-09-04 11:35:14 +08:00

3 Commits