xc-llm-ascend

Files

rjg-lyh 9a3bdf2162 [main] Use AddRmsNormQuant ops in the custom model to optimize Qwen3's performance (#1806 )

### What this PR does / why we need it?
Optimizes the performance of the Qwen3 quantization model by registering
a custom model and adding the AddRmsNormQuant operation. Subsequent PRs
will focus on performance optimizations based on this custom model.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with existing test.

- vLLM version: v0.9.2
- vLLM main:
8d0a01a5f2

Signed-off-by: rjg-lyh <1318825571@qq.com>

2025-07-22 19:03:13 +08:00

__init__.py

[Core] Cherry pick from 0.7.1 to keep the main code newest (#127 )

2025-02-21 17:07:37 +08:00

func_wrapper.py

[quantization] Support w8a8 quantization (#580 )

2025-04-20 18:14:05 +08:00

quant_config.py

support pangumoe w8a8c8 and docs (#1477 )

2025-06-28 18:51:07 +08:00

quantizer.py

Fix W8A8 fused moe bug (#1529 )