xc-llm-ascend

Angazenn 9fbd8017c0 [Quantization]300I Duo support w8a8 quantization (#1560)

### What this PR does / why we need it?
This PR adds W8A8 quantization support on the 300I Duo platform. The main change is to use
`npu_quant_grouped_matmul_dequant` in place of `npu_grouped_matmul`.
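
For readers unfamiliar with the idea, the following is a minimal pure-Python sketch of what a W8A8 "quantize, grouped matmul, dequantize" step computes conceptually. It is not the NPU kernel and does not reproduce the signature of `npu_quant_grouped_matmul_dequant`. The function names, the per-token activation scales, the per-output-channel weight scales, and the `[out_features][in_features]` weight layout are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
IntMatrix = List[List[int]]


def quantize_per_row(x: Sequence[Sequence[float]]) -> Tuple[IntMatrix, List[float]]:
    """Symmetric int8 quantization with one scale per row (hypothetical helper)."""
    q_rows: IntMatrix = []
    scales: List[float] = []
    for row in x:
        max_abs = max((abs(v) for v in row), default=0.0)
        # Avoid division by zero for all-zero rows.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def quant_grouped_matmul_dequant_ref(
    x: Sequence[Sequence[float]],
    weights_q: Sequence[IntMatrix],
    weight_scales: Sequence[Sequence[float]],
    group_sizes: Sequence[int],
) -> Matrix:
    """Reference computation (assumed semantics, for illustration only).

    x             : [tokens][in_features] float activations, sorted by group
    weights_q[g]  : [out_features][in_features] int8 weights of group g
    weight_scales : [groups][out_features] per-output-channel scales
    group_sizes   : number of consecutive tokens routed to each group
    """
    assert sum(group_sizes) == len(x), "group sizes must cover all tokens"
    x_q, x_scales = quantize_per_row(x)  # dynamic per-token quantization
    out: Matrix = []
    start = 0
    for g, size in enumerate(group_sizes):
        for t in range(start, start + size):
            row = []
            for n, w_row in enumerate(weights_q[g]):
                # Integer multiply-accumulate, then dequantize with both scales.
                acc = sum(a * b for a, b in zip(x_q[t], w_row))
                row.append(acc * x_scales[t] * weight_scales[g][n])
            out.append(row)
        start += size
    return out


# Usage: two groups (e.g. two experts), with tokens 0-1 in group 0 and token 2 in group 1.
x = [[1.0, -2.0], [0.5, 0.25], [3.0, 1.0]]
w_float = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 2.0], [1.0, -1.0]]]
quantized = [quantize_per_row(w) for w in w_float]
weights_q = [q for q, _ in quantized]
weight_scales = [s for _, s in quantized]
y = quant_grouped_matmul_dequant_ref(x, weights_q, weight_scales, [2, 1])
```

The result `y` should match a float grouped matmul up to int8 rounding error. Fusing quantization and dequantization around the integer matmul is what lets a single operator stand in for the unquantized one.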

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Offline inference on 310P runs normally.

---------

Signed-off-by: angazenn <zengyanjia@huawei.com>
Signed-off-by: tianyitang <tangtianyi4@huawei.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: tianyitang <tangtianyi4@huawei.com>

2025-07-03 22:12:46 +08:00

| File | Last commit | Date |
| --- | --- | --- |
| test_quant_config.py | Fix W8A8 fused moe bug (#1529) | 2025-07-02 16:40:51 +08:00 |
| test_quantizer.py | Fix W8A8 fused moe bug (#1529) | 2025-07-02 16:40:51 +08:00 |
| test_w8a8.py | [Quantization]300I Duo support w8a8 quantization (#1560) | 2025-07-03 22:12:46 +08:00 |