xc-llm-ascend

Files

欧派果奶我还要 a336543977 [Bugifx] fix quant_apply_mlp w1_scale type error & fix getting num_local_expert (#4632)

### What this PR does / why we need it?
Fix bugs introduced by bc67696a02:
1. Fix the error when getting num_local_expert in vllm_adaptor.
2. Fix the w1_scale type error in moe_mlp.quant_apply_mlp.npu_dequant_swiglu_quant in the w4a8 quantized scenario.

- vLLM version: v0.12.0

---------

Signed-off-by: 白永斌 <baiyongbin3@h-partners.com>
Signed-off-by: 欧派果奶我还要 <47294568+845473182@users.noreply.github.com>
Co-authored-by: 白永斌 <baiyongbin3@h-partners.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>

2025-12-05 16:04:24 +08:00

compressed_tensors

[Quantization] Support compressed tensors w8a8 static and w8a8 dynamic weight (#4036)

2025-11-28 14:09:39 +08:00

__init__.py

[Core] Cherry pick from 0.7.1 to keep the main code newest (#127)

2025-02-21 17:07:37 +08:00

quant_config.py

[main][bugfix] bugfix for qwen3 moe quantization (#4599)

2025-12-01 23:48:57 +08:00

utils.py

[feature] Support W8A8 PD-Mix Quantization (#4235)