xc-llm-ascend

Slightwind 4f6d60eb06 [Feature] Add W4A4 Flat Quantization support (#3427)

Introduce W4A4 Flat Quantization for better model compression and
inference efficiency on Ascend devices.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>

2025-10-13 23:20:16 +08:00

test_quant_config.py

[3/N][Refactor][Quantization]remove packed_modules_mapping from models (#3021)

2025-09-19 20:50:14 +08:00

test_utils.py

[1/N][Refactor][Quantization] remove redundant quantizer class (#2680)

2025-09-04 11:35:14 +08:00

test_w4a4_flatquant_dynamic.py

[Feature] Add W4A4 Flat Quantization support (#3427)

2025-10-13 23:20:16 +08:00

test_w4a8_dynamic.py

[main][quantization] Support deepseek w4a8 per-channel quantization (#3011)