Slightwind
|
4f6d60eb06
|
[Feature] Add W4A4 Flat Quantization support (#3427)
Introduce W4A4 (4-bit weight, 4-bit activation) Flat Quantization for
better model compression and inference efficiency on Ascend devices.
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
---------
Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
|
2025-10-13 23:20:16 +08:00 |
|