Introduce W4A4 LAOS Quantization for better model compression and
inference efficiency on Ascend devices.
- vLLM version: v0.12.0
- vLLM main: ad32e3e19c
Signed-off-by: hfadzxy <starmoon_zhang@163.com>