[Quantization][Feature] Support compressed tensors moe w4a8 dynamic weight (#5889)
### What this PR does / why we need it?
When quantized weights are produced with the LLM Compressor tool from the vLLM
community, the vLLM Ascend engine must be adapted to support the resulting
compressed-tensors quantization format.
1. Support W4A8 dynamic weights for MoE models.
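
As background, LLM Compressor records the scheme in the model's
`quantization_config` using the compressed-tensors format, where W4A8 dynamic
means 4-bit weights with 8-bit dynamically quantized input activations. The
sketch below is illustrative only: the `is_w4a8_dynamic` helper is hypothetical
(not part of vLLM Ascend), and the example config is a minimal hand-written
fragment resembling what the compressed-tensors format encodes.

```python
# Hypothetical sketch: recognizing a compressed-tensors W4A8 dynamic scheme
# from a model's quantization_config dict. Field names (quant_method,
# config_groups, weights, input_activations, num_bits, dynamic) follow the
# compressed-tensors config format; the helper itself is illustrative.

def is_w4a8_dynamic(quant_config: dict) -> bool:
    """Return True if any config group describes 4-bit weights with
    8-bit dynamically quantized input activations."""
    if quant_config.get("quant_method") != "compressed-tensors":
        return False
    for group in quant_config.get("config_groups", {}).values():
        weights = group.get("weights") or {}
        acts = group.get("input_activations") or {}
        if (weights.get("num_bits") == 4
                and acts.get("num_bits") == 8
                and acts.get("dynamic", False)):
            return True
    return False

# Minimal hand-written example config (not copied from a real checkpoint)
example = {
    "quant_method": "compressed-tensors",
    "config_groups": {
        "group_0": {
            "weights": {"num_bits": 4, "type": "int", "symmetric": True},
            "input_activations": {"num_bits": 8, "type": "int", "dynamic": True},
        }
    },
}
print(is_w4a8_dynamic(example))  # True
```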
- vLLM version: v0.13.0
- vLLM main:
bde38c11df
---------
Signed-off-by: LHXuuu <scut_xlh@163.com>
Signed-off-by: menogrey <1299267905@qq.com>
Co-authored-by: menogrey <1299267905@qq.com>
.github/workflows/misc/model_list.json (vendored) | +1
```diff
@@ -206,6 +206,7 @@
     "vllm-ascend/Qwen3-30B-A3B-W8A8",
     "vllm-ascend/Qwen3-30B-A3B-W8A8-Pruning",
     "vllm-ascend/Qwen3-30B-A3B-Instruct-2507-quantized.w8a8",
+    "vllm-ascend/Qwen3-30B-A3B-Instruct-2507-quantized.w4a8",
     "vllm-ascend/Qwen3-32B-W4A4",
     "vllm-ascend/Qwen3-32B-W8A8",
     "vllm-ascend/Qwen3-8B",
```