[Quantization] Support compressed tensors moe w8a8 int8 dynamic weight (#5718)

### What this PR does / why we need it?

When quantized weights are generated with LLM Compressor, the quantization tool from the vLLM community, the vLLM Ascend engine must be adapted to load the compressed-tensors quantization format. This PR:

1. Supports W8A8 int8 dynamic weights for MoE models.
2. Specifies the W4A16 quantization configuration.

Co-authored-by: menogrey 1299267905@qq.com
Co-authored-by: kunpengW-code 1289706727@qq.com

### Does this PR introduce _any_ user-facing change?

No.

- vLLM version: v0.13.0
- vLLM main: 2f4e6548ef

---------

Signed-off-by: LHXuuu <scut_xlh@163.com>
Signed-off-by: menogrey <1299267905@qq.com>
Signed-off-by: Wang Kunpeng <1289706727@qq.com>
Co-authored-by: menogrey <1299267905@qq.com>
Co-authored-by: Wang Kunpeng <1289706727@qq.com>
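For readers new to the scheme, "W8A8 int8 dynamic" means int8 weights whose scales are fixed offline, combined with int8 activations whose scales are computed per token at inference time. The pure-Python sketch below illustrates the arithmetic for a toy MoE layer. It assumes symmetric per-output-channel weight scales, symmetric per-token activation scales, and top-1 routing; all function names are hypothetical, and this is not how vLLM Ascend's NPU kernels are implemented.

```python
from typing import List, Tuple

IntMatrix = List[List[int]]
Matrix = List[List[float]]


def quantize_symmetric(vec: List[float]) -> Tuple[List[int], float]:
    """Symmetric int8: scale = max|v| / 127, q = clamp(round(v / scale), -128, 127)."""
    amax = max(abs(v) for v in vec)
    scale = amax / 127.0 if amax > 0 else 1.0  # avoid division by zero on all-zero input
    q = [max(-128, min(127, round(v / scale))) for v in vec]
    return q, scale


def quantize_weight_per_channel(w: Matrix) -> Tuple[IntMatrix, List[float]]:
    """Offline step: one scale per output channel (each row of w, shape [out, in])."""
    pairs = [quantize_symmetric(row) for row in w]
    return [q for q, _ in pairs], [s for _, s in pairs]


def w8a8_dynamic_linear(x: Matrix, qw: IntMatrix, w_scales: List[float]) -> Matrix:
    """Runtime step: per-token dynamic activation scale, integer dot products, dequantize."""
    out = []
    for token in x:
        qx, x_scale = quantize_symmetric(token)  # scale computed from this token only
        row = []
        for qw_row, w_scale in zip(qw, w_scales):
            acc = sum(a * b for a, b in zip(qx, qw_row))  # an int32 accumulator on real hardware
            row.append(acc * x_scale * w_scale)  # dequantize with both scales
        out.append(row)
    return out


def moe_forward(
    x: Matrix, experts: List[Tuple[IntMatrix, List[float]]], router_choice: List[int]
) -> Matrix:
    """Toy top-1 MoE: each token runs through the W8A8 linear of its routed expert."""
    return [w8a8_dynamic_linear([tok], *experts[e])[0] for tok, e in zip(x, router_choice)]


# Usage: two experts with 2x2 weights; token 0 is routed to expert 0, token 1 to expert 1.
experts = [
    quantize_weight_per_channel([[0.5, -1.0], [2.0, 0.25]]),
    quantize_weight_per_channel([[1.0, 1.0], [-0.5, 0.75]]),
]
y = moe_forward([[1.0, 2.0], [-0.5, 0.3]], experts, router_choice=[0, 1])
print(y)  # close to the float result [[-1.5, 2.5], [-0.2, 0.475]]
```

Each expert carries its own weight scales, while activation scales depend only on the token being processed. This is why the dynamic variant needs no activation calibration for the experts.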
2026-01-14 09:17:26 +08:00
parent ecf2fa482e
commit 0415e694cd
5 changed files with 192 additions and 43 deletions
--- a/docs/source/user_guide/feature_guide/quantization.md
+++ b/docs/source/user_guide/feature_guide/quantization.md
@@ -72,7 +72,11 @@ pip install llmcompressor

 #### Model Quantization

-`LLM-Compressor` provides various quantization scheme examples. To generate W8A8 dynamic quantized weights:
+`LLM-Compressor` provides various quantization scheme examples.
+
+##### Dense Quantization
+
+An example that generates W8A8 dynamic quantized weights for a dense model:

 ```bash
 # Navigate to LLM-Compressor examples directory
@@ -82,6 +86,18 @@ cd examples/quantization/llm-compressor
 python3 w8a8_int8_dynamic.py
 ```

+##### MoE Quantization
+
+An example that generates W8A8 dynamic quantized weights for an MoE model:
+
+```bash
+# Navigate to LLM-Compressor examples directory
+cd examples/quantization/llm-compressor
+
+# Run quantization script
+python3 w8a8_int8_dynamic_moe.py
+```
+
 For more content, refer to the [official examples](https://github.com/vllm-project/llm-compressor/tree/main/examples).

 Currently supported quantization types by LLM-Compressor: `W8A8` and `W8A8_DYNAMIC`.
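The two supported types, `W8A8` and `W8A8_DYNAMIC`, differ mainly in where the activation scale comes from. The sketch below contrasts the two, assuming symmetric int8 quantization for simplicity. The real static path may also use an offset, and the calibration statistic used by LLM-Compressor or vLLM Ascend may differ from the plain max used here. All helper names are hypothetical.

```python
from typing import List


def quantize(vec: List[float], scale: float) -> List[int]:
    """Symmetric int8 with saturation at the int8 range."""
    return [max(-128, min(127, round(v / scale))) for v in vec]


def dequantize(q: List[int], scale: float) -> List[float]:
    return [v * scale for v in q]


def static_scale(calibration_tokens: List[List[float]]) -> float:
    """W8A8-style: one per-tensor scale fixed offline from calibration activations."""
    return max(abs(v) for tok in calibration_tokens for v in tok) / 127.0


def dynamic_scale(token: List[float]) -> float:
    """W8A8_DYNAMIC-style: a fresh scale for every token at inference time."""
    amax = max(abs(v) for v in token)
    return amax / 127.0 if amax > 0 else 1.0  # guard against all-zero tokens


def max_error(token: List[float], scale: float) -> float:
    """Largest absolute round-trip error after quantize + dequantize."""
    return max(abs(a - b) for a, b in zip(token, dequantize(quantize(token, scale), scale)))


calib = [[0.5, -1.0], [0.25, 0.75]]  # calibration data never exceeds |1.0|
s_static = static_scale(calib)
for token in ([4.0, 0.5], [0.01, -0.02]):  # an outlier token and a tiny token
    print(token, max_error(token, s_static), max_error(token, dynamic_scale(token)))
```

A static scale avoids a per-token reduction at runtime but saturates when inputs exceed the calibrated range (the `4.0` above is clipped to `1.0`). A dynamic scale adapts to each token at the cost of computing a max per token on every forward pass.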