diff --git a/docs/source/user_guide/feature_guide/quantization.md b/docs/source/user_guide/feature_guide/quantization.md
index 9851cd3..be5b793 100644
--- a/docs/source/user_guide/feature_guide/quantization.md
+++ b/docs/source/user_guide/feature_guide/quantization.md
@@ -4,28 +4,58 @@
Like vLLM, we now support quantization methods such as compressed-tensors, AWQ, and GPTQ, enabling various precision configurations including W8A8, W4A16, and W8A16. These can help reduce memory consumption and accelerate inference while preserving model accuracy.
+## Support Matrix
+
+<table>
+  <tr>
+    <th colspan="2">Compressed-Tensor (w8a8)</th>
+    <th colspan="4">Weight only (w4a16/w8a16)</th>
+  </tr>
+  <tr>
+    <th>Dynamic</th>
+    <th>Static</th>
+    <th colspan="2">AWQ (w4a16)</th>
+    <th colspan="2">GPTQ (w4a16/w8a16)</th>
+  </tr>
+  <tr>
+    <th>Dense/MoE</th>
+    <th>Dense/MoE</th>
+    <th>Dense</th>
+    <th>MoE</th>
+    <th>Dense</th>
+    <th>MoE</th>
+  </tr>
+  <tr>
+    <td>✅</td>
+    <td>✅</td>
+    <td>✅</td>
+    <td>WIP</td>
+    <td>✅</td>
+    <td>WIP</td>
+  </tr>
+</table>
+- W8A8 dynamic and static quantization are now supported for all LLMs and VLMs.
+- AWQ/GPTQ quantization is supported for all dense models.
+
## Usage
### Compressed-tensors
-To run a `compressed-tensors` model with vLLM-kunlun, you should first add the below configuration to the model's `config.json`:
-
-```Bash
-"quantization_config": {
- "quant_method": "compressed-tensors"
- }
-```
-
-Then you run `Qwen/Qwen3-30B-A3B` with dynamic W8A8 quantization with the following command:
+To run a `compressed-tensors` model with vLLM-Kunlun, you can use `Qwen/Qwen3-30B-A3B-Int8` with the following command:
```Bash
python -m vllm.entrypoints.openai.api_server \
- --model Qwen/Qwen3-30B-A3B \
+ --model Qwen/Qwen3-30B-A3B-Int8 \
--quantization compressed-tensors
```
+
### AWQ
-To run an `AWQ` model with vLLM-kunlun, you can use `Qwen/Qwen3-32B-AWQ` with the following command:
+To run an `AWQ` model with vLLM-Kunlun, you can use `Qwen/Qwen3-32B-AWQ` with the following command:
```Bash
python -m vllm.entrypoints.openai.api_server \
@@ -33,9 +63,10 @@ python -m vllm.entrypoints.openai.api_server \
--quantization awq
```
+
### GPTQ
-To run a `GPTQ` model with vLLM-kunlun, you can use `Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4` with the following command:
+To run a `GPTQ` model with vLLM-Kunlun, you can use `Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4` with the following command:
```Bash
python -m vllm.entrypoints.openai.api_server \
diff --git a/docs/source/user_guide/support_matrix/supported_models.md b/docs/source/user_guide/support_matrix/supported_models.md
index fb86800..a3315bd 100644
--- a/docs/source/user_guide/support_matrix/supported_models.md
+++ b/docs/source/user_guide/support_matrix/supported_models.md
@@ -2,14 +2,14 @@
## Generative Models
-| Model | Support | W8A8 | LoRA | Tensor Parallel | Expert Parallel | Data Parallel | Piecewise Kunlun Graph |
-| :------------ | :------------ | :--- | :--- | :-------------- | :-------------- | :------------ | :--------------------- |
-| Qwen3 | ✅ | | ✅ | ✅ | | ✅ | ✅ |
-| Qwen3-Moe | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| Qwen3-Next | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-
+| Model | Support | W8A8 | LoRA | Tensor Parallel | Expert Parallel | Data Parallel | Piecewise Kunlun Graph |
+| :------------ | :------ | :--- | :--- | :-------------- | :-------------- | :------------ | :--------------------- |
+| Qwen3 | ✅ | ✅ | ✅ | ✅ | | ✅ | ✅ |
+| Qwen3-Moe | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| Qwen3-Next | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| Deepseek v3.2 | ✅ | ✅ | | ✅ | | ✅ | ✅ |
## Multimodal Language Models
-| Model | Support | W8A8 | LoRA | Tensor Parallel | Expert Parallel | Data Parallel | Piecewise Kunlun Graph |
-| :----------- | :------------ | :--- | :--- | :-------------- | :-------------- | :------------ | :--------------------- |
-| Qwen3-VL | ✅ | | | ✅ | | ✅ | ✅ |
+| Model | Support | W8A8 | LoRA | Tensor Parallel | Expert Parallel | Data Parallel | Piecewise Kunlun Graph |
+| :------- | :------ | :--- | :--- | :-------------- | :-------------- | :------------ | :--------------------- |
+| Qwen3-VL | ✅ | ✅ | | ✅ | | ✅ | ✅ |