From fe13cd9ea548363b18f3d8d81ea1bc69e1017351 Mon Sep 17 00:00:00 2001
From: Mengqing Cao
Date: Wed, 16 Apr 2025 09:37:21 +0800
Subject: [PATCH] [Doc] update faq about w8a8 (#534)

update faq about w8a8

---------

Signed-off-by: Mengqing Cao
---
 docs/source/faqs.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/docs/source/faqs.md b/docs/source/faqs.md
index b44651e..d246aa4 100644
--- a/docs/source/faqs.md
+++ b/docs/source/faqs.md
@@ -101,3 +101,15 @@ vllm-ascend is a plugin for vllm. Basically, the version of vllm-ascend is the s
 ### 10. Does vllm-ascend support Prefill Disaggregation feature?
 
 Currently, only 1P1D is supported by vllm. For vllm-ascend, it'll be done by [this PR](https://github.com/vllm-project/vllm-ascend/pull/432). For NPND, vllm is not stable and fully supported yet. We will make it stable and supported by vllm-ascend in the future.
+
+### 11. Does vllm-ascend support quantization methods?
+
+Currently, vllm-ascend does not support any quantization method natively. Quantization support is a work in progress, and w8a8 will be supported first.
+
+### 12. How to run the w8a8 DeepSeek model?
+
+Currently, on v0.7.3, running w8a8 requires vllm + vllm-ascend + mindie-turbo. Once v0.8.X is released, only vllm + vllm-ascend will be needed. After installing these packages, follow the steps below to run the w8a8 DeepSeek model:
+
+1. Quantize a bf16 DeepSeek model, e.g. [unsloth/DeepSeek-R1-BF16](https://modelscope.cn/models/unsloth/DeepSeek-R1-BF16), with msModelSlim to get a w8a8 DeepSeek model. Find more details in the [msModelSlim doc](https://gitee.com/ascend/msit/tree/master/msmodelslim/msmodelslim/pytorch/llm_ptq).
+2. Copy the content of `quant_model_description_w8a8_dynamic.json` into the `quantization_config` field of the quantized model's `config.json`.
+3. Run inference with the quantized DeepSeek model.
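Step 2 of the FAQ above can be sketched as a small script. This is a minimal sketch under assumptions: the function name `embed_quant_config` is hypothetical, and the exact shape of `quantization_config` that vllm-ascend expects (here, the whole description file embedded verbatim) should be verified against the msModelSlim doc.

```python
import json
from pathlib import Path


def embed_quant_config(model_dir: str) -> None:
    """Copy the msModelSlim quantization description into the model's
    config.json under the `quantization_config` key (FAQ step 2)."""
    model_path = Path(model_dir)
    desc_file = model_path / "quant_model_description_w8a8_dynamic.json"
    config_file = model_path / "config.json"

    quant_desc = json.loads(desc_file.read_text())
    config = json.loads(config_file.read_text())

    # Embed the quantization description; all other config keys are kept.
    config["quantization_config"] = quant_desc
    config_file.write_text(json.dumps(config, indent=2))
```

Run it once against the directory produced by the msModelSlim quantization step, then point vllm at that directory as usual.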