[doc] added quantization doc for dpsk (#3843)
@@ -84,6 +84,14 @@ Overall, with these optimizations, we have achieved up to a 7x acceleration in o
## FAQ
1. **Question**: What should I do if model loading takes too long and NCCL timeout occurs?
   **Answer**: You can add `--dist-timeout 3600` when launching the model; this allows a 1-hour timeout.
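   A minimal launch sketch with the longer timeout (the model name `deepseek-ai/DeepSeek-V3` and the `--tp 8` setting are illustrative; substitute your own model path and parallelism):

   ```bash
   # Extend the NCCL/distributed init timeout to 1 hour for slow model loading
   python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3 --tp 8 --trust-remote-code --dist-timeout 3600
   ```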
2. **Question**: How do I use quantized DeepSeek models?
   **Answer**: DeepSeek's MLA does not support quantization, so you need to add the `--disable-mla` flag to run a quantized model successfully. In addition, AWQ does not support BF16, so also add the `--dtype half` flag if the model is AWQ-quantized. For example:
   ```bash
   python3 -m sglang.launch_server --model cognitivecomputations/DeepSeek-R1-AWQ --tp 8 --trust-remote-code --dtype half --disable-mla
   ```