From 3dc9ff3ce8bb88dcbcf2655f616bd5439f224c11 Mon Sep 17 00:00:00 2001
From: Shenggui Li
Date: Wed, 26 Feb 2025 11:40:47 +0800
Subject: [PATCH] [doc] fixed dpsk quant faq (#3865)

---
 docs/references/deepseek.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/references/deepseek.md b/docs/references/deepseek.md
index 883dd6119..ff3d649de 100644
--- a/docs/references/deepseek.md
+++ b/docs/references/deepseek.md
@@ -90,8 +90,8 @@ Overall, with these optimizations, we have achieved up to a 7x acceleration in o
 
 2. **Question**: How to use quantized DeepSeek models?
 
-   **Answer**: DeepSeek's MLA does not have support for quantization. You need to add the `--disable-mla` flag to run the quantized model successfully. Meanwhile, AWQ does not support BF16, so add the `--dtype half` flag if AWQ is used for quantization. One example is as follows:
+   **Answer**: AWQ does not support BF16, so add the `--dtype half` flag if AWQ is used for quantization. One example is as follows:
 
    ```bash
-   python3 -m sglang.launch_server --model cognitivecomputations/DeepSeek-R1-AWQ --tp 8 --trust-remote-code --dtype half --disable-mla
+   python3 -m sglang.launch_server --model cognitivecomputations/DeepSeek-R1-AWQ --tp 8 --trust-remote-code --dtype half
    ```