[2/3] fix dsv3 awq issue (#4625)

Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>
Co-authored-by: laixinn <xielx@shanghaitech.edu.cn>
AniZpZ
2025-04-04 08:36:39 +08:00
committed by GitHub
parent e53bf190bc
commit d95269f9b3
8 changed files with 1139 additions and 42 deletions

@@ -178,10 +178,11 @@ python3 -m sglang.bench_one_batch_server --model None --base-url http://10.0.0.1
 ### Example: Serving with 8 A100/A800 with AWQ Quantization
-AWQ does not support BF16, so add the `--dtype half` flag if AWQ is used for quantization. One example is as follows:
+AWQ does not support BF16, so add the `--dtype half` flag if AWQ is used for quantization.
+Add the `--quantization moe_wna16` flag to enable the MoE wna16 kernel for better performance.
+One example is as follows:
 ```bash
-python3 -m sglang.launch_server --model cognitivecomputations/DeepSeek-R1-AWQ --tp 8 --trust-remote-code --dtype half
+python3 -m sglang.launch_server --model cognitivecomputations/DeepSeek-R1-AWQ --tp 8 --trust-remote-code --quantization moe_wna16
 ```
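Once the server is up, it can be queried over its OpenAI-compatible HTTP API. The sketch below only builds the request URL and JSON body rather than sending it, since it assumes (not stated in this commit) that the server listens on the default `localhost:30000` and exposes `/v1/chat/completions`:

```python
import json

# Assumed defaults: sglang.launch_server binds to port 30000 and serves
# an OpenAI-compatible chat completions endpoint. Adjust if your
# deployment differs.
url = "http://localhost:30000/v1/chat/completions"
payload = {
    # Model name matches the --model flag used at launch.
    "model": "cognitivecomputations/DeepSeek-R1-AWQ",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 32,
}
body = json.dumps(payload)
print(url)
print(body)
```

The body could then be POSTed with any HTTP client (e.g. `curl -H "Content-Type: application/json" -d "$BODY" $URL`).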