diff --git a/benchmark/deepseek_v3/README.md b/benchmark/deepseek_v3/README.md
index 0ff4e0bc4..327325d8d 100644
--- a/benchmark/deepseek_v3/README.md
+++ b/benchmark/deepseek_v3/README.md
@@ -167,6 +167,27 @@ python3 benchmark/gsm8k/bench_sglang.py --num-questions 1319 --host http://10.0.
 python3 -m sglang.bench_one_batch_server --model None --base-url http://10.0.0.1:30000 --batch-size 1 --input-len 128 --output-len 128
 ```
 
+#### Troubleshooting
+
+If you see the following error:
+
+```bash
+ValueError: Weight output_partition_size = 576 is not divisible by weight quantization block_n = 128.
+```
+
+edit your `config.json` and remove the `quantization_config` block. For example:
+
+```json
+"quantization_config": {
+  "activation_scheme": "dynamic",
+  "fmt": "e4m3",
+  "quant_method": "fp8",
+  "weight_block_size": [128, 128]
+},
+```
+
+Removing this block typically resolves the error. For more details, see the discussion in [sgl-project/sglang#3491](https://github.com/sgl-project/sglang/issues/3491#issuecomment-2650779851).
+
 ## DeepSeek V3 Optimization Plan
 
 https://github.com/sgl-project/sglang/issues/2591
diff --git a/docs/references/deepseek.md b/docs/references/deepseek.md
index d54ec008b..8c67c2eda 100644
--- a/docs/references/deepseek.md
+++ b/docs/references/deepseek.md
@@ -20,6 +20,8 @@ Please refer to [the example](https://github.com/sgl-project/sglang/tree/main/be
 
 - [Serving with two H200*8 nodes and docker](https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3#example-serving-with-two-h2008-nodes-and-docker).
 
+- [Serving with four A100*8 nodes](https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3#example-serving-with-four-a1008-nodes).
+
 ## Optimizations
 
 ### Multi-head Latent Attention (MLA) Throughput Optimizations