Update DeepSeek V3 Doc (#3541)
@@ -167,6 +167,27 @@ python3 benchmark/gsm8k/bench_sglang.py --num-questions 1319 --host http://10.0.
python3 -m sglang.bench_one_batch_server --model None --base-url http://10.0.0.1:30000 --batch-size 1 --input-len 128 --output-len 128
```

#### Troubleshooting

If you see the following error:

```bash
ValueError: Weight output_partition_size = 576 is not divisible by weight quantization block_n = 128.
```

edit your `config.json` and remove the `quantization_config` block. For example, the block to remove looks like this:

```json
"quantization_config": {
  "activation_scheme": "dynamic",
  "fmt": "e4m3",
  "quant_method": "fp8",
  "weight_block_size": [128, 128]
},
```

Removing this block typically resolves the error. For more details, see the discussion in [sgl-project/sglang#3491](https://github.com/sgl-project/sglang/issues/3491#issuecomment-2650779851).
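The manual edit described above can also be scripted. The following is a minimal sketch, assuming `config.json` is a plain JSON file in a local model directory and that `quantization_config` is a top-level key. The helper name `strip_quantization_config` and the `.bak` backup step are illustrative assumptions, not part of SGLang.

```python
import json
import shutil
from pathlib import Path


def strip_quantization_config(config_path: str) -> bool:
    """Remove the top-level "quantization_config" key from a config.json.

    Hypothetical helper, not part of SGLang. Returns True if the key was
    present and removed, False if there was nothing to change.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    if "quantization_config" not in config:
        return False

    # Keep a copy of the original file so the change is easy to revert.
    shutil.copyfile(path, path.with_name(path.name + ".bak"))

    del config["quantization_config"]
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True
```

Calling `strip_quantization_config("/path/to/model/config.json")` (the path is a placeholder) rewrites the file in place and leaves a `config.json.bak` copy next to it, so the original configuration can be restored if needed.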

## DeepSeek V3 Optimization Plan

https://github.com/sgl-project/sglang/issues/2591
@@ -20,6 +20,8 @@ Please refer to [the example](https://github.com/sgl-project/sglang/tree/main/be

- [Serving with two H200*8 nodes and docker](https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3#example-serving-with-two-h2008-nodes-and-docker).

- [Serving with four A100*8 nodes](https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3#example-serving-with-four-a1008-nodes).

## Optimizations

### Multi-head Latent Attention (MLA) Throughput Optimizations