[CPU] Add tutorial docs for SGL on CPU (#8000)
@@ -14,6 +14,7 @@ To run DeepSeek V3/R1 models, the requirements are as follows:
| **Full precision FP8**<br>*(recommended)* | 8 x H200 |
| | 8 x MI300X |
| | 2 x 8 x H100/800/20 |
| | Xeon 6980P CPU |
| **Full precision BF16** | 2 x 8 x H200 |
| | 2 x 8 x MI300X |
| | 4 x 8 x H100/800/20 |
@@ -22,6 +23,7 @@ To run DeepSeek V3/R1 models, the requirements are as follows:
| | 8 x A100/A800 |
| **Quantized weights (int8)** | 16 x A100/800 |
| | 32 x L40S |
| | Xeon 6980P CPU |

<style>
.md-typeset__table {
@@ -61,6 +63,7 @@ Detailed commands for reference:
- [8 x A100 (AWQ)](https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3#example-serving-with-8-a100a800-with-awq-quantization)
- [16 x A100 (int8)](https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3#example-serving-with-16-a100a800-with-int8-quantization)
- [32 x L40S (int8)](https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3#example-serving-with-32-l40s-with-int8-quantization)
- [Xeon 6980P CPU](https://docs.sglang.ai/references/cpu.html#example-running-deepseek-r1)

### Download Weights

If you encounter errors when starting the server, make sure the weights have finished downloading. It is recommended to download them beforehand, or to restart the server until all weights are downloaded. Please refer to the official [DeepSeek V3](https://huggingface.co/deepseek-ai/DeepSeek-V3-Base#61-inference-with-deepseek-infer-demo-example-only) guide for instructions on downloading the weights.
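One way to pre-download the weights is with the Hugging Face CLI. This is a minimal sketch, not taken from the guide above: it assumes `huggingface_hub` is installed and uses `deepseek-ai/DeepSeek-V3` as the repository; substitute the repository and local directory for your setup.

```shell
# Hypothetical pre-download sketch (not from the official guide).
# Assumes network access to huggingface.co and ample disk space:
# the full FP8 checkpoint is several hundred GB.
pip install -U "huggingface_hub[cli]"

# Fetch the full model snapshot into a local directory; the command
# resumes automatically if interrupted, so re-running it is safe.
huggingface-cli download deepseek-ai/DeepSeek-V3 \
    --local-dir /path/to/DeepSeek-V3
```

Once the snapshot is complete, point the server's model path at the local directory so startup does not block on downloads.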