[CPU] Add tutorial docs for SGL on CPU (#8000)

2025-07-25 15:03:16 +08:00
parent af4b9bae95
commit 15d2759174
4 changed files with 207 additions and 0 deletions
--- a/docs/references/deepseek.md
+++ b/docs/references/deepseek.md
@@ -14,6 +14,7 @@ To run DeepSeek V3/R1 models, the requirements are as follows:
 | **Full precision FP8**<br>*(recommended)* | 8 x H200 |
 | | 8 x MI300X |
 | | 2 x 8 x H100/800/20 |
+| | Xeon 6980P CPU |
 | **Full precision BF16** | 2 x 8 x H200 |
 | | 2 x 8 x MI300X |
 | | 4 x 8 x H100/800/20 |
@@ -22,6 +23,7 @@ To run DeepSeek V3/R1 models, the requirements are as follows:
 | | 8 x A100/A800 |
 | **Quantized weights (int8)** | 16 x A100/800 |
 | | 32 x L40S |
+| | Xeon 6980P CPU |

 <style>
 .md-typeset__table {
@@ -61,6 +63,7 @@ Detailed commands for reference:
 - [8 x A100 (AWQ)](https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3#example-serving-with-8-a100a800-with-awq-quantization)
 - [16 x A100 (int8)](https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3#example-serving-with-16-a100a800-with-int8-quantization)
 - [32 x L40S (int8)](https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3#example-serving-with-32-l40s-with-int8-quantization)
+- [Xeon 6980P CPU](https://docs.sglang.ai/references/cpu.html#example-running-deepseek-r1)

 ### Download Weights
 If you encounter errors when starting the server, ensure the weights have finished downloading. It's recommended to download them beforehand, or to restart the server until all weights are downloaded. Please refer to the official [DeepSeek V3](https://huggingface.co/deepseek-ai/DeepSeek-V3-Base#61-inference-with-deepseek-infer-demo-example-only) guide to download the weights.
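
The "ensure the weights have finished downloading" advice can be checked before launching the server. The sketch below is not part of this commit or of SGLang. It assumes the checkpoint directory follows the common Hugging Face sharded layout, where `model.safetensors.index.json` holds a `weight_map` from tensor names to shard files. The helper name `missing_weight_shards` is hypothetical.

```python
import json
from pathlib import Path


def missing_weight_shards(model_dir, index_name="model.safetensors.index.json"):
    """Return the sorted shard file names listed in the index but absent on disk.

    Hypothetical helper: assumes a Hugging Face style sharded checkpoint.
    """
    model_path = Path(model_dir)
    index_path = model_path / index_name
    # Assumption: the index is JSON with a "weight_map" of tensor name -> shard file.
    with index_path.open(encoding="utf-8") as f:
        weight_map = json.load(f)["weight_map"]
    # Many tensors share one shard, so deduplicate the file names.
    expected = set(weight_map.values())
    return sorted(name for name in expected if not (model_path / name).is_file())
```

Calling `missing_weight_shards("/path/to/weights")` on a complete download should return an empty list. The check only tests for file presence, so a shard that exists but was truncated mid-download would not be reported.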