Support serving DeepSeek-R1-Channel-INT8 with 32 L40S. (#4418)

2025-03-17 15:03:43 +08:00
parent 0f52fb55ec
commit 75b656488a
7 changed files with 489 additions and 11 deletions
--- a/docs/references/deepseek.md
+++ b/docs/references/deepseek.md
@@ -18,6 +18,7 @@ SGLang is recognized as one of the top engines for [DeepSeek model inference](ht
 | **Quantized weights (AWQ)** | 8 x H100/800/20 |
 | | 8 x A100/A800 |
 | **Quantized weights (int8)** | 16 x A100/800 |
+| | 32 x L40S |

 <style>
 .md-typeset__table {
@@ -56,6 +57,7 @@ Detailed commands for reference:
 - [4 x 8 x A100](https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3#example-serving-with-four-a1008-nodes)
 - [8 x A100 (AWQ)](https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3#example-serving-with-8-a100a800-with-awq-quantization)
 - [16 x A100 (int8)](https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3#example-serving-with-16-a100a800-with-int8-quantization)
+- [32 x L40S (int8)](https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3#example-serving-with-32-l40s-with-int8-quantization)

 ### Download Weights