From ba066ca02f81579ebf70015c9e629c29b35eac22 Mon Sep 17 00:00:00 2001
From: geray <48796550+gerayking@users.noreply.github.com>
Date: Tue, 9 Sep 2025 11:09:50 +0800
Subject: [PATCH] Update link for EAGLE speculative decoding (#10191)

---
 docs/basic_usage/deepseek.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/basic_usage/deepseek.md b/docs/basic_usage/deepseek.md
index b4eaf7e0e..8a71696f5 100644
--- a/docs/basic_usage/deepseek.md
+++ b/docs/basic_usage/deepseek.md
@@ -153,7 +153,7 @@ python3 -m sglang.compile_deep_gemm --model deepseek-ai/DeepSeek-V3 --tp 8 --tru
 The precompilation process typically takes around 10 minutes to complete.
 ### Multi-token Prediction
-**Description**: SGLang implements DeepSeek V3 Multi-Token Prediction (MTP) based on [EAGLE speculative decoding](https://docs.sglang.ai/backend/speculative_decoding.html#EAGLE-Decoding). With this optimization, the decoding speed can be improved by **1.8x** for batch size 1 and **1.5x** for batch size 32 respectively on H200 TP8 setting.
+**Description**: SGLang implements DeepSeek V3 Multi-Token Prediction (MTP) based on [EAGLE speculative decoding](https://docs.sglang.ai/advanced_features/speculative_decoding.html#EAGLE-Decoding). With this optimization, the decoding speed can be improved by **1.8x** for batch size 1 and **1.5x** for batch size 32 respectively on H200 TP8 setting.
 **Usage**: Add arguments `--speculative-algorithm`, `--speculative-num-steps`, `--speculative-eagle-topk` and `--speculative-num-draft-tokens` to enable this feature. For example: