From 93b6785d78d1225b65672979795566a9d73b6e47 Mon Sep 17 00:00:00 2001
From: Yi Zhang <1109276519@qq.com>
Date: Tue, 1 Jul 2025 16:19:19 +0800
Subject: [PATCH] add description for llama4 eagle3 (#7688)

---
 docs/references/llama4.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/docs/references/llama4.md b/docs/references/llama4.md
index b09a6e240..07cc2b737 100644
--- a/docs/references/llama4.md
+++ b/docs/references/llama4.md
@@ -22,6 +22,18 @@ python3 -m sglang.launch_server --model-path meta-llama/Llama-4-Scout-17B-16E-In
 - **Enable Multi-Modal**: Add `--enable-multimodal` for multi-modal capabilities.
 - **Enable Hybrid-KVCache**: Add `--hybrid-kvcache-ratio` for hybrid kv cache. Details can be seen in [this PR](https://github.com/sgl-project/sglang/pull/6563)
+
+### EAGLE Speculative Decoding
+**Description**: SGLang supports Llama 4 Maverick (400B) with [EAGLE speculative decoding](https://docs.sglang.ai/backend/speculative_decoding.html#EAGLE-Decoding).
+
+**Usage**:
+Add the arguments `--speculative-draft-model-path`, `--speculative-algorithm`, `--speculative-num-steps`, `--speculative-eagle-topk`, and `--speculative-num-draft-tokens` to enable this feature. For example:
+```
+python3 -m sglang.launch_server --model-path meta-llama/Llama-4-Maverick-17B-128E-Instruct --speculative-algorithm EAGLE3 --speculative-draft-model-path nvidia/Llama-4-Maverick-17B-128E-Eagle3 --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --trust-remote-code --tp 8 --context-length 1000000
+```
+
+- **Note**: The Llama 4 draft model *nvidia/Llama-4-Maverick-17B-128E-Eagle3* only recognizes conversations in chat mode.
+
 ## Benchmarking Results
 ### Accuracy Test with `lm_eval`
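The note in the patch says the draft model only recognizes conversations in chat mode; in practice this means clients should send requests through the server's OpenAI-compatible chat completions endpoint rather than a raw completion call. A minimal sketch of such a chat-format request payload (the `/v1/chat/completions` path and port 30000 follow SGLang's defaults; the prompt text is purely illustrative):

```python
import json

# Chat-format payload for SGLang's OpenAI-compatible endpoint.
# POST it to http://localhost:30000/v1/chat/completions once the server
# from the launch command above is running (port 30000 is the default).
payload = {
    "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct",
    "messages": [
        # The "messages" list is what puts the request in chat mode,
        # which is the format the EAGLE3 draft model was trained on.
        {"role": "user", "content": "Summarize speculative decoding in one sentence."}
    ],
    "max_tokens": 128,
}
print(json.dumps(payload, indent=2))
```

Sending an equivalent payload to the plain `/v1/completions` endpoint would bypass the chat template and can degrade the draft model's acceptance rate.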