add attention backend supporting matrix in the doc (#5211)

Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
mRSun15
2025-04-15 17:16:34 -07:00
committed by GitHub
parent 27a009bb00
commit 3efc8e2d2a
2 changed files with 40 additions and 0 deletions


@@ -0,0 +1,39 @@
# Attention Backend
## Support matrix for different attention backends
| **Backend**      | **Page Size > 1** | **Spec Decoding** | **MLA** | **Sliding Window** | **MultiModal** |
|------------------|-------------------|-------------------|---------|--------------------|----------------|
| **FlashInfer**   | ✅                | ✅                | ✅      | ✅                 | ✅             |
| **FA3**          | ✅                | ✅                | ✅      | ✅                 | ✅             |
| **Triton**       | ❌                | ✅                | ✅      | ❌                 | ❌             |
| **Torch Native** | ❌                | ❌                | ❌      | ❌                 | ❌             |
## User guide
#### Launch commands for different attention backends
- FlashInfer (Default for Non-Hopper Machines, e.g., A100, A40)
```bash
python3 -m sglang.launch_server --model meta-llama/Meta-Llama-3.1-8B-Instruct --attention-backend flashinfer
python3 -m sglang.launch_server --tp 8 --model deepseek-ai/DeepSeek-V3 --attention-backend flashinfer --trust-remote-code
```
- FlashAttention 3 (Default for Hopper Machines, e.g., H100, H200, H20)
```bash
python3 -m sglang.launch_server --model meta-llama/Meta-Llama-3.1-8B-Instruct --attention-backend fa3
python3 -m sglang.launch_server --tp 8 --model deepseek-ai/DeepSeek-V3 --trust-remote-code --attention-backend fa3
```
- Triton
```bash
python3 -m sglang.launch_server --model meta-llama/Meta-Llama-3.1-8B-Instruct --attention-backend triton
python3 -m sglang.launch_server --tp 8 --model deepseek-ai/DeepSeek-V3 --attention-backend triton --trust-remote-code
```
- Torch Native
```bash
python3 -m sglang.launch_server --model meta-llama/Meta-Llama-3.1-8B-Instruct --attention-backend torch_native
```
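
The default backend selection described above (FA3 on Hopper machines, FlashInfer elsewhere) can be sketched as a small helper. This is an illustrative, hypothetical function keyed on CUDA compute capability (Hopper is 9.x), not SGLang's actual selection logic:

```python
# Hypothetical sketch of the default-backend rule stated above:
# FA3 on Hopper-class GPUs (compute capability 9.x), FlashInfer otherwise.
def default_attention_backend(compute_capability: tuple) -> str:
    major, _minor = compute_capability
    return "fa3" if major == 9 else "flashinfer"

print(default_attention_backend((9, 0)))  # H100 (Hopper) -> fa3
print(default_attention_backend((8, 0)))  # A100 (Ampere) -> flashinfer
```

In practice the compute capability would come from something like `torch.cuda.get_device_capability()`; passing `--attention-backend` explicitly, as in the commands above, overrides any default.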


@@ -32,6 +32,7 @@ The core features include:
backend/sampling_params.md
backend/hyperparameter_tuning.md
backend/structured_outputs_for_reasoning_models.ipynb
backend/attention_backend.md
.. toctree::
:maxdepth: 1