add attention backend supporting matrix in the doc (#5211)
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>

docs/backend/attention_backend.md (new file, 39 lines)
@@ -0,0 +1,39 @@
# Attention Backend

## Support matrix for different attention backends

| **Backend**      | **Page Size > 1** | **Spec Decoding** | **MLA** | **Sliding Window** | **MultiModal** |
|------------------|-------------------|-------------------|---------|--------------------|----------------|
| **FlashInfer**   | ✅ | ✅ | ✅ | ✅ | ✅ |
| **FA3**          | ✅ | ✅ | ✅ | ✅ | ✅ |
| **Triton**       | ❌ | ✅ | ✅ | ❌ | ❌ |
| **Torch Native** | ❌ | ❌ | ❌ | ❌ | ❌ |

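The columns above correspond to launch-time features, and a feature only takes effect when the chosen backend's column is ✅. The sketch below is illustrative only: it assumes the `--page-size` argument and the EAGLE speculative-decoding options from SGLang's server arguments, and the draft-model path and all numeric values are placeholders rather than recommended settings.

```bash
# Page Size > 1: assumes the --page-size server argument; 16 is an arbitrary value.
python3 -m sglang.launch_server --model meta-llama/Meta-Llama-3.1-8B-Instruct \
    --attention-backend flashinfer --page-size 16

# Spec Decoding: assumes SGLang's EAGLE speculative-decoding flags; the draft
# model path and the numbers below are placeholders.
python3 -m sglang.launch_server --model meta-llama/Meta-Llama-3.1-8B-Instruct \
    --attention-backend fa3 \
    --speculative-algorithm EAGLE \
    --speculative-draft-model-path <eagle-draft-model> \
    --speculative-num-steps 3 --speculative-eagle-topk 4 --speculative-num-draft-tokens 16
```
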
## User guide

#### Launch commands for different attention backends

- FlashInfer (default for non-Hopper machines, e.g., A100, A40)
```bash
python3 -m sglang.launch_server --model meta-llama/Meta-Llama-3.1-8B-Instruct --attention-backend flashinfer
python3 -m sglang.launch_server --tp 8 --model deepseek-ai/DeepSeek-V3 --attention-backend flashinfer --trust-remote-code
```

- FlashAttention 3 (default for Hopper machines, e.g., H100, H200, H20)
```bash
python3 -m sglang.launch_server --model meta-llama/Meta-Llama-3.1-8B-Instruct --attention-backend fa3
python3 -m sglang.launch_server --tp 8 --model deepseek-ai/DeepSeek-V3 --trust-remote-code --attention-backend fa3
```

- Triton
```bash
python3 -m sglang.launch_server --model meta-llama/Meta-Llama-3.1-8B-Instruct --attention-backend triton
python3 -m sglang.launch_server --tp 8 --model deepseek-ai/DeepSeek-V3 --attention-backend triton --trust-remote-code
```

- Torch Native
```bash
python3 -m sglang.launch_server --model meta-llama/Meta-Llama-3.1-8B-Instruct --attention-backend torch_native
```
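
After launching with any of the backends above, a quick smoke test confirms the server answers requests. This sketch assumes the default port (30000) and SGLang's native `/generate` endpoint; adjust the URL if you launched with a different `--port`.

```bash
# Assumes the default port 30000 and the /generate endpoint.
curl -s http://localhost:30000/generate \
    -H "Content-Type: application/json" \
    -d '{"text": "The capital of France is", "sampling_params": {"max_new_tokens": 16}}'
```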
@@ -32,6 +32,7 @@ The core features include:
 backend/sampling_params.md
 backend/hyperparameter_tuning.md
 backend/structured_outputs_for_reasoning_models.ipynb
+backend/attention_backend.md

 .. toctree::
    :maxdepth: 1