diff --git a/docs/backend/separate_reasoning.ipynb b/docs/backend/separate_reasoning.ipynb
index 6048c6642..50a91b897 100644
--- a/docs/backend/separate_reasoning.ipynb
+++ b/docs/backend/separate_reasoning.ipynb
@@ -8,11 +8,12 @@
"\n",
"SGLang supports parsing reasoning content out from \"normal\" content for reasoning models such as [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1).\n",
"\n",
- "## Supported Models\n",
+ "## Supported Models & Parsers\n",
"\n",
- "Currently, SGLang supports the following reasoning models:\n",
- "- [DeepSeek R1 series](https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d): The reasoning content is wrapped with `<think>` and `</think>` tags.\n",
- "- [QwQ](https://huggingface.co/Qwen/QwQ-32B): The reasoning content is wrapped with `<think>` and `</think>` tags."
+ "| Model | Reasoning tags | Parser |\n",
+ "|---------|-----------------------------|------------------|\n",
+ "| [DeepSeek‑R1 series](https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d) | `<think>` … `</think>` | `deepseek-r1` |\n",
+ "| [Qwen3 and QwQ series](https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f) | `<think>` … `</think>` | `qwen3` |"
]
},
{
@@ -60,9 +61,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Note that `--reasoning-parser` defines the parser used to interpret responses. Currently supported parsers include:\n",
- "\n",
- "- deepseek-r1: DeepSeek R1 series and QwQ (e.g. deepseek-ai/DeepSeek-R1, Qwen/QwQ-32B)."
+ "Note that `--reasoning-parser` defines the parser used to interpret responses."
]
},
{
diff --git a/docs/supported_models/generative_models.md b/docs/supported_models/generative_models.md
index d05834936..e07b1fc80 100644
--- a/docs/supported_models/generative_models.md
+++ b/docs/supported_models/generative_models.md
@@ -16,8 +16,8 @@ python3 -m sglang.launch_server \
| Model Family (Variants) | Example HuggingFace Identifier | Description |
|-------------------------------------|--------------------------------------------------|----------------------------------------------------------------------------------------|
-| **DeepSeek** (v1, v2, v3/R1) | `deepseek-ai/DeepSeek-R1` | Series of advanced reasoning-optimized models (including a 671B MoE) trained with reinforcement learning; top performance on complex reasoning, math, and code tasks. [SGLang provides Deepseek v3/R1 model-specific optimizations](https://docs.sglang.ai/references/deepseek)|
-| **Qwen** (3, 3MoE, 2.5, 2 series) | `Qwen/Qwen3-4B-Base`, `Qwen/Qwen3-MoE-15B-A2B` | Alibaba’s latest Qwen3 series for complex reasoning, language understanding, and generation tasks; Support for MoE variants along with previous generation 2.5, 2, etc. |
+| **DeepSeek** (v1, v2, v3/R1) | `deepseek-ai/DeepSeek-R1` | Series of advanced reasoning-optimized models (including a 671B MoE) trained with reinforcement learning; top performance on complex reasoning, math, and code tasks. [SGLang provides Deepseek v3/R1 model-specific optimizations](https://docs.sglang.ai/references/deepseek) and [Reasoning Parser](https://docs.sglang.ai/backend/separate_reasoning)|
+| **Qwen** (3, 3MoE, 2.5, 2 series) | `Qwen/Qwen3-0.6B`, `Qwen/Qwen3-30B-A3B` | Alibaba’s latest Qwen3 series for complex reasoning, language understanding, and generation tasks; Support for MoE variants along with previous generation 2.5, 2, etc. [SGLang provides a Qwen3-specific reasoning parser](https://docs.sglang.ai/backend/separate_reasoning)|
| **Llama** (2, 3.x, 4 series) | `meta-llama/Llama-4-Scout-17B-16E-Instruct` | Meta’s open LLM series, spanning 7B to 400B parameters (Llama 2, 3, and new Llama 4) with well-recognized performance. [SGLang provides Llama-4 model-specific optimizations](https://docs.sglang.ai/references/llama4) |
| **Mistral** (Mixtral, NeMo, Small3) | `mistralai/Mistral-7B-Instruct-v0.2` | Open 7B LLM by Mistral AI with strong performance; extended into MoE (“Mixtral”) and NeMo Megatron variants for larger scale. |
| **Gemma** (v1, v2, v3) | `google/gemma-3-1b-it` | Google’s family of efficient multilingual models (1B–27B); Gemma 3 offers a 128K context window, and its larger (4B+) variants support vision input. |