SGLang Documentation
====================

SGLang is a fast serving framework for large language models and vision-language models.
It makes interaction with models faster and more controllable by co-designing the backend runtime and the frontend language.
Its core features include:

- **Fast Backend Runtime**: Provides efficient serving with RadixAttention for prefix caching, a zero-overhead CPU scheduler, continuous batching, token attention (paged attention), speculative decoding, tensor parallelism, chunked prefill, structured outputs, quantization (FP8/INT4/AWQ/GPTQ), and multi-LoRA batching.
- **Flexible Frontend Language**: Offers an intuitive interface for programming LLM applications, including chained generation calls, advanced prompting, control flow, multi-modal inputs, parallelism, and external interactions.
- **Extensive Model Support**: Supports a wide range of generative models (Llama, Gemma, Mistral, Qwen, DeepSeek, LLaVA, etc.), embedding models (e5-mistral, gte, mcdse), and reward models (Skywork), with easy extensibility for integrating new models.
- **Active Community**: SGLang is open-source and backed by an active community with industry adoption.

.. toctree::
   :maxdepth: 1
   :caption: Installation

   start/install.md

.. toctree::
   :maxdepth: 1
   :caption: Backend Tutorial

   references/deepseek
   references/llama4
   backend/send_request.ipynb
   backend/openai_api_completions.ipynb
   backend/openai_api_vision.ipynb
   backend/openai_api_embeddings.ipynb
   backend/native_api.ipynb
   backend/offline_engine_api.ipynb

.. toctree::
   :maxdepth: 1
   :caption: Advanced Backend Configurations

   backend/server_arguments.md
   backend/sampling_params.md
   backend/hyperparameter_tuning.md
   backend/attention_backend.md

.. toctree::
   :maxdepth: 1
   :caption: Supported Models

   supported_models/generative_models.md
   supported_models/multimodal_language_models.md
   supported_models/embedding_models.md
   supported_models/reward_models.md
   supported_models/support_new_models.md

.. toctree::
   :maxdepth: 1
   :caption: Advanced Features

   backend/speculative_decoding.ipynb
   backend/structured_outputs.ipynb
   backend/function_calling.ipynb
   backend/separate_reasoning.ipynb
   backend/structured_outputs_for_reasoning_models.ipynb
   backend/custom_chat_template.md
   backend/quantization.md
   backend/lora.ipynb
   backend/pd_disaggregation.md

.. toctree::
   :maxdepth: 1
   :caption: Frontend Tutorial

   frontend/frontend.ipynb
   frontend/choices_methods.md

.. toctree::
   :maxdepth: 1
   :caption: SGLang Router

   router/router.md

.. toctree::
   :maxdepth: 1
   :caption: References

   references/general
   references/hardware
   references/advanced_deploy
   references/performance_analysis_and_optimization
   references/developer