SGLang Documentation
====================

SGLang is a fast serving framework for large language models and vision-language models.
It makes your interaction with models faster and more controllable by co-designing the backend runtime and frontend language.
The core features include:

- **Fast Backend Runtime**: Provides efficient serving with RadixAttention for prefix caching, a zero-overhead CPU scheduler, continuous batching, token attention (paged attention), speculative decoding, tensor parallelism, chunked prefill, structured outputs, and quantization (FP8/INT4/AWQ/GPTQ).
- **Flexible Frontend Language**: Offers an intuitive interface for programming LLM applications, including chained generation calls, advanced prompting, control flow, multi-modal inputs, parallelism, and external interactions.
- **Extensive Model Support**: Supports a wide range of generative models (Llama, Gemma, Mistral, Qwen, DeepSeek, LLaVA, etc.), embedding models (e5-mistral, gte, mcdse), and reward models (Skywork), and is easy to extend with new models.
- **Active Community**: SGLang is open-source and backed by an active community with industry adoption.

.. toctree::
   :maxdepth: 1
   :caption: Installation

   start/install.md

.. toctree::
   :maxdepth: 1
   :caption: Backend Tutorial

   references/llama4
   references/deepseek
   backend/send_request.ipynb
   backend/openai_api_completions.ipynb
   backend/openai_api_vision.ipynb
   backend/openai_api_embeddings.ipynb
   backend/native_api.ipynb
   backend/offline_engine_api.ipynb
   backend/server_arguments.md
   backend/sampling_params.md
   backend/hyperparameter_tuning.md
   backend/structured_outputs_for_reasoning_models.ipynb

.. toctree::
   :maxdepth: 1
   :caption: Advanced Features

   backend/speculative_decoding.ipynb
   backend/structured_outputs.ipynb
   backend/function_calling.ipynb
   backend/separate_reasoning.ipynb
   backend/custom_chat_template.md
   backend/quantization.md

.. toctree::
   :maxdepth: 1
   :caption: Frontend Tutorial

   frontend/frontend.ipynb
   frontend/choices_methods.md

.. toctree::
   :maxdepth: 1
   :caption: SGLang Router

   router/router.md

.. toctree::
   :maxdepth: 1
   :caption: References

   references/general
   references/hardware
   references/advanced_deploy
   references/performance_tuning