<div align="center" id="sglangtop">
<img src="https://raw.githubusercontent.com/sgl-project/sglang/main/assets/logo.png" alt="logo" width="400" margin="10px"></img>

[PyPI](https://pypi.org/project/sglang)
[License](https://github.com/sgl-project/sglang/tree/main/LICENSE)
[Issues](https://github.com/sgl-project/sglang/issues)
[Gurubase](https://gurubase.io/g/sglang)

</div>
--------------------------------------------------------------------------------
| [**Blog** ](https://lmsys.org/blog/2024-07-25-sglang-llama3/ )
| [**Documentation** ](https://docs.sglang.ai/ )
| [**Join Slack** ](https://slack.sglang.ai/ )
| [**Join Bi-Weekly Development Meeting** ](https://meeting.sglang.ai/ )
| [**Roadmap** ](https://github.com/sgl-project/sglang/issues/4042 )
| [**Slides** ](https://github.com/sgl-project/sgl-learning-materials?tab=readme-ov-file#slides ) |
## News
- [2025/03] Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X ([AMD blog ](https://rocm.blogs.amd.com/artificial-intelligence/DeepSeekR1-Part2/README.html ))
- [2025/03] SGLang Joins PyTorch Ecosystem: Efficient LLM Serving Engine ([PyTorch blog ](https://pytorch.org/blog/sglang-joins-pytorch/ ))
- [2025/02] Unlock DeepSeek-R1 Inference Performance on AMD Instinct™ MI300X GPU ([AMD blog ](https://rocm.blogs.amd.com/artificial-intelligence/DeepSeekR1_Perf/README.html ))
- [2025/01] 🔥 SGLang provides day one support for DeepSeek V3/R1 models on NVIDIA and AMD GPUs with DeepSeek-specific optimizations. ([instructions ](https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3 ), [AMD blog ](https://www.amd.com/en/developer/resources/technical-articles/amd-instinct-gpus-power-deepseek-v3-revolutionizing-ai-development-with-sglang.html ), [10+ other companies ](https://x.com/lmsysorg/status/1887262321636221412 ))
- [2024/12] 🔥 v0.4 Release: Zero-Overhead Batch Scheduler, Cache-Aware Load Balancer, Faster Structured Outputs ([blog ](https://lmsys.org/blog/2024-12-04-sglang-v0-4/ )).
- [2024/09] v0.3 Release: 7x Faster DeepSeek MLA, 1.5x Faster torch.compile, Multi-Image/Video LLaVA-OneVision ([blog ](https://lmsys.org/blog/2024-09-04-sglang-v0-3/ )).
- [2024/07] v0.2 Release: Faster Llama3 Serving with SGLang Runtime (vs. TensorRT-LLM, vLLM) ([blog ](https://lmsys.org/blog/2024-07-25-sglang-llama3/ )).
<details>
<summary>More</summary>

- [2024/10] The First SGLang Online Meetup ([slides ](https://github.com/sgl-project/sgl-learning-materials?tab=readme-ov-file#the-first-sglang-online-meetup )).
- [2024/02] SGLang enables **3x faster JSON decoding** with compressed finite state machine ([blog ](https://lmsys.org/blog/2024-02-05-compressed-fsm/ )).
- [2024/01] SGLang provides up to **5x faster inference** with RadixAttention ([blog ](https://lmsys.org/blog/2024-01-17-sglang/ )).
- [2024/01] SGLang powers the serving of the official **LLaVA v1.6** release demo ([usage ](https://github.com/haotian-liu/LLaVA?tab=readme-ov-file#demo )).
</details>
## About
SGLang is a fast serving framework for large language models and vision language models.
It makes your interaction with models faster and more controllable by co-designing the backend runtime and frontend language.
The core features include:
- **Fast Backend Runtime**: Provides efficient serving with RadixAttention for prefix caching, a zero-overhead CPU scheduler, continuous batching, token attention (paged attention), speculative decoding, tensor parallelism, chunked prefill, structured outputs, quantization (FP8/INT4/AWQ/GPTQ), and multi-LoRA batching.
- **Flexible Frontend Language**: Offers an intuitive interface for programming LLM applications, including chained generation calls, advanced prompting, control flow, multi-modal inputs, parallelism, and external interactions.
- **Extensive Model Support**: Supports a wide range of generative models (Llama, Gemma, Mistral, Qwen, DeepSeek, LLaVA, etc.), embedding models (e5-mistral, gte, mcdse), and reward models (Skywork), with easy extensibility for integrating new models.
- **Active Community**: SGLang is open source and backed by an active community with industry adoption.
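The prefix-caching idea behind RadixAttention can be illustrated with a small sketch. The code below is a simplified, hypothetical prefix tree over token IDs, not SGLang's actual radix-tree implementation: requests that share a prompt prefix reuse the entries already computed for the matched tokens, so only the new suffix needs prefill.

```python
# Toy prefix cache illustrating the idea behind RadixAttention.
# This is a per-token trie for clarity; SGLang's real implementation
# is a radix tree over KV-cache blocks with eviction and scheduling.

class PrefixCacheNode:
    def __init__(self):
        self.children = {}   # token id -> child node

class PrefixCache:
    def __init__(self):
        self.root = PrefixCacheNode()

    def match_prefix(self, tokens):
        """Return the number of leading tokens already cached."""
        node, matched = self.root, 0
        for t in tokens:
            if t not in node.children:
                break
            node = node.children[t]
            matched += 1
        return matched

    def insert(self, tokens):
        """Cache a token sequence; return how many tokens were new."""
        node, new = self.root, 0
        for t in tokens:
            if t not in node.children:
                node.children[t] = PrefixCacheNode()
                new += 1
            node = node.children[t]
        return new

cache = PrefixCache()
system_prompt = [101, 7592, 2088]        # shared system-prompt tokens
req_a = system_prompt + [2054, 2003]     # first request
req_b = system_prompt + [2129, 2079]     # second request, same prefix

cache.insert(req_a)
reused = cache.match_prefix(req_b)       # 3 prefix tokens reused
print(reused)                            # -> 3
```

With many requests sharing a long system prompt, the reused fraction dominates, which is where the speedup comes from.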
## Getting Started
- [Install SGLang ](https://docs.sglang.ai/start/install.html )
- [Quick Start ](https://docs.sglang.ai/backend/send_request.html )
- [Backend Tutorial ](https://docs.sglang.ai/backend/openai_api_completions.html )
- [Frontend Tutorial ](https://docs.sglang.ai/frontend/frontend.html )
- [Contribution Guide ](https://docs.sglang.ai/references/contribution_guide.html )

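As a quick illustration of the OpenAI-compatible backend covered in the tutorials above, the sketch below builds a chat-completion request for a locally running SGLang server using only the Python standard library. The port, the `"default"` model name, and the launch command in the comment are assumptions for illustration; see the Quick Start docs for the exact commands for your model.

```python
# Minimal sketch: query a local SGLang server through its OpenAI-compatible
# /v1/chat/completions endpoint, with no third-party dependencies.
# Assumes a server was launched with something like (hypothetical model path):
#   python -m sglang.launch_server --model-path <your-model> --port 30000
import json
import urllib.request

def build_chat_request(prompt, model="default", base_url="http://localhost:30000"):
    """Build an OpenAI-style chat-completion request for an SGLang server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
    }
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    return req, payload

req, payload = build_chat_request("What is SGLang?")

# To actually send it (requires a running server):
#   with urllib.request.urlopen(req) as resp:
#       answer = json.loads(resp.read())["choices"][0]["message"]["content"]
```

Because the endpoint follows the OpenAI API shape, existing OpenAI client libraries also work by pointing their base URL at the server.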
## Benchmark and Performance
Learn more in the release blogs: [v0.2 blog ](https://lmsys.org/blog/2024-07-25-sglang-llama3/ ), [v0.3 blog ](https://lmsys.org/blog/2024-09-04-sglang-v0-3/ ), [v0.4 blog ](https://lmsys.org/blog/2024-12-04-sglang-v0-4/ )
## Roadmap
[Development Roadmap (2025 H1) ](https://github.com/sgl-project/sglang/issues/4042 )
## Adoption and Sponsorship
The project has been deployed to large-scale production, generating trillions of tokens every day.
It is supported by the following institutions: AMD, Atlas Cloud, Baseten, Cursor, DataCrunch, Etched, Hyperbolic, Iflytek, Jam & Tea Studios, LinkedIn, LMSYS, Meituan, Nebius, Novita AI, NVIDIA, Oracle, RunPod, Stanford, UC Berkeley, UCLA, xAI, and 01.AI.
<img src="https://raw.githubusercontent.com/sgl-project/sgl-learning-materials/main/slides/adoption.png" alt="adoption" width="800" margin="10px"></img>
## Contact Us
For enterprises interested in adopting or deploying SGLang at scale, including technical consulting, sponsorship opportunities, or partnership inquiries, please contact us at contact@sglang.ai.
## Acknowledgment
We learned from the design of, and reused code from, the following projects: [Guidance ](https://github.com/guidance-ai/guidance ), [vLLM ](https://github.com/vllm-project/vllm ), [LightLLM ](https://github.com/ModelTC/lightllm ), [FlashInfer ](https://github.com/flashinfer-ai/flashinfer ), [Outlines ](https://github.com/outlines-dev/outlines ), and [LMQL ](https://github.com/eth-sri/lmql ).