sglang

Author	SHA1	Message	Date
Lianmin Zheng	2e8e7e353b	Improve docs and developer guide (#9044)	2025-08-10 21:05:18 -07:00
Lianmin Zheng	2449a0afe2	Refactor the docs (#9031)	2025-08-10 19:49:45 -07:00
Lifu Huang	f8a173bb50	Improve LoRA Perf by Deprecating FlashInfer and Eliminating Redundant Tensor Ops (#8940)	2025-08-10 01:04:45 -07:00
Binyao Jiang	f29aba8c6e	Support glm4.1v and glm4.5v (#8798) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Xinyuan Tong <justinning0323@outlook.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: zRzRzRzRzRzRzR <2448370773@qq.com> Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com> Co-authored-by: Chang Su <csu272@usc.edu>	2025-08-09 00:59:13 -07:00
Lianmin Zheng	706bd69cc5	Clean up server_args.py to have a dedicated function for model specific adjustments (#8983)	2025-08-08 19:56:50 -07:00
Yineng Zhang	9020f7fc32	chore: bump v0.5.0rc0 (#8959)	2025-08-08 09:16:18 -07:00
Wenbo Yang	1132547496	Add ernie4.py for ERNIE-4.5 (#7657)	2025-08-08 00:55:48 -07:00
Xinyuan Tong	3fa3c6cd6a	Enables force reasoning based on chat template for Qwen3-Thinking (#8369) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> Co-authored-by: Chang Su <csu272@usc.edu>	2025-08-06 20:02:47 -07:00
Lifu Huang	6210e2c4f0	Support GPU pinning for LoRA (#8697)	2025-08-06 19:39:45 -07:00
HouseWest	ca47e24f5d	[Feature] improve TBO: two chunk overlap (#8144)	2025-08-05 21:11:01 -07:00
Praneth Paruchuri	d26ca84f39	Support bailing moe (#8680)	2025-08-05 20:40:34 -07:00
Yineng Zhang	8cd344586e	chore: bump v0.4.10.post2 (#8727)	2025-08-03 03:43:29 -07:00
Guanhua Wang	f7b2853ff8	[feat] support minimum token load balance in dp attention (#7379)	2025-08-03 00:46:47 -07:00
Lifu Huang	8675bdf246	Support limiting max loaded loras in CPU. (#8650)	2025-08-03 00:02:23 -07:00
Nicolas Castet	82e6c3a65a	Add support for NCCL symmetric memory for TP allreduces (#8238)	2025-08-01 23:30:55 +00:00
Zac	b17c5b0118	fix arg typo for --disaggregation-transfer-backend (#8664)	2025-08-01 10:00:47 -07:00
Cheng Wan	6c88f6c8d9	[5/N] MoE Refactor: Update MoE parallelism arguments (#8658)	2025-08-01 01:20:03 -07:00
Ke Bao	33f0de337d	chore: bump v0.4.10.post1 (#8652)	2025-08-01 12:07:30 +08:00
Faraz	4b04998d38	TRTLLM Gen MLA Decode Kernel Integration (same as #7938) (#8632) Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>	2025-07-31 16:03:40 -07:00
Yineng Zhang	023288645b	chore: bump v0.4.10 (#8608)	2025-07-31 20:50:17 +08:00
Chang Su	51c38163c1	model: support Step3V (#8583) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> Co-authored-by: nnnobody-code <nnnobody@foxmail.com> Co-authored-by: ispobock <ispobaoke@gmail.com> Co-authored-by: Qiaolin-Yu <qy254@cornell.edu> Co-authored-by: Qiaolin-Yu <liin1211@outlook.com> Co-authored-by: Xinyuan Tong <justinning0323@outlook.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>	2025-07-31 02:41:00 -07:00
Adarsh Shirawalmath	ec5f944271	[Model] Add support for Arcee Foundational Model (#8154)	2025-07-30 10:45:25 -07:00
Rui Chen	a730ce8162	[feature] [sgl-router] Add a dp-aware routing strategy (#6869)	2025-07-30 05:58:48 -07:00
Yineng Zhang	6478831be9	chore: bump v0.4.9.post6 (#8517)	2025-07-29 02:30:07 -07:00
Kaixi Hou	134fa43e19	[NVIDIA] Change to use `num_local_experts` (#8453)	2025-07-28 10:38:19 -07:00
Yineng Zhang	45bc170b36	chore: bump v0.4.9.post5 (#8458)	2025-07-28 02:11:06 -07:00
Qiaolin Yu	484d0e021d	doc: add bench_one_batch_server in the benchmark doc (#8441)	2025-07-27 23:07:54 -07:00
Qiaolin Yu	2810338401	[feat] Support different attention backends for prefill and decode (#6338) Co-authored-by: tianqilin.99 <tianqilin.99@bytedance.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-07-28 11:42:29 +08:00
Kevin Xiang Li	44d600cd67	Support precomputed_embeddings for Llama 4 (#8156) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Xiang (Kevin) Li <lik@nvidia.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-07-27 01:14:49 -07:00
Yineng Zhang	2272c2a5b5	chore: bump v0.4.9.post4 (#8305)	2025-07-25 17:12:47 -07:00
Chang Su	d8ee15643b	[Feat] Add reasoning parser for Qwen/Qwen3-235B-A22B-Thinking-2507 (#8363)	2025-07-25 14:59:42 -07:00
Xiaoyu Zhang	9045cc1eb8	[torch.compile bug] avoid biased_grouped_topk_impl func repeatedly triggering `torch.compile` in forward pass (#8353)	2025-07-25 21:17:47 +08:00
Zaili Wang	15d2759174	[CPU] Add tutorial docs for SGL on CPU (#8000)	2025-07-25 00:03:16 -07:00
Yineng Zhang	01c000043c	chore: bump v0.4.9.post3 (#8265)	2025-07-22 15:55:48 -07:00
Xinyuan Tong	8430bfe3e9	[Refactor] simplify multimodal data processing (#8107) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-07-20 21:43:09 -07:00
Praneth Paruchuri	83c104b188	Feat: Support for Persimmon Model (#7983)	2025-07-19 23:07:47 -07:00
Lifu Huang	4e3defe5a7	Support start up LoRA server without initial adapters (#8019)	2025-07-19 15:38:09 -07:00
Lianmin Zheng	bb0e8a32b5	Clean up server args (#8161)	2025-07-19 11:32:52 -07:00
Binyao Jiang	b7e951a6db	Feat: Support audio in Phi4-mm model (#8048)	2025-07-18 21:03:53 -07:00
Lianmin Zheng	9c7a46180c	[Doc] Steps to add a new attention backend (#8155)	2025-07-18 16:38:26 -07:00
Minglei Zhu	8a32355704	Feat: Support Granite 3.0 MoE in SGLang (#7959)	2025-07-17 20:56:03 -07:00
Praneth Paruchuri	cb736df854	Support for Phi-1.5 & Phi-2 models (#7862)	2025-07-13 18:43:40 -07:00
Lifu Huang	e2ed9d049a	Refactor dynamic LoRA update to fix incorrect handling of variant weight shapes (#7844)	2025-07-13 18:36:01 -07:00
Yineng Zhang	22bd857cb5	docs: update README (#7985)	2025-07-12 13:31:11 -07:00
Yineng Zhang	eb118d88c4	chore: bump v0.4.9.post2 (#7963)	2025-07-11 21:11:20 -07:00
ronnie_zheng	86044712c6	[feature] kv transfer support of ascend npu (#7795) Co-authored-by: liupeng <liupeng374@huawei.com>	2025-07-11 00:07:51 -07:00
Atream	615553079d	Support Kimi K2 (#7940)	2025-07-11 00:02:21 -07:00
Binyao Jiang	2d54d4bb64	Feat: Support Phi-3.5-MoE in SGLang (#7907)	2025-07-09 23:51:33 -07:00
Yineng Zhang	066f4ec91f	chore: bump v0.4.9.post1 (#7882)	2025-07-09 00:28:17 -07:00
Yikai Zhang	0870232195	Update native_api doc to match the change in the `get_model_info` endpoint (#7660) Co-authored-by: Lifu Huang <lifu.hlf@gmail.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>	2025-07-08 21:05:58 -07:00

600 Commits