sglang

Author	SHA1	Message	Date
Brayden Zhong	b149b39353	[CI] Remove unused imports with Ruff to pre-commit config, only to benchmarks/docs/examples folder (#3969)	2025-03-27 19:45:02 -07:00
tarinkk	7f19e083c1	Support (1 <= dp < tp) in the dp attention in DeepEP (#4770) Co-authored-by: Cheng Wan <cwan39@gatech.edu>	2025-03-27 17:09:35 -07:00
Ke Bao	b39532587b	Update doc for DeepSeek-V3-0324 (#4825)	2025-03-27 13:30:40 -07:00
Jiří Suchomel	f60f293195	[k8s] Clarified the usage of shared memory. (#4341)	2025-03-27 08:53:19 -07:00
Pan Lyu	c913ed4046	support clip embedding model (#4506)	2025-03-27 00:18:15 -07:00
Didier Durand	44f47d3ee1	Update supported_models.md: adding open-r1 Olympic Code 32B by HuggingFace (#4628)	2025-03-27 00:16:16 -07:00
Yineng Zhang	1099f6c974	bump v0.4.4.post2 (#4669)	2025-03-26 19:58:00 -07:00
fzyzcjy	15ddd84322	Add retry for flaky tests in CI (#4755)	2025-03-25 16:53:12 -07:00
yuhsaun-t	199bb01d00	Add endpoints to dump selected expert ids (#4435) Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>	2025-03-24 21:34:19 -07:00
Mick	1e86457c90	model: Minicpmo (#3023)	2025-03-24 20:08:40 -07:00
Ximingwang-09	22c3702e1e	[Model] Support Qwen2ForSequenceClassification (#4609) Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>	2025-03-24 19:13:44 -07:00
BroadbentJim	8796cebb2c	fix typo SGLang supports three grammar backends (#4679)	2025-03-22 14:33:48 -07:00
Adarsh Shirawalmath	fb8886037c	[Docs] Update docs for gemma3 and VLM chat templates (#4674)	2025-03-22 08:02:19 -07:00
mlmz	f6ab4ca6bc	fix: fix ipython running error for Engine due to outlines nest_asyncio (#4582) Co-authored-by: shuaills <shishuaiuoe@gmail.com>	2025-03-21 19:11:15 -07:00
Michael Yao	c6ec70290f	[docs] Add links and fix grammars in deploy_on_k8s.md (#4641) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-03-20 22:55:23 -07:00
Ke Bao	bfb03c6182	Update doc for MTP and DP attention (#4622)	2025-03-20 11:31:48 -07:00
Jinyan Chen	f44db16c8e	[Feature] Integrate DeepEP into SGLang (#4232) Co-authored-by: Cheng Wan <cwan39@gatech.edu> Co-authored-by: Xuting Zhou <xutingz@nvidia.com>	2025-03-19 08:16:31 -07:00
James Liu	9e0186f352	[Feature] Support EAGLE 3 (#4247)	2025-03-18 07:35:23 -07:00
Albert	2d0045125f	Fix the incorrect args in benchmark_and_profiling.md (#4542) Signed-off-by: Tianyu Zhou <albert.zty@antgroup.com>	2025-03-18 00:07:06 -07:00
Lianmin Zheng	c38ca4fc8e	Update readme (#4517)	2025-03-17 08:22:42 -07:00
HandH1998	f2ab37e500	[Doc] add doc for quantization w8a8_fp8 or w8a8_int8 (#4495)	2025-03-17 02:25:00 -07:00
Xihuai Wang	927ca935a7	Constraint Decoding: Tool call with text (#4067)	2025-03-17 01:06:46 -07:00
Wenbo Yang	75b656488a	Support serving DeepSeek-R1-Channel-INT8 with 32 L40S. (#4418)	2025-03-17 00:03:43 -07:00
萝卜菜	d6d21640d3	[Feature] Support Deepseek-VL2 (#2798) Co-authored-by: Edenzzzz <wtan45@wisc.edu> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: Yi Zhang <1109276519@qq.com>	2025-03-16 23:07:59 -07:00
mlmz	452db50808	Constraint Decoding: Set xgrammar as the default grammar backend (#4386)	2025-03-16 18:53:43 -07:00
Mick	9d02bb3e2a	Urgent model support: support gemma-3-it (#4424)	2025-03-16 17:37:32 -07:00
Wang Ran (汪然)	22c96f78a6	typos: Update sampling_params.md (#4391) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2025-03-15 16:40:18 -07:00
江家瑋	26c372c13c	docs: Add Llama 3.3 to supported models (#4453) Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>	2025-03-15 16:33:43 -07:00
Chayenne	e1a5e7e47d	docs: hot fix torch compile cache (#4442)	2025-03-14 19:05:59 -07:00
Zhan Lu	660305c38a	[Doc] fix wrong flag in deepseek documentation (#4427)	2025-03-14 11:30:55 -07:00
Yineng Zhang	ba80c102f9	bump v0.4.4.post1 (#4402)	2025-03-13 17:53:46 -07:00
Yineng Zhang	6aaeb84872	chore: bump v0.4.4 (#4041)	2025-03-13 02:49:58 -07:00
Lianmin Zheng	45de89719c	Revert "[XPU][CPU] Enable the native path of DeepSeek" (#4367)	2025-03-12 23:45:52 -07:00
Meng, Hengyu	71046fcd71	[XPU][CPU] Enable the native path of DeepSeek (#4086) Co-authored-by: Zhang, Liangang <liangang.zhang@intel.com>	2025-03-12 22:26:29 -07:00
yang_zcybb	ad46550d25	[Doc] Fix typo in backend/sampling_params (#3835) Co-authored-by: yangzhice.124 <yangzhice.124@bytedance.com>	2025-03-12 22:12:14 -07:00
Jun Liu	14344caa38	[docs] Update outdated description about `torch.compile` (#3844)	2025-03-12 22:09:38 -07:00
William	0a59a4657a	Fix the doc of FR-Spec (#4295)	2025-03-12 21:22:50 -07:00
Peter Pan	016033188c	docs: add parameter --log-requests-level (#4335)	2025-03-12 21:19:37 -07:00
shizhediao	2c3656f276	[Fix Doc.] Enable internal forwarding when starting the router (#4355)	2025-03-12 15:53:26 -07:00
Mick	01090e8ac3	model: Support Janus-pro (#3203)	2025-03-12 11:02:11 -07:00
Michael Yao	8f1f614ee2	[Docs] Clean up benchmark_and_profiling.md (#4297) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-03-11 21:48:21 -07:00
Yineng Zhang	1cf63485c1	upgrade flashinfer 0.2.3 (#4317) Co-authored-by: qingquansong <qsong@linkedin.com>	2025-03-11 15:37:17 -07:00
Yineng Zhang	00f42707ea	update doc (#4299)	2025-03-11 01:14:16 -07:00
Ke Bao	3a08f54638	Update MTP doc (#4290)	2025-03-11 00:46:55 -07:00
Baizhou Zhang	9fb48f951f	Support nextn for flashinfer mla attention backend (#4218)	2025-03-09 00:01:54 -08:00
Stefan He	dceb256f1b	[docs] Unhide production metrics page (#4193)	2025-03-08 23:41:40 -08:00
Peter Pan	0e90ae628a	[docker] Distributed Serving with k8s Statefulset ( good example for DeepSeek-R1) (#3631) Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> Co-authored-by: Kebe <kebe.liu@daocloud.io>	2025-03-08 23:41:20 -08:00
Xihuai Wang	6eec3cdce6	docs(reasoning content): 📝 deepseek-r1 parser support qwq (#4124)	2025-03-09 04:14:50 +00:00
Michael Yao	c827c671f7	[Docs] Improve bullets appearance and grammar (#4174) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-03-07 03:16:25 -08:00
Yineng Zhang	b55a621ffb	fix int8 doc link (#4179)	2025-03-07 02:49:19 -08:00

391 Commits