sglang

Author	SHA1	Message	Date
Lianmin Zheng	74885a848b	Revert "Replace enable_flashinfer_mla argument with attention_backend" (#5048)	2025-04-03 13:30:56 -07:00
Baizhou Zhang	e8999b13b7	Replace enable_flashinfer_mla argument with attention_backend (#5005)	2025-04-03 02:53:58 -07:00
fzyzcjy	736502d4fd	Tiny fix doc error (#4795)	2025-03-29 08:22:17 -07:00
Ke Bao	b39532587b	Update doc for DeepSeek-V3-0324 (#4825)	2025-03-27 13:30:40 -07:00
Pan Lyu	c913ed4046	support clip embedding model (#4506)	2025-03-27 00:18:15 -07:00
Didier Durand	44f47d3ee1	Update supported_models.md: adding open-r1 Olympic Code 32B by HuggingFace (#4628)	2025-03-27 00:16:16 -07:00
Mick	1e86457c90	model: Minicpmo (#3023)	2025-03-24 20:08:40 -07:00
Ximingwang-09	22c3702e1e	[Model] Support Qwen2ForSequenceClassification (#4609) Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>	2025-03-24 19:13:44 -07:00
Adarsh Shirawalmath	fb8886037c	[Docs] Update docs for gemma3 and VLM chat templates (#4674)	2025-03-22 08:02:19 -07:00
Michael Yao	c6ec70290f	[docs] Add links and fix grammars in deploy_on_k8s.md (#4641) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-03-20 22:55:23 -07:00
Ke Bao	bfb03c6182	Update doc for MTP and DP attention (#4622)	2025-03-20 11:31:48 -07:00
Albert	2d0045125f	Fix the incorrect args in benchmark_and_profiling.md (#4542) Signed-off-by: Tianyu Zhou <albert.zty@antgroup.com>	2025-03-18 00:07:06 -07:00
Wenbo Yang	75b656488a	Support serving DeepSeek-R1-Channel-INT8 with 32 L40S. (#4418)	2025-03-17 00:03:43 -07:00
萝卜菜	d6d21640d3	[Feature] Support Deepseek-VL2 (#2798) Co-authored-by: Edenzzzz <wtan45@wisc.edu> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: Yi Zhang <1109276519@qq.com>	2025-03-16 23:07:59 -07:00
Mick	9d02bb3e2a	Urgent model support: support gemma-3-it (#4424)	2025-03-16 17:37:32 -07:00
江家瑋	26c372c13c	docs: Add Llama 3.3 to supported models (#4453) Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>	2025-03-15 16:33:43 -07:00
Zhan Lu	660305c38a	[Doc] fix wrong flag in deepseek documentation (#4427)	2025-03-14 11:30:55 -07:00
Mick	01090e8ac3	model: Support Janus-pro (#3203)	2025-03-12 11:02:11 -07:00
Michael Yao	8f1f614ee2	[Docs] Clean up benchmark_and_profiling.md (#4297) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-03-11 21:48:21 -07:00
Ke Bao	3a08f54638	Update MTP doc (#4290)	2025-03-11 00:46:55 -07:00
Baizhou Zhang	9fb48f951f	Support nextn for flashinfer mla attention backend (#4218)	2025-03-09 00:01:54 -08:00
Stefan He	dceb256f1b	[docs] Unhide production metrics page (#4193)	2025-03-08 23:41:40 -08:00
Michael Yao	c827c671f7	[Docs] Improve bullets appearance and grammar (#4174) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-03-07 03:16:25 -08:00
Yineng Zhang	b55a621ffb	fix int8 doc link (#4179)	2025-03-07 02:49:19 -08:00
lukec	ffa1b3e318	Add an example of using deepseekv3 int8 sglang. (#4177) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-03-07 01:56:09 -08:00
Pan Lyu	361971b859	Add Support for Qwen2-VL Multi-modal Embedding Models (#3694)	2025-03-06 16:46:20 -08:00
Chayenne	ebddb65aed	Docs: add torch compile cache (#4151) Co-authored-by: ybyang <ybyang7@iflytek.com>	2025-03-06 14:27:09 -08:00
Adarsh Shirawalmath	19fd57bcd7	[docs] fix HF reference script command (#4148)	2025-03-06 13:21:54 -08:00
samzong	d2d0d061d9	fix cross-reference error and spelling mistakes (#4101) Signed-off-by: samzong <samzong.lu@gmail.com>	2025-03-05 16:39:02 -08:00
Yineng Zhang	0aaccbbfec	revert deepseek docs (#4109)	2025-03-05 13:23:11 -08:00
Chayenne	e70fa279bc	Docs: reorganize dpsk docs (#4108)	2025-03-05 13:01:03 -08:00
Tommy Yang	abe74b7b59	Docs: Add DeepSeek optimization ablations documentation (#4107) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-03-05 12:25:51 -08:00
Baizhou Zhang	fc91d08a8f	[Revision] Add fast decode plan for flashinfer mla (#4012)	2025-03-05 11:20:41 -08:00
Xihuai Wang	95575aa76a	Reasoning parser (#4000) Co-authored-by: Lucas Pickup <lupickup@microsoft.com>	2025-03-03 21:16:36 -08:00
Chayenne	146ac8df07	Add examples in sampling parameters (#4039)	2025-03-03 13:04:32 -08:00
Lianmin Zheng	935cda944b	Misc clean up; Remove the support of jump forward (#4032)	2025-03-03 07:02:14 -08:00
Yudi Xue	a7000a7650	Update metrics documentation (#3264)	2025-03-03 05:03:58 -08:00
Lianmin Zheng	9e1014cf99	Revert "Add fast decode plan for flashinfer mla" (#4008)	2025-03-02 19:29:10 -08:00
Baizhou Zhang	fa56106731	Add fast decode plan for flashinfer mla (#3987)	2025-03-02 19:16:37 -08:00
Yineng Zhang	5d86016855	revert "Docs: Reorngaize dpsk links #3900" (#3933)	2025-02-27 08:57:13 -08:00
Baizhou Zhang	3e02526b1f	[Doc] Add experimental tag for flashinfer mla (#3925)	2025-02-27 01:55:36 -08:00
Stefan He	d8a98a2cad	[Docs] Improve DPSK docs in dark mode (#3914)	2025-02-27 00:13:04 -08:00
Baizhou Zhang	71ed01833d	[doc] Update document for flashinfer mla (#3907)	2025-02-26 20:40:45 -08:00
Chayenne	7c1692aa90	Docs: Reorngaize dpsk links (#3900)	2025-02-26 15:16:31 -08:00
Chayenne	8f019c7d1a	Docs: Move dpsk docs forward a step (#3894)	2025-02-26 11:43:20 -08:00
Shenggui Li	3dc9ff3ce8	[doc] fixed dpsk quant faq (#3865)	2025-02-25 19:40:47 -08:00
Shenggui Li	06427dfab1	[doc] added quantization doc for dpsk (#3843)	2025-02-25 09:43:28 -08:00
Shenggui Li	c0bb9eb3b3	[improve] made timeout configurable (#3803)	2025-02-25 00:26:08 -08:00
Baizhou Zhang	4d2a88bdff	[Docs]Add instruction for manually stopping nsys profiler (#3795) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-02-23 13:21:48 -08:00
Mick	45205d88a0	bench: Add MMMU benchmark for vLM (#3562)	2025-02-22 08:10:59 -08:00

135 Commits