sglang

Author	SHA1	Message	Date
Lianmin Zheng	e8e18dcdcc	Revert "fix some typos" (#6244)	2025-05-12 12:53:26 -07:00
applesaucethebun	d738ab52f8	fix some typos (#6209) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-13 01:42:38 +08:00
Cheng Wan	25c83fff6a	Performing Vocabulary Parallelism for LM Head across Attention TP Groups (#5558) Co-authored-by: liusy58 <liusy58@linux.alibaba.com>	2025-05-11 23:36:29 -07:00
Ximingwang-09	921e4a8185	[Docs]Delete duplicate content (#6146) Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>	2025-05-10 15:02:15 -07:00
Zhu Chen	fa7d7fd9e5	[Feature] Add FlashAttention3 as a backend for VisionAttention (#5764) Co-authored-by: othame <chenzhu_912@zju.edu.cn> Co-authored-by: Mick <mickjagger19@icloud.com> Co-authored-by: Yi Zhang <1109276519@qq.com>	2025-05-08 10:01:19 -07:00
Baizhou Zhang	8f508cc77f	Update doc for MLA attention backends (#6034)	2025-05-07 18:51:05 -07:00
Baizhou Zhang	fee37d9e8d	[Doc]Fix description for dp_size argument (#6063)	2025-05-08 00:04:22 +08:00
Wenxuan Tan	22da3d978f	Fix "Avoid computing lse in Ragged Prefill when there's no prefix match" (#5555)	2025-05-05 10:32:17 -07:00
Qiaolin Yu	8c0cfca87d	Feat: support cuda graph for LoRA (#4115) Co-authored-by: Beichen Ma <mabeichen12@gmail.com>	2025-04-28 23:30:44 -07:00
Lianmin Zheng	849c83a0c0	[CI] test chunked prefill more (#5798)	2025-04-28 10:57:17 -07:00
Baizhou Zhang	f48b007c1d	[Doc] Recover history of server_arguments.md (#5851)	2025-04-28 10:48:21 -07:00
Michael Yao	966eb90865	[Docs] Replace lists with tables for cleanup and readability in server_arguments (#5276)	2025-04-28 00:36:10 -07:00
Trevor Morris	84810da4ae	Add Cutlass MLA attention backend (#5390)	2025-04-27 20:58:53 -07:00
Lianmin Zheng	155890e4d1	[Minor] fix documentations (#5756)	2025-04-26 17:48:43 -07:00
Michael Yao	92bb64bc86	[Doc] Fix a 404 link to llama-405b (#5615) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-04-21 20:39:37 -07:00
Yineng Zhang	a6f892e5d0	Revert "Avoid computing lse in Ragged Prefill when there's no prefix.… (#5544)	2025-04-18 16:50:21 -07:00
Wenxuan Tan	bfa3922451	Avoid computing lse in Ragged Prefill when there's no prefix. (#5476) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-04-18 01:13:57 -07:00
Baizhou Zhang	6fb29ffd9e	Deprecate enable-flashinfer-mla and enable-flashmla (#5480)	2025-04-17 01:43:33 -07:00
Baizhou Zhang	4fb05583ef	Deprecate disable-mla (#5481)	2025-04-17 01:43:14 -07:00
Baizhou Zhang	a42736bbb8	Support MHA with chunked prefix cache for DeepSeek chunked prefill (#5113)	2025-04-15 22:01:22 -07:00
Mick	34ef6c8135	[VLM] Adopt fast image processor by default (#5065)	2025-04-11 21:46:58 -07:00
Baizhou Zhang	efbae697b3	[Revision] Replace enable_flashinfer_mla argument with attention_backend (#5052)	2025-04-05 01:23:02 -07:00
Lianmin Zheng	74885a848b	Revert "Replace enable_flashinfer_mla argument with attention_backend" (#5048)	2025-04-03 13:30:56 -07:00
Baizhou Zhang	e8999b13b7	Replace enable_flashinfer_mla argument with attention_backend (#5005)	2025-04-03 02:53:58 -07:00
Jinyan Chen	23c764b18a	[Feature] Support DeepEP Low Latency (#4767) Co-authored-by: sleepcoo <sleepcoo@gmail.com> Co-authored-by: laixinn <xielx@shanghaitech.edu.cn> Co-authored-by: ch-wan <cwan39@gatech.edu>	2025-04-01 09:23:25 -07:00
tarinkk	7f19e083c1	Support (1 <= dp < tp) in the dp attention in DeepEP (#4770) Co-authored-by: Cheng Wan <cwan39@gatech.edu>	2025-03-27 17:09:35 -07:00
Jiří Suchomel	f60f293195	[k8s] Clarified the usage of shared memory. (#4341)	2025-03-27 08:53:19 -07:00
Jinyan Chen	f44db16c8e	[Feature] Integrate DeepEP into SGLang (#4232) Co-authored-by: Cheng Wan <cwan39@gatech.edu> Co-authored-by: Xuting Zhou <xutingz@nvidia.com>	2025-03-19 08:16:31 -07:00
Chayenne	e1a5e7e47d	docs: hot fix torch compile cache (#4442)	2025-03-14 19:05:59 -07:00
Jun Liu	14344caa38	[docs] Update outdated description about `torch.compile` (#3844)	2025-03-12 22:09:38 -07:00
Peter Pan	016033188c	docs: add parameter --log-requests-level (#4335)	2025-03-12 21:19:37 -07:00
Chayenne	ebddb65aed	Docs: add torch compile cache (#4151) Co-authored-by: ybyang <ybyang7@iflytek.com>	2025-03-06 14:27:09 -08:00
samzong	d2d0d061d9	fix cross-reference error and spelling mistakes (#4101) Signed-off-by: samzong <samzong.lu@gmail.com>	2025-03-05 16:39:02 -08:00
Qiaolin Yu	357671e216	Add examples for server token-in-token-out (#4103) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-03-05 13:16:31 -08:00
Baizhou Zhang	fc91d08a8f	[Revision] Add fast decode plan for flashinfer mla (#4012)	2025-03-05 11:20:41 -08:00
Mick	583d6af71b	example: add vlm to token in & out example (#3941) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-03-04 22:18:26 -08:00
Chayenne	146ac8df07	Add examples in sampling parameters (#4039)	2025-03-03 13:04:32 -08:00
Lianmin Zheng	935cda944b	Misc clean up; Remove the support of jump forward (#4032)	2025-03-03 07:02:14 -08:00
Lianmin Zheng	ac2387279e	Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>	2025-03-03 00:12:04 -08:00
Chayenne	728e175fc4	Add examples to token-in-token-out for LLM (#4010)	2025-03-02 21:03:49 -08:00
Lianmin Zheng	9e1014cf99	Revert "Add fast decode plan for flashinfer mla" (#4008)	2025-03-02 19:29:10 -08:00
Baizhou Zhang	fa56106731	Add fast decode plan for flashinfer mla (#3987)	2025-03-02 19:16:37 -08:00
Zhousx	7fbab730bd	[feat] add small vocab table for eagle's draft model[1]. (#3822) Co-authored-by: Achazwl <323163497@qq.com> Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-03-02 18:58:45 -08:00
Baizhou Zhang	90a4b7d98a	[Feature]Support ragged prefill in flashinfer mla backend (#3967) Co-authored-by: Yineng Zhang <me@zhyncs.com> Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>	2025-02-28 18:13:56 -08:00
Baizhou Zhang	3e02526b1f	[Doc] Add experimental tag for flashinfer mla (#3925)	2025-02-27 01:55:36 -08:00
Baizhou Zhang	71ed01833d	[doc] Update document for flashinfer mla (#3907)	2025-02-26 20:40:45 -08:00
simveit	44a2c4bd56	Docs: improve link to docs (#3860) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-02-26 10:29:25 -08:00
Yuanheng Zhao	3758d209a0	[Doc] Fix typo in server-argument description (#3641)	2025-02-24 16:57:13 -08:00
Baizhou Zhang	ac05310098	[Docs] Modify ep related server args and remove cublas part of deepseek (#3732)	2025-02-21 03:37:56 +08:00
Mick	7711ac6ed0	doc: emphasize and notify the usage of chat_template (#3589) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-02-15 00:10:32 -08:00

58 Commits