sglang

Author	SHA1	Message	Date
Baizhou Zhang	a42736bbb8	Support MHA with chunked prefix cache for DeepSeek chunked prefill (#5113)	2025-04-15 22:01:22 -07:00
mRSun15	3efc8e2d2a	add attention backend supporting matrix in the doc (#5211) Co-authored-by: Stefan He <hebiaobuaa@gmail.com>	2025-04-15 17:16:34 -07:00
thyecust	2074a2e6b6	Fix: docs/backend/structured_outputs.ipynb (#4884)	2025-04-12 02:18:55 -07:00
Mick	34ef6c8135	[VLM] Adopt fast image processor by default (#5065)	2025-04-11 21:46:58 -07:00
Adarsh Shirawalmath	4aa6bab0b0	[Docs] Supported Model Docs - Major restructuring (#5290) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-04-11 09:17:47 -07:00
Michael Yao	fc14cca088	Fix a 404 link in send_request.ipynb (#5280) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-04-11 01:38:45 -07:00
mlmz	4d2e305149	doc: nested loop code for offline engine (#5244)	2025-04-11 01:36:30 -07:00
simveit	f8194b267c	Small improvement of native api docs (#5139) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-04-08 12:09:26 -07:00
mlmz	7c5658c189	feat: disable grammar restrictions within reasoning sections (#4984) Co-authored-by: tianhaoyu <thy@mail.ecust.edu.cn> Co-authored-by: DarkSharpness <2040703891@qq.com>	2025-04-07 21:46:47 -07:00
Baizhou Zhang	efbae697b3	[Revision] Replace enable_flashinfer_mla argument with attention_backend (#5052)	2025-04-05 01:23:02 -07:00
simveit	98f768d194	update eagle-3 docs (#4796) Co-authored-by: Yifan Zhang <zhangyif21@mails.tsinghua.edu.cn>	2025-04-03 15:24:41 -07:00
Lianmin Zheng	74885a848b	Revert "Replace enable_flashinfer_mla argument with attention_backend" (#5048)	2025-04-03 13:30:56 -07:00
Baizhou Zhang	e8999b13b7	Replace enable_flashinfer_mla argument with attention_backend (#5005)	2025-04-03 02:53:58 -07:00
Jinyan Chen	23c764b18a	[Feature] Support DeepEP Low Latency (#4767) Co-authored-by: sleepcoo <sleepcoo@gmail.com> Co-authored-by: laixinn <xielx@shanghaitech.edu.cn> Co-authored-by: ch-wan <cwan39@gatech.edu>	2025-04-01 09:23:25 -07:00
Ke Bao	aa08aeacf4	update torch compile doc (#4874)	2025-03-28 19:49:30 -07:00
Brayden Zhong	b149b39353	[CI] Remove unused imports with Ruff to pre-commit config, only to benchmarks/docs/examples folder (#3969)	2025-03-27 19:45:02 -07:00
tarinkk	7f19e083c1	Support (1 <= dp < tp) in the dp attention in DeepEP (#4770) Co-authored-by: Cheng Wan <cwan39@gatech.edu>	2025-03-27 17:09:35 -07:00
Jiří Suchomel	f60f293195	[k8s] Clarified the usage of shared memory. (#4341)	2025-03-27 08:53:19 -07:00
yuhsaun-t	199bb01d00	Add endpoints to dump selected expert ids (#4435) Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>	2025-03-24 21:34:19 -07:00
BroadbentJim	8796cebb2c	fix typo SGLang supports three grammar backends (#4679)	2025-03-22 14:33:48 -07:00
Adarsh Shirawalmath	fb8886037c	[Docs] Update docs for gemma3 and VLM chat templates (#4674)	2025-03-22 08:02:19 -07:00
mlmz	f6ab4ca6bc	fix: fix ipython running error for Engine due to outlines nest_asyncio (#4582) Co-authored-by: shuaills <shishuaiuoe@gmail.com>	2025-03-21 19:11:15 -07:00
Jinyan Chen	f44db16c8e	[Feature] Integrate DeepEP into SGLang (#4232) Co-authored-by: Cheng Wan <cwan39@gatech.edu> Co-authored-by: Xuting Zhou <xutingz@nvidia.com>	2025-03-19 08:16:31 -07:00
James Liu	9e0186f352	[Feature] Support EAGLE 3 (#4247)	2025-03-18 07:35:23 -07:00
HandH1998	f2ab37e500	[Doc] add doc for quantization w8a8_fp8 or w8a8_int8 (#4495)	2025-03-17 02:25:00 -07:00
Xihuai Wang	927ca935a7	Constraint Decoding: Tool call with text (#4067)	2025-03-17 01:06:46 -07:00
mlmz	452db50808	Constraint Decoding: Set xgrammar as the default grammar backend (#4386)	2025-03-16 18:53:43 -07:00
Wang Ran (汪然)	22c96f78a6	typos: Update sampling_params.md (#4391) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2025-03-15 16:40:18 -07:00
Chayenne	e1a5e7e47d	docs: hot fix torch compile cache (#4442)	2025-03-14 19:05:59 -07:00
yang_zcybb	ad46550d25	[Doc] Fix typo in backend/sampling_params (#3835) Co-authored-by: yangzhice.124 <yangzhice.124@bytedance.com>	2025-03-12 22:12:14 -07:00
Jun Liu	14344caa38	[docs] Update outdated description about `torch.compile` (#3844)	2025-03-12 22:09:38 -07:00
William	0a59a4657a	Fix the doc of FR-Spec (#4295)	2025-03-12 21:22:50 -07:00
Peter Pan	016033188c	docs: add parameter --log-requests-level (#4335)	2025-03-12 21:19:37 -07:00
Ke Bao	3a08f54638	Update MTP doc (#4290)	2025-03-11 00:46:55 -07:00
Xihuai Wang	6eec3cdce6	docs(reasoning content): 📝 deepseek-r1 parser support qwq (#4124)	2025-03-09 04:14:50 +00:00
Michael Yao	d557319a8b	[Docs] Fix links and grammar issues (#4162)	2025-03-06 23:14:18 -08:00
Chayenne	9854a18a51	Hot fix small vocal eagle in docs (#4154) Co-authored-by: ybyang <ybyang7@iflytek.com>	2025-03-06 15:13:26 -08:00
Chayenne	ebddb65aed	Docs: add torch compile cache (#4151) Co-authored-by: ybyang <ybyang7@iflytek.com>	2025-03-06 14:27:09 -08:00
simveit	8f0b63139e	Docs: improve EAGLE docs (#4038)	2025-03-05 22:40:21 -08:00
samzong	d2d0d061d9	fix cross-reference error and spelling mistakes (#4101) Signed-off-by: samzong <samzong.lu@gmail.com>	2025-03-05 16:39:02 -08:00
Qiaolin Yu	357671e216	Add examples for server token-in-token-out (#4103) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-03-05 13:16:31 -08:00
Baizhou Zhang	fc91d08a8f	[Revision] Add fast decode plan for flashinfer mla (#4012)	2025-03-05 11:20:41 -08:00
Qubitium-ModelCloud	56a724eba3	[QUANT] Add GPTQModel Dynamic Quantization + `lm_head` Quantization (#3790) Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>	2025-03-05 01:11:00 -08:00
Mick	583d6af71b	example: add vlm to token in & out example (#3941) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-03-04 22:18:26 -08:00
Qiaolin Yu	4725e3f652	Add examples for returning hidden states when using the server (#4074)	2025-03-04 19:31:50 -08:00
Xihuai Wang	95575aa76a	Reasoning parser (#4000) Co-authored-by: Lucas Pickup <lupickup@microsoft.com>	2025-03-03 21:16:36 -08:00
Chayenne	146ac8df07	Add examples in sampling parameters (#4039)	2025-03-03 13:04:32 -08:00
Chayenne	2796fbb53d	Docs: Fix sampling parameter (#4034)	2025-03-03 09:32:36 -08:00
Lianmin Zheng	935cda944b	Misc clean up; Remove the support of jump forward (#4032)	2025-03-03 07:02:14 -08:00
Lianmin Zheng	ac2387279e	Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>	2025-03-03 00:12:04 -08:00

133 Commits