sglang

Author	SHA1	Message	Date
Lianmin Zheng	5493c3343e	Fix data parallel + tensor parallel (#4499 )	2025-03-17 05:13:16 -07:00
萝卜菜	d6d21640d3	[Feature] Support Deepseek-VL2 (#2798 ) Co-authored-by: Edenzzzz <wtan45@wisc.edu> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: Yi Zhang <1109276519@qq.com>	2025-03-16 23:07:59 -07:00
Mick	9d02bb3e2a	Urgent model support: support gemma-3-it (#4424 )	2025-03-16 17:37:32 -07:00
lukec	a53fe428f9	Support FlashMLA backend (#4472 ) Co-authored-by: yinfan98 <1106310035@qq.com>	2025-03-16 09:07:06 -07:00
Ying Sheng	1b859295f4	[Eagle] Remove the greedy branch and some redundant code (#4363 ) Co-authored-by: Sehoon Kim <sehoon@x.ai>	2025-03-16 02:48:55 -07:00
JieXin Liang	1a3fa75f2f	[Fix] use `torch.cat` instead of `torch.concat` to prevent entering the `Autograd` backends. (#4466 )	2025-03-16 00:02:47 -07:00
wangyu	1ce4878d31	feat(remote_model): support variable remote backend for model loader (#3964 ) Signed-off-by: wangyu <wangyu.steph@bytedance.com>	2025-03-14 00:40:44 -07:00
Lianmin Zheng	8e66fbecee	Improve DP attention (#4390 ) Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: SangBin Cho <rkooo567@gmail.com>	2025-03-13 08:23:56 -07:00
Lianmin Zheng	45de89719c	Revert "[XPU][CPU] Enable the native path of DeepSeek" (#4367 )	2025-03-12 23:45:52 -07:00
Meng, Hengyu	71046fcd71	[XPU][CPU] Enable the native path of DeepSeek (#4086 ) Co-authored-by: Zhang, Liangang <liangang.zhang@intel.com>	2025-03-12 22:26:29 -07:00
Lianmin Zheng	c76040e31b	Support page size > 1 (#4356 )	2025-03-12 22:22:39 -07:00
Lianmin Zheng	e35a93fa8a	Move output processing logic from scheduler.py into a separate file (#4354 )	2025-03-12 16:21:49 -07:00
Lianmin Zheng	d40ee62b5d	Update nightly tests (#4352 )	2025-03-12 15:36:13 -07:00
Yineng Zhang	d1da58e275	unify is_cuda and is_hip (#4321 )	2025-03-11 18:12:56 -07:00
Lianmin Zheng	00d25a7f5e	Fix quantization and nightly tests (#4258 )	2025-03-10 03:06:21 -07:00
Lianmin Zheng	08c4d764a5	lazy import attn backends (#4200 )	2025-03-08 00:41:35 -08:00
Lianmin Zheng	d4017a6b63	[EAGLE] many fixes for eagle (#4195 ) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: Sehoon Kim <sehoon@x.ai>	2025-03-07 22:12:13 -08:00
Yineng Zhang	eb61f5c9af	Revert "ROCm: Flex Attention Enablement with custom backends (#4178 )" (#4186 )	2025-03-07 10:27:52 -08:00
HAI	0beea4503f	ROCm: Flex Attention Enablement with custom backends (#4178 ) Co-authored-by: linsun12 <linsun12@amd.com>	2025-03-07 04:38:53 -08:00
Lianmin Zheng	bc1534ff32	Fix a draft model accuracy bug in eagle; support step=1; return logprob in eagle (#4134 ) Co-authored-by: Sehoon Kim <kssteven418@gmail.com> Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: Sehoon Kim <sehoon@x.ai>	2025-03-06 06:13:59 -08:00
Lianmin Zheng	98c73d71cb	[Minor] make the `__init__` function of model_runner.py shorter (#4132 )	2025-03-06 01:51:12 -08:00
Zhiqiang Xie	aee30630d8	Add a pointer to the real KV cache pool (#4113 )	2025-03-05 21:39:07 -08:00
Ke Bao	ef9d3b3c2c	Fix triton kernel illegal memory issue for eagle (#4100 )	2025-03-05 11:23:53 -08:00
Baizhou Zhang	fc91d08a8f	[Revision] Add fast decode plan for flashinfer mla (#4012 )	2025-03-05 11:20:41 -08:00
Ying Sheng	d3d4d76758	[Eagle] Refactor eagle speculative decoding (#3986 ) Co-authored-by: Ke Bao <ISPObaoke@163.com>	2025-03-05 08:06:07 -08:00
Lianmin Zheng	e074d84e5b	[Minor] more code cleanup (#4077 )	2025-03-04 21:23:47 -08:00
Chen Shengzhi	61261b3996	[XCCL] Use xccl for xpu backend since xccl is ready in latest PyTorch. (#3954 )	2025-03-04 04:05:56 -08:00
Ke Bao	9fafa62db7	Share target model embed and head weights for nextn (#4033 )	2025-03-03 13:30:04 -08:00
Lianmin Zheng	935cda944b	Misc clean up; Remove the support of jump forward (#4032 )	2025-03-03 07:02:14 -08:00
Lianmin Zheng	66301e124f	Improve code styles (#4021 )	2025-03-03 03:20:23 -08:00
Lianmin Zheng	ac2387279e	Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988 ) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>	2025-03-03 00:12:04 -08:00
Lianmin Zheng	9e1014cf99	Revert "Add fast decode plan for flashinfer mla" (#4008 )	2025-03-02 19:29:10 -08:00
Baizhou Zhang	fa56106731	Add fast decode plan for flashinfer mla (#3987 )	2025-03-02 19:16:37 -08:00
Qiaolin Yu	40782f05d7	Refactor: Move return_hidden_states to the generate input (#3985 ) Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>	2025-03-01 17:51:29 -08:00
Baizhou Zhang	90a4b7d98a	[Feature]Support ragged prefill in flashinfer mla backend (#3967 ) Co-authored-by: Yineng Zhang <me@zhyncs.com> Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>	2025-02-28 18:13:56 -08:00
fzyzcjy	e3e0bc50a9	[Feature] SPMD for SGLang + Verl (#3852 )	2025-02-28 09:53:10 -08:00
Qiaolin Yu	d6898dd253	Add return hidden state in the native API (#3897 ) Co-authored-by: Beichen-Ma <mabeichen12@gmail.com> Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-02-26 22:06:54 -08:00
who who who	4606e2a3fe	Bug: fix capture_bs (#3857 )	2025-02-25 08:40:35 -08:00
Shenggui Li	c0bb9eb3b3	[improve] made timeout configurable (#3803 )	2025-02-25 00:26:08 -08:00
Baizhou Zhang	b110084654	Refactor flashinfer logic for deepseek v3 and fix accuracy bug (#3785 )	2025-02-24 04:07:25 -08:00
Yineng Zhang	714f3e6362	feat: support flashinfer mla with prefix cache (#3643 )	2025-02-18 02:06:43 +08:00
Yineng Zhang	dfce926921	fix high qps crash when enable mtp (#3592 ) Co-authored-by: ispobock <ispobaoke@hotmail.com>	2025-02-15 23:11:28 +08:00
Yineng Zhang	70f894b810	feat: support flashinfer mla attention for deepseek v3 (#3550 )	2025-02-14 08:50:14 +08:00
HAI	d81ac4434e	MI30x: More graph captures for larger batch sizes and concurrencies (#3420 )	2025-02-12 03:04:38 +08:00
Jackmin801	5f0e7de339	[Feat] Return hidden states (experimental) (#3364 ) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-02-10 15:54:37 -08:00
Yineng Zhang	bc72e5bd32	add cuda graph capture failure possible solution (#3430 )	2025-02-09 22:57:11 +08:00
Yineng Zhang	fad315cb8e	fix EAGLE 2 non greedy case (#3407 ) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2025-02-09 07:28:34 +08:00
Baizhou Zhang	70817a7eae	[Feature] Define backends and add Triton backend for Lora (#3161 ) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2025-02-03 22:09:13 -08:00
Yineng Zhang	013021b6a1	refactor EAGLE 2 (#3269 ) Co-authored-by: Ying Sheng <sqy1415@gmail.com> Co-authored-by: merrymercy <lianminzheng@gmail.com> Co-authored-by: Ying1123 <sqy1415@gmail.com>	2025-02-03 20:52:30 +08:00
Yineng Zhang	4eb4b401cc	update and simplify CustomOp (#3249 )	2025-02-01 18:56:44 +08:00

258 Commits