sglang

Author	SHA1	Message	Date
Zhiqiang Xie	a169b9f813	Fix oom error for large page size (#4913) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2025-03-30 21:34:21 -07:00
Baizhou Zhang	e62d60fe6d	[Fix] avoid stream sync and torch compile in prefill for fa3 backend (#4932)	2025-03-30 13:53:44 -07:00
Lianmin Zheng	4ede6770cd	Fix retract for page size > 1 (#4914)	2025-03-30 02:57:15 -07:00
Lianmin Zheng	b26bc86b36	Support page size > 1 + eagle (#4908)	2025-03-30 00:46:23 -07:00
Stefan He	1b9175cb23	[FA3 Attn Backend] Remove Unnecessary Device Sync for FA3 (#4745) Co-authored-by: Yubo Wang <yubowang2019@gmail.com>	2025-03-27 00:45:11 -07:00
Mick	1e86457c90	model: Minicpmo (#3023)	2025-03-24 20:08:40 -07:00
Mick	11577cedb7	refactor: bug fixes and refactor for vlm (#4661)	2025-03-22 22:48:49 -07:00
Zhiqiang Xie	ecbfe58bb0	Bug fix for metrics counter (#4660)	2025-03-22 13:39:21 -07:00
Byron Hsu	c7c7dbebbe	[PD] Release initial code (#4654) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: Ying1123 <sqy1415@gmail.com> Co-authored-by: merrymercy <lianminzheng@gmail.com> Co-authored-by: makro Co-authored-by: dhou-xai	2025-03-21 14:47:47 -07:00
Jinyan Chen	f44db16c8e	[Feature] Integrate DeepEP into SGLang (#4232) Co-authored-by: Cheng Wan <cwan39@gatech.edu> Co-authored-by: Xuting Zhou <xutingz@nvidia.com>	2025-03-19 08:16:31 -07:00
Mick	d373a48c98	fix: second_per_grid_ts should be used to get mrope position (#3682)	2025-03-17 18:12:38 -07:00
萝卜菜	d6d21640d3	[Feature] Support Deepseek-VL2 (#2798) Co-authored-by: Edenzzzz <wtan45@wisc.edu> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: Yi Zhang <1109276519@qq.com>	2025-03-16 23:07:59 -07:00
Mick	9d02bb3e2a	Urgent model support: support gemma-3-it (#4424)	2025-03-16 17:37:32 -07:00
lukec	a53fe428f9	Support FlashMLA backend (#4472) Co-authored-by: yinfan98 <1106310035@qq.com>	2025-03-16 09:07:06 -07:00
JieXin Liang	1a3fa75f2f	[Fix] use `torch.cat` instead of `torch.concat` to prevent entering the `Autograd` backends. (#4466)	2025-03-16 00:02:47 -07:00
Qiaolin Yu	85d2365d33	Fix the output of hidden states after HTTP requests (#4269)	2025-03-13 14:54:06 -07:00
Lianmin Zheng	c76040e31b	Support page size > 1 (#4356)	2025-03-12 22:22:39 -07:00
Lianmin Zheng	e35a93fa8a	Move output processing logic from scheduler.py into a separate file (#4354)	2025-03-12 16:21:49 -07:00
Zhiqiang Xie	10b544ae9b	Hierarchical Caching Refactoring and Fixing TP issue (#4082)	2025-03-12 11:22:35 -07:00
Mick	ff2ce0b86f	refactor: move image processors to separate files (#4229)	2025-03-11 12:35:35 -07:00
Baizhou Zhang	fc91d08a8f	[Revision] Add fast decode plan for flashinfer mla (#4012)	2025-03-05 11:20:41 -08:00
Ying Sheng	d3d4d76758	[Eagle] Refactor eagle speculative decoding (#3986) Co-authored-by: Ke Bao <ISPObaoke@163.com>	2025-03-05 08:06:07 -08:00
Lianmin Zheng	935cda944b	Misc clean up; Remove the support of jump forward (#4032)	2025-03-03 07:02:14 -08:00
Lianmin Zheng	66301e124f	Improve code styles (#4021)	2025-03-03 03:20:23 -08:00
Lianmin Zheng	ac2387279e	Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>	2025-03-03 00:12:04 -08:00
Lianmin Zheng	9e1014cf99	Revert "Add fast decode plan for flashinfer mla" (#4008)	2025-03-02 19:29:10 -08:00
Baizhou Zhang	fa56106731	Add fast decode plan for flashinfer mla (#3987)	2025-03-02 19:16:37 -08:00
Qiaolin Yu	40782f05d7	Refactor: Move return_hidden_states to the generate input (#3985) Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>	2025-03-01 17:51:29 -08:00
Baizhou Zhang	90a4b7d98a	[Feature]Support ragged prefill in flashinfer mla backend (#3967) Co-authored-by: Yineng Zhang <me@zhyncs.com> Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>	2025-02-28 18:13:56 -08:00
Qiaolin Yu	d6898dd253	Add return hidden state in the native API (#3897) Co-authored-by: Beichen-Ma <mabeichen12@gmail.com> Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-02-26 22:06:54 -08:00
Yineng Zhang	714f3e6362	feat: support flashinfer mla with prefix cache (#3643)	2025-02-18 02:06:43 +08:00
Yineng Zhang	70f894b810	feat: support flashinfer mla attention for deepseek v3 (#3550)	2025-02-14 08:50:14 +08:00
Jackmin801	5f0e7de339	[Feat] Return hidden states (experimental) (#3364) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-02-10 15:54:37 -08:00
Lianmin Zheng	53cef81587	Improve weight loading and code style (#3174)	2025-01-27 03:00:41 -08:00
Lianmin Zheng	1dda8c5e4c	Return more infos for computing average acceptance length (#3152)	2025-01-26 04:51:54 -08:00
Lianmin Zheng	d1a0863251	Add a test case for cached_tokens (#3145)	2025-01-26 01:39:28 -08:00
Lianmin Zheng	3d8f1c9bcf	Use int64 as indices for set_kv_buffer (#3039)	2025-01-21 19:46:09 -08:00
996_icu	b730aa6b9e	[EAGLE] Fix some boundary situation when retract reqs and req's max token = 1 (#2939) Co-authored-by: josephyou <josephyou@tencent.com>	2025-01-20 17:46:43 -08:00
Hongpeng Guo	583697cd71	[Enhancement] Custom Logit Processor Improvement (#2998) Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>	2025-01-20 02:00:35 -08:00
Hongpeng Guo	e403d23757	[Feature] Add sampler custom logits processor (#2396) Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>	2025-01-19 14:46:53 -08:00
Lianmin Zheng	7906d1d298	Remove the unused write_with_records (#2972)	2025-01-18 20:20:23 -08:00
Chang Su	4d4cdb3fe7	Frontend: better error message handling for FINISH_ABORT in scheduler.py (#2956)	2025-01-18 19:37:30 -08:00
Yang Zheng	2bd18e2d76	Memory pool: Minor optimize to avoid to (#2901)	2025-01-18 19:35:12 -08:00
Mick	3d93f84a00	[Feature] Support minicpmv v2.6 (#2785) Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: yizhang2077 <1109276519@qq.com>	2025-01-18 14:14:19 -08:00
Chunyuan WU	63051738a9	Enable CPU device on SGLang (#2806)	2025-01-16 21:22:53 -08:00
Lianmin Zheng	bc6915e3b9	Improve type annotation and styles (#2926)	2025-01-16 12:51:11 -08:00
Lianmin Zheng	8b6ce52e92	Support multi-node DP attention (#2925) Co-authored-by: dhou-xai <dhou@x.ai>	2025-01-16 11:15:00 -08:00
Lianmin Zheng	f65c13b559	Remove normalized_prompt_logprobs from the engine to make code easier to maintain (#2902)	2025-01-15 04:54:14 -08:00
Lianmin Zheng	b8574f6953	Clean up eagle code (#2756)	2025-01-06 14:54:18 -08:00
Lianmin Zheng	ad20b7957e	Eagle speculative decoding part 3: small modifications to the general scheduler (#2709) Co-authored-by: kavioyu <kavioyu@tencent.com>	2025-01-02 02:09:08 -08:00

184 Commits