sglang

Author	SHA1	Message	Date
Lianmin Zheng	01bdbf7f80	Improve structured outputs: fix race condition, server crash, metrics and style (#6188)	2025-05-11 08:36:16 -07:00
applesaucethebun	2ce8793519	Add typo checker in pre-commit (#6179) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-11 12:55:00 +08:00
Lianmin Zheng	de167cf5fa	Fix request abortion (#6184)	2025-05-10 21:54:46 -07:00
fzyzcjy	cef91b1ed7	[PD] Add control to slow down a server (#5572)	2025-05-08 01:03:08 -07:00
fzyzcjy	b6cf3532b5	Tiny refactor ModelConfig.from_server_args (#5219)	2025-05-08 01:02:43 -07:00
Liangsheng Yin	a3e4e9bf9e	Better PD initialization (#5751)	2025-05-07 01:12:57 +08:00
Zhiqiang Xie	f8e460930a	Fix prefill OOM error in the case of large page size (#5081)	2025-05-05 16:02:55 -07:00
xm:D	3409aaab32	Support InternVL3 (#5350) Co-authored-by: Mick <mickjagger19@icloud.com> Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-05-01 22:38:59 -07:00
Ying Sheng	11383cec3c	[PP] Add pipeline parallelism (#5724)	2025-04-30 18:18:07 -07:00
Chang Su	28b26dbf48	[Bugfix]: fix missing queue_time_start for requests from grammar_queue (#5696)	2025-04-29 17:31:44 -07:00
Lianmin Zheng	3029889cb4	Turn on overlap scheduler for multimodal models (#5771)	2025-04-27 23:45:09 -07:00
Liangsheng Yin	40d9b8acce	Improve overlap scheduling (#5788)	2025-04-28 11:19:16 +08:00
IAN	11e27d0926	[PD]: Support Muti Prefill in one node (#5704) Co-authored-by: shuaills <shishuaiuoe@gmail.com>	2025-04-26 00:30:47 +08:00
Liangsheng Yin	c55550cbf0	[PD] Better logs (#5715)	2025-04-25 17:25:45 +08:00
Byron Hsu	bf98d2e377	[PD] Support prefill overlap + Ensure no race condition (#5609)	2025-04-21 12:12:56 -07:00
Byron Hsu	e65b9f21e3	[PD] Support decode overlap schedule (#5608)	2025-04-21 12:06:16 -07:00
Zhiqiang Xie	70645f4d7d	upstream hicache fixes (#5570)	2025-04-20 23:08:30 -07:00
fzyzcjy	1195182040	Tiny add Engine.flush_cache API (#5241)	2025-04-20 18:15:03 -07:00
fzyzcjy	f6a71139a8	Make profiler output file names consistent (#5548)	2025-04-18 22:57:11 -07:00
Cheng Wan	6aca583420	Fix several minor issues in PD disaggregation (#5444)	2025-04-15 23:04:41 -07:00
ybyang	dd83e7e9c3	[Bug fix] need record start time in pd mode (#5425)	2025-04-16 10:11:16 +08:00
shangmingc	ffde65a094	[PD] Fix dynamic port support and MLA buffer for Mooncake (#5415) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> Co-authored-by: ybyang <ybyang7@iflytek.com>	2025-04-15 19:29:31 +08:00
Byron Hsu	a9499885e9	[PD] Add transfer backend abstraction (#5328)	2025-04-14 01:39:39 +08:00
Liangsheng Yin	f765579046	Fix typo: infight -> inflight (#5357)	2025-04-14 01:25:30 +08:00
Mick	34ef6c8135	[VLM] Adopt fast image processor by default (#5065)	2025-04-11 21:46:58 -07:00
Cheng Wan	038bc5d521	Support `--enable-llama4-multimodal` (#5254)	2025-04-11 01:24:14 -07:00
Ke Bao	1078396f47	Update deps for mllama4 (#5215)	2025-04-10 09:12:44 -07:00
Teng Ma	4c31ae9f6d	[PD] Support KV transfer with mooncake (#4880) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com> Co-authored-by: Xuchun Shang <xuchun.shang@linux.alibaba.com> Co-authored-by: shangmingc <csmthu@gmail.com>	2025-04-10 14:23:23 +08:00
Stefan He	5db37c8626	[metrics] Add in queue metrics (#4444)	2025-04-09 17:19:27 -07:00
fzyzcjy	61970b08d8	Let `bench_one_batch` support `enable_dp_attention` (#4058)	2025-04-08 23:44:25 -07:00
mlmz	7c5658c189	feat: disable grammar restrictions within reasoning sections (#4984) Co-authored-by: tianhaoyu <thy@mail.ecust.edu.cn> Co-authored-by: DarkSharpness <2040703891@qq.com>	2025-04-07 21:46:47 -07:00
Zhiqiang Xie	e119f04215	Large page size aligned hierarchical caching (#4581)	2025-04-01 22:38:15 -07:00
Mick	5cb552b1d4	refactor: multimodal data (#4754)	2025-03-31 09:57:51 -07:00
Lianmin Zheng	b26bc86b36	Support page size > 1 + eagle (#4908)	2025-03-30 00:46:23 -07:00
Fr4nk1in	c483377ed7	Fix wrong variable name when stopping memory profile (#4772)	2025-03-28 10:35:02 -07:00
fzyzcjy	8c04f0f2e1	Support with_stack and record_shapes in profiler (#4740) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2025-03-27 23:01:42 -07:00
fzyzcjy	53a2c3b466	Support controlling nsys start and end range programmatically (#4688)	2025-03-27 22:21:13 -07:00
XinyuanTong	42a45df043	[Fix] `self.worker` assignment in `TpModelWorker` and refactor references (#4788) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-03-27 20:28:38 -07:00
tarinkk	7f19e083c1	Support (1 <= dp < tp) in the dp attention in DeepEP (#4770) Co-authored-by: Cheng Wan <cwan39@gatech.edu>	2025-03-27 17:09:35 -07:00
Xiaoyu Zhang	04e3ff6975	Support compressed tensors fp8w8a8 (#4743)	2025-03-26 13:21:25 -07:00
fzyzcjy	26f07294f1	Warn users when release_memory_occupation is called without memory saver enabled (#4566)	2025-03-26 00:18:14 -07:00
fzyzcjy	eb934bdf3b	Fix test_expert_distribution failure (#4752)	2025-03-25 01:17:03 -07:00
yuhsaun-t	199bb01d00	Add endpoints to dump selected expert ids (#4435) Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>	2025-03-24 21:34:19 -07:00
Mick	1e86457c90	model: Minicpmo (#3023)	2025-03-24 20:08:40 -07:00
Byron Hsu	c7c7dbebbe	[PD] Release initial code (#4654) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: Ying1123 <sqy1415@gmail.com> Co-authored-by: merrymercy <lianminzheng@gmail.com> Co-authored-by: makro Co-authored-by: dhou-xai	2025-03-21 14:47:47 -07:00
Zhiqiang Xie	a98290aea3	Unit test for Hierarchical Caching (#4486)	2025-03-17 17:45:00 -07:00
Lianmin Zheng	5493c3343e	Fix data parallel + tensor parallel (#4499)	2025-03-17 05:13:16 -07:00
JieXin Liang	0212d2e288	[Fix] use `torch.inference_mode()` instead of `torch.no_grad()` (#4372)	2025-03-16 22:54:16 -07:00
Ying Sheng	1b859295f4	[Eagle] Remove the greedy branch and some redundant code (#4363) Co-authored-by: Sehoon Kim <sehoon@x.ai>	2025-03-16 02:48:55 -07:00
wangyu	1ce4878d31	feat(remote_model): support variable remote backend for model loader (#3964) Signed-off-by: wangyu <wangyu.steph@bytedance.com>	2025-03-14 00:40:44 -07:00

246 Commits