sglang

Author	SHA1	Message	Date
Chang Su	16f69b1f65	feat: Improve Mistral and Qwen25 function call parsing (#6597)	2025-05-25 23:07:23 -07:00
Yi Zhang	65f091310c	refactor qwen moe code, use communicator to support tp+dp (#6581)	2025-05-25 23:01:10 -07:00
Yineng Zhang	fc419b62e8	Revert "Tiny fix lint CI does not trigger on master (#6609)" (#6610)	2025-05-25 22:52:34 -07:00
Yineng Zhang	7eb9d8e594	chore: upgrade transformers 4.52.3 (#6575) Co-authored-by: Mick <mickjagger19@icloud.com>	2025-05-25 22:49:58 -07:00
fzyzcjy	84147254c9	Tiny fix lint CI does not trigger on master (#6609)	2025-05-25 22:47:03 -07:00
fzyzcjy	6bebef60a7	Support accurate length control for bench serving (#6594)	2025-05-25 22:46:23 -07:00
fzyzcjy	25be63d0b2	Auto handle PD disaggregation in bench_serving (#6587) Co-authored-by: yizhang2077 <1109276519@qq.com>	2025-05-25 22:41:27 -07:00
fzyzcjy	d502dae0f0	Tiny change killall_sglang.sh (#6596)	2025-05-25 22:36:51 -07:00
fzyzcjy	93e53f6e0b	Logging and minor fixes to two batch overlap and EPLB (#6595)	2025-05-25 22:36:40 -07:00
fzyzcjy	a191a0e47c	Improve performance of two batch overlap in some imbalanced cases (#6593)	2025-05-25 22:36:18 -07:00
fzyzcjy	8c7279c24e	Fix profiling will crash the server when using num_steps (#6586)	2025-05-25 22:36:02 -07:00
fzyzcjy	0ca1811715	Support fake perfectly balanced EP dispatch algorithm (#6571)	2025-05-25 22:35:51 -07:00
fzyzcjy	2c3a6fe1de	Fix bench_serving does not support changing warmup requests (#6439)	2025-05-25 22:35:36 -07:00
wangxiyu191	8b33d8df90	[PD] Fix prefill_servers in mini_lb (#6527)	2025-05-26 10:38:41 +08:00
simveit	e235be16fe	Fix some issues with current docs. (#6588)	2025-05-26 01:04:34 +08:00
fzyzcjy	5ccf8fe1a0	Hint users when weight update timeouts (#6570)	2025-05-25 09:13:17 -07:00
Shenggui Li	3f23d8cdf1	added support for tied weights in qwen pipeline parallelism (#6546)	2025-05-25 00:00:56 -07:00
Chao Yang	1a39979993	Sgl-router Prometheus metrics endpoint and usage track metrics (#6537)	2025-05-24 22:28:15 -07:00
Lifu Huang	022012aae8	Support Phi-4 Multi-Modal (text + vision only) (#6494)	2025-05-24 21:43:38 -07:00
Chang Su	681e7af32b	[OAI] Support non-normalized logprobs in OpenAI server (#5961)	2025-05-24 21:35:55 -07:00
Xinyuan Tong	681fdc264b	Refactor vlm embedding routine to use precomputed feature (#6543) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-05-24 18:39:21 -07:00
fzyzcjy	0d47788025	Support overlapping two batches (#4068)	2025-05-24 17:39:07 -07:00
fzyzcjy	f456037396	Utilize static dispatching for communicator (#6577)	2025-05-24 17:34:35 -07:00
fzyzcjy	b2388433be	Add back DeepSeek non-TBO branches (#6578)	2025-05-24 17:34:00 -07:00
fzyzcjy	a38376fa99	Refactor attention into multiple stages (#6477)	2025-05-24 17:33:25 -07:00
kk	7a5e6ce1cb	Fix GPU OOM (#6564) Co-authored-by: michael <michael.zhang@amd.com>	2025-05-24 16:38:39 -07:00
Sai Enduri	24c035f2e3	Temporarily disable MI325x 8 gpu testing. (#6576)	2025-05-24 16:37:22 -07:00
Yineng Zhang	7e257cd666	chore: bump v0.4.6.post5 (#6566)	2025-05-24 00:48:05 -07:00
fzyzcjy	c4831e2fcf	Fix accuracy is zero when enabling moe-dense-tp-size as in large scale EP (#6567)	2025-05-24 00:27:10 -07:00
Neo	2e37fa07ba	[FIX]remove ServerArgs duplicate code (#6485)	2025-05-23 22:54:41 -07:00
Byron Hsu	2d831c6ef9	[PD] Support structured output (#6560)	2025-05-23 21:49:00 -07:00
Chang Su	ed0c3035cd	feat(Tool Calling): Support `required` and specific function mode (#6550)	2025-05-23 21:00:37 -07:00
Yi Zhang	e6f113569e	support eplb for qwen3 (#6533)	2025-05-23 18:31:30 -07:00
Chang Su	7b02c32679	[Bugfix](gemma3_mm): handle flatten_batch constraint for multiple images (#6562)	2025-05-23 18:11:54 -07:00
miter	fefa19fec0	Update cmdline --enable-dp-attention help string for Qwen 2/3 Moe models. (#6524) Signed-off-by: miter <miterv@outlook.com>	2025-05-23 15:20:21 -07:00
Shi Shuai	9c574585b3	fix: remove content=none test when tool called (#6347)	2025-05-23 15:12:55 -07:00
Byron Hsu	8233cc10fd	[PD] Support logprob & Add failure test (#6558)	2025-05-23 14:29:20 -07:00
HandH1998	1b2e8f76d9	[2/2] Support Qserve (#6521)	2025-05-23 12:39:18 -07:00
Byron Hsu	d2e0881a34	[PD] support spec decode (#6507) Co-authored-by: SangBin Cho <rkooo567@gmail.com>	2025-05-23 12:03:05 -07:00
Li Hui	2f42749184	Fix topk inference performance reduce (#6474)	2025-05-23 02:58:31 -07:00
YanbingJiang	d8189660a9	Update sgl-kernel UTs for activation/topk/norm/rope kernels (#6452)	2025-05-23 02:03:15 -07:00
Chunyuan WU	3ded6235c9	Add fp8 fused_experts kernel for CPU in sgl-kernel and add UT (#6404)	2025-05-23 02:01:55 -07:00
blzheng	4ba1eea83f	Add fp8 qkv_proj_with_rope kernel for CPU in sgl-kernel and add UT (#6493)	2025-05-23 00:14:46 -07:00
Chang Su	4685fbb888	[VLM] Support chunk prefill for VLM (#6355) Co-authored-by: yizhang2077 <1109276519@qq.com>	2025-05-22 20:32:41 -07:00
Byron Hsu	0a4fc73b48	[PD] Fix failure abort (#6535)	2025-05-22 20:32:03 -07:00
Yineng Zhang	a6970a17f3	misc: fix accept_length (#6536)	2025-05-22 14:27:10 -07:00
ryang	a6ae3af15e	Support XiaomiMiMo inference with mtp (#6059)	2025-05-22 14:14:49 -07:00
Yineng Zhang	0b07c4a99f	chore: upgrade sgl-kernel v0.1.4 (#6532)	2025-05-22 13:28:16 -07:00
lukec	fc0e3b9174	Support qwen3 deepep (#6120)	2025-05-22 11:04:45 -07:00
Yineng Zhang	d71f3f0a2a	chore: bump sgl-kernel v0.1.4 (#6522)	2025-05-22 09:47:42 -07:00

3415 Commits