sglang

Author	SHA1	Message	Date
Chang Su	7b02c32679	[Bugfix](gemma3_mm): handle flatten_batch constraint for multiple images (#6562)	2025-05-23 18:11:54 -07:00
miter	fefa19fec0	Update cmdline --enable-dp-attention help string for Qwen 2/3 Moe models. (#6524) Signed-off-by: miter <miterv@outlook.com>	2025-05-23 15:20:21 -07:00
Shi Shuai	9c574585b3	fix: remove content=none test when tool called (#6347)	2025-05-23 15:12:55 -07:00
Byron Hsu	8233cc10fd	[PD] Support logprob & Add failure test (#6558)	2025-05-23 14:29:20 -07:00
HandH1998	1b2e8f76d9	[2/2] Support Qserve (#6521)	2025-05-23 12:39:18 -07:00
Byron Hsu	d2e0881a34	[PD] support spec decode (#6507) Co-authored-by: SangBin Cho <rkooo567@gmail.com>	2025-05-23 12:03:05 -07:00
Li Hui	2f42749184	Fix topk inference performance reduce (#6474)	2025-05-23 02:58:31 -07:00
YanbingJiang	d8189660a9	Update sgl-kernel UTs for activation/topk/norm/rope kernels (#6452)	2025-05-23 02:03:15 -07:00
Chunyuan WU	3ded6235c9	Add fp8 fused_experts kernel for CPU in sgl-kernel and add UT (#6404)	2025-05-23 02:01:55 -07:00
blzheng	4ba1eea83f	Add fp8 qkv_proj_with_rope kernel for CPU in sgl-kernel and add UT (#6493)	2025-05-23 00:14:46 -07:00
Chang Su	4685fbb888	[VLM] Support chunk prefill for VLM (#6355) Co-authored-by: yizhang2077 <1109276519@qq.com>	2025-05-22 20:32:41 -07:00
Byron Hsu	0a4fc73b48	[PD] Fix failure abort (#6535)	2025-05-22 20:32:03 -07:00
Yineng Zhang	a6970a17f3	misc: fix accept_length (#6536)	2025-05-22 14:27:10 -07:00
ryang	a6ae3af15e	Support XiaomiMiMo inference with mtp (#6059)	2025-05-22 14:14:49 -07:00
Yineng Zhang	0b07c4a99f	chore: upgrade sgl-kernel v0.1.4 (#6532)	2025-05-22 13:28:16 -07:00
lukec	fc0e3b9174	Support qwen3 deepep (#6120)	2025-05-22 11:04:45 -07:00
Yineng Zhang	d71f3f0a2a	chore: bump sgl-kernel v0.1.4 (#6522)	2025-05-22 09:47:42 -07:00
shangmingc	58f10679e1	Fix missing http status import for PD failure handler (#6520) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-05-22 15:23:54 +08:00
fzyzcjy	7a80f56513	Support dynamically rebalancing experts using EPLB (#6469)	2025-05-21 23:13:21 -07:00
fzyzcjy	9484eba4ad	Support logging expert balancedness metrics (#6482)	2025-05-21 23:05:33 -07:00
Zilin Zhu	e9feb48838	[RL] Remove the w13 weight_scale and input_scale for UnquantizedEPMoE… (#6308)	2025-05-21 22:03:15 -07:00
fzyzcjy	fc992a09f9	Support updating expert locations dynamically (#6388)	2025-05-21 21:59:33 -07:00
Yuan Luo	121f92c583	Add main for merge state tests (#6492) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-05-21 21:56:25 -07:00
Byron Hsu	3bde101099	[PD] Abort request if transfer fails (#6504)	2025-05-21 21:44:25 -07:00
Byron Hsu	7513558074	[PD] Add doc and simplify sender.send (#6019)	2025-05-21 21:22:21 -07:00
HandH1998	4d643f6c7a	[1/2] Support Qserve (#6457) Co-authored-by: yych0745 <1398089567@qq.com> Co-authored-by: sleepcoo <sleepcoo@gmail.com>	2025-05-21 19:48:59 -07:00
Ke Bao	6ce0ed073b	Apply constraint grammar to EAGLE (#6499) Co-authored-by: merrymercy <lianminzheng@gmail.com>	2025-05-21 17:18:41 -07:00
fzyzcjy	969660c762	Recover from corrupted cache file in bench serving (#6510)	2025-05-21 17:13:54 -07:00
Xinyuan Tong	16d4f6801b	doc: Update README.md with adding deepwiki badge to enable weekly auto-refresh (#6508)	2025-05-21 16:27:34 -07:00
Kyungmin Lee	ada268fd05	fix: EXAONE when using tie_word_embeddings (#5759)	2025-05-21 11:30:04 -07:00
blzheng	cfe48c5902	[CPU] Fix build issue (#6419)	2025-05-21 11:17:10 -07:00
Baizhou Zhang	d4c038daed	[Fix]Fix capture fail bug for DeepSeek (#6275)	2025-05-21 11:11:20 -07:00
fzyzcjy	55f6005f53	Fix bench_one_batch_server (#6503)	2025-05-21 11:08:17 -07:00
fzyzcjy	7222e1dacc	Let bench_one_batch_server use sharegpt data to make expert distribution more natural (#5573)	2025-05-21 02:08:43 -07:00
fzyzcjy	505eec4dc9	Tiny make Lint CI show diff (#6445)	2025-05-21 02:06:25 -07:00
fzyzcjy	ccfe5c009d	Support redundant experts in expert parallel (#6461)	2025-05-21 02:05:53 -07:00
fzyzcjy	a071dc4084	Tiny add stage assertions to DeepEPDispatcher to avoid misuse (#6467)	2025-05-21 02:05:05 -07:00
fzyzcjy	a40aecc5a3	Fix num_qps_per_rank computation when providing custom DeepEP configuration (#6468)	2025-05-21 02:04:33 -07:00
fzyzcjy	d6e1d28c8a	Refactor DeepSeek attention dispatching (#6476)	2025-05-21 02:03:39 -07:00
Zilin Zhu	7c347259ff	[RL] allow weight updation with dp attention enabled (#6311)	2025-05-21 01:58:55 -07:00
Zilin Zhu	669caa0a3f	[router] support http2 in router (#6487)	2025-05-21 01:42:45 -07:00
Jiajun Li	4024e1d2a8	Implement Siglip Vision model, and support BNB quantization for gemma3-mm (#5339)	2025-05-20 23:53:46 -07:00
HAI	5c0b38f369	aiter attention-backend (default enabled on AMD/ROCm) (#6381)	2025-05-20 22:52:41 -07:00
Yuan Luo	30ca18f423	Refactor group_concurrent_contiguous in NIXL (#6214) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-05-21 11:55:04 +08:00
Lianmin Zheng	03886917bd	Disable all two stream overlap on amd (#6475)	2025-05-20 19:06:59 -07:00
Wenxuan Tan	66324895c6	[docs] Fix torch version (#6472)	2025-05-20 10:53:14 -07:00
fzyzcjy	13feffd082	Fix master CI for DeepSeek (#6447)	2025-05-20 00:31:42 -07:00
fzyzcjy	e98afbe042	Support dispatching logical to physical experts (#6385)	2025-05-19 22:13:55 -07:00
JieXin Liang	69af3ec35f	[doc] add note for get_num_kv_splits in triton_backend (#6444)	2025-05-19 21:40:21 -07:00
YanbingJiang	32cc66efa5	Update extend/decode attention kernel for CPU in sgl-kernel and add UTs (#6405) Co-authored-by: mingfeima <mingfei.ma@intel.com>	2025-05-19 21:23:17 -07:00

3382 Commits