sglang

Author	SHA1	Message	Date
Xinyuan Tong	cf9815ba69	[Refactor] Multimodal data processing for VLM (#6659) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-06-04 11:22:33 -07:00
Cheng Wan	8a5480528d	[Refactor] Rename `n_share_experts_fusion` as `num_fused_shared_experts` (#6735)	2025-06-03 17:48:24 -07:00
pansicheng	27e327b415	fix new_page_count_next_decode (#6671)	2025-06-02 22:48:52 -07:00
fzyzcjy	df7f61ee7d	Speed up rebalancing when using non-static dispatch algorithms (#6812)	2025-06-02 11:18:17 -07:00
fzyzcjy	ef21729c1d	Fix profiles do not have consistent names (#6811)	2025-06-02 11:17:22 -07:00
fzyzcjy	6d7b6696d4	Tiny fix EPLB assertion about rebalancing period and recorder window size (#6813)	2025-06-02 11:13:33 -07:00
fzyzcjy	6376b632eb	Tiny log prefill time (#6780)	2025-06-02 10:28:27 -07:00
Lianmin Zheng	20fd53b8f6	Correctly abort the failed grammar requests & Improve the handling of abort (#6803)	2025-06-01 19:00:07 -07:00
Lianmin Zheng	2d72fc47cf	Improve profiler and integrate profiler in bench_one_batch_server (#6787)	2025-05-31 15:53:55 -07:00
YanbingJiang	888cb175a6	Add intel_amx backend for Radix Attention for CPU (#6408) Co-authored-by: Chunyuan WU <chunyuan.wu@intel.com> Co-authored-by: Thien Tran <gau.nernst@yahoo.com.sg>	2025-05-30 21:37:42 -07:00
fzyzcjy	2c3b71d678	Improve EPLB logical to physical dispatch map (#6727)	2025-05-29 19:23:54 -07:00
fzyzcjy	3ab7d9b55e	Support picking variants of EPLB algorithms (#6728)	2025-05-29 08:12:01 -07:00
fzyzcjy	7e5071c92a	Super tiny enable sole usage of expert distribution metrics and update doc (#6680)	2025-05-29 08:11:38 -07:00
Liangsheng Yin	78689d3393	PD Rust LB (PO2) (#6437) Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>	2025-05-29 20:50:10 +08:00
JieXin Liang	2163586e63	[feat] triton kernel for get_last_loc (#6676)	2025-05-28 23:10:28 -07:00
fzyzcjy	87068b5cc7	Support gathering expert distribution details (#6665)	2025-05-27 15:32:59 -07:00
Lifu Huang	79a39ac0cc	follow-up: move Idefics2 to a shared location to eliminate unexpected dependency. (#6603)	2025-05-26 19:23:59 -07:00
fzyzcjy	5c7aa00976	Fix EPLB algorithm fail to run when using 3 nodes for prefill (#6629)	2025-05-26 08:43:24 -07:00
Yi Zhang	14d1075f2c	fix qwen3moe eplb prefill bug (#6617)	2025-05-26 02:15:21 -07:00
Lifu Huang	0d503090aa	Supported precomputed feature for Kimi VL (#6599)	2025-05-26 01:24:13 -07:00
fzyzcjy	93e53f6e0b	Logging and minor fixes to two batch overlap and EPLB (#6595)	2025-05-25 22:36:40 -07:00
fzyzcjy	8c7279c24e	Fix profiling will crash the server when using num_steps (#6586)	2025-05-25 22:36:02 -07:00
fzyzcjy	0ca1811715	Support fake perfectly balanced EP dispatch algorithm (#6571)	2025-05-25 22:35:51 -07:00
Lifu Huang	022012aae8	Support Phi-4 Multi-Modal (text + vision only) (#6494)	2025-05-24 21:43:38 -07:00
Xinyuan Tong	681fdc264b	Refactor vlm embedding routine to use precomputed feature (#6543) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-05-24 18:39:21 -07:00
fzyzcjy	0d47788025	Support overlapping two batches (#4068)	2025-05-24 17:39:07 -07:00
Byron Hsu	2d831c6ef9	[PD] Support structured output (#6560)	2025-05-23 21:49:00 -07:00
Yi Zhang	e6f113569e	support eplb for qwen3 (#6533)	2025-05-23 18:31:30 -07:00
Byron Hsu	8233cc10fd	[PD] Support logprob & Add failure test (#6558)	2025-05-23 14:29:20 -07:00
Byron Hsu	d2e0881a34	[PD] support spec decode (#6507) Co-authored-by: SangBin Cho <rkooo567@gmail.com>	2025-05-23 12:03:05 -07:00
Chang Su	4685fbb888	[VLM] Support chunk prefill for VLM (#6355) Co-authored-by: yizhang2077 <1109276519@qq.com>	2025-05-22 20:32:41 -07:00
Byron Hsu	0a4fc73b48	[PD] Fix failure abort (#6535)	2025-05-22 20:32:03 -07:00
fzyzcjy	7a80f56513	Support dynamically rebalancing experts using EPLB (#6469)	2025-05-21 23:13:21 -07:00
fzyzcjy	9484eba4ad	Support logging expert balancedness metrics (#6482)	2025-05-21 23:05:33 -07:00
fzyzcjy	fc992a09f9	Support updating expert locations dynamically (#6388)	2025-05-21 21:59:33 -07:00
Byron Hsu	3bde101099	[PD] Abort request if transfer fails (#6504)	2025-05-21 21:44:25 -07:00
Byron Hsu	7513558074	[PD] Add doc and simplify sender.send (#6019)	2025-05-21 21:22:21 -07:00
fzyzcjy	ccfe5c009d	Support redundant experts in expert parallel (#6461)	2025-05-21 02:05:53 -07:00
Zilin Zhu	7c347259ff	[RL] allow weight updation with dp attention enabled (#6311)	2025-05-21 01:58:55 -07:00
fzyzcjy	e98afbe042	Support dispatching logical to physical experts (#6385)	2025-05-19 22:13:55 -07:00
fzyzcjy	cba1cdbc46	Support DeepSeek EPLB algorithm with static distributions (#6387)	2025-05-19 21:06:21 -07:00
fzyzcjy	c471d39eb9	Support loading weights when physical experts are different from logical experts (#6386)	2025-05-19 21:05:53 -07:00
fzyzcjy	f0653886a5	Expert distribution recording without overhead for EPLB (#4957)	2025-05-19 20:07:43 -07:00
Yi Zhang	b06215daed	[BUG] fix stop_profile crash (#6431)	2025-05-19 17:30:33 -07:00
Trevor Morris	7adf245ba2	[Metrics] Add KV events publishing (#6098)	2025-05-19 14:19:54 -07:00
Mick	626ccb7d3f	vlm: tensor hash kernel (#5974)	2025-05-18 15:38:16 -07:00
Mick	01dd39bac1	refactor: minor refactors regarding multimodal processing (#6187)	2025-05-17 22:53:20 -07:00
fzyzcjy	4086566516	Fix expert distribution recorder and profiler command stuck forever (#6284)	2025-05-17 17:10:44 -07:00
fzyzcjy	fd08c04821	Support custom DeepEP tuning config (#6257)	2025-05-17 17:09:42 -07:00
fzyzcjy	01d2838c0f	Fix stop_profile does not wait for finishing (#4741)	2025-05-17 17:06:15 -07:00

889 Commits