sglang

Author	SHA1	Message	Date
fzyzcjy	31589e177e	Speed up when having padding tokens two-batch overlap (#6668) Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>	2025-05-28 16:00:58 -07:00
fzyzcjy	ae6a5b2950	Minor refactor two-batch overlap (#6682)	2025-05-28 15:54:17 -07:00
fzyzcjy	4839999b76	Overlap two kernels in DeepSeek with communication (#6711)	2025-05-28 15:53:51 -07:00
fzyzcjy	541a985f85	Fuse routed_scaling_factor in DeepSeek (#6710)	2025-05-28 15:53:37 -07:00
Hongbo Xu	5170b010a6	[PD] Remove Unnecessary Exception Handling for FastQueue.get() (#6712)	2025-05-28 11:18:24 -07:00
shangmingc	e9fd11c0d1	[Bugfix] Fix ChatCompletion endpoint of mini_lb when stream is set (#6703) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-05-28 21:33:36 +08:00
shangmingc	c7588d593e	[Bugfix] Fix slice operation when chunk size mismatch (#6697) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-05-28 21:15:00 +08:00
ybyang	6b231325b9	[PD Perf] replace Queue to FastQueue (#6649) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-05-28 01:37:51 -07:00
shangmingc	b1c8d4e9f3	[PD] Abort unbootstrapped prefill requests through timeout (#6685) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-05-28 00:40:54 -07:00
shangmingc	fba03b29e3	[Bugfix] Fix missing abort finish reason for PD with ChatCompletion (#6693) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-05-28 00:39:46 -07:00
Chang Su	461a730280	fix(deepseekv3): Fix DeepSeekV3Detector tool_index assignment and multi-tool call streaming support (#6655)	2025-05-28 00:22:53 -07:00
Yuan Luo	c087ddd686	Refine pre_reorder_triton_kernel slightly to improve performance (#6627) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-05-28 00:15:23 -07:00
Chang Su	41ba767f0c	feat: Add warnings for invalid tool_choice and UTs (#6582)	2025-05-27 16:53:19 -07:00
Chang Su	bdb962d755	fix(tool call): Fix tool_index in PythonicDetector and issues with mixed output in non-streaming (#6678)	2025-05-27 16:18:42 -07:00
fzyzcjy	87068b5cc7	Support gathering expert distribution details (#6665)	2025-05-27 15:32:59 -07:00
fzyzcjy	a564e001b5	Fix DeepEP error in Qwen 3 MoE models (#6673)	2025-05-27 15:12:54 -07:00
Trevor Morris	e806f708c9	[PD] Make bootstrap code common between NIXL and Mooncake (#6473)	2025-05-27 12:47:38 -07:00
Yineng Zhang	fa6723f08f	Revert "fix communicator for non-dp lm head (#6662)" (#6677)	2025-05-27 12:22:59 -07:00
fzyzcjy	673ff668f7	Speed up expert location update (#6661)	2025-05-27 10:00:09 -07:00
fzyzcjy	447be24228	Fix OOM when updating expert locations (#6660)	2025-05-27 09:59:53 -07:00
HAI	183d9f969c	DeepSeek: enable none block-quant FP8 quantizations (#6638)	2025-05-27 09:06:40 -07:00
Ke Bao	631950280a	Support EAGLE draft extend CUDA graph (#6606) Co-authored-by: Sehoon Kim <sehoonkim@berkeley.edu>	2025-05-27 02:35:17 -07:00
Cheng Wan	a3d7f4b673	fix communicator for non-dp lm head (#6662)	2025-05-27 02:31:12 -07:00
Yi Zhang	b18416fbf8	Fix qwen3 tbo/dp-lm-head (#6652)	2025-05-27 00:38:27 -07:00
Mick	ce9d690ef4	fix: fix nightly test from updating transformers (#6658)	2025-05-27 00:28:11 -07:00
Baizhou Zhang	bdaefbbfbd	Add environment flag for disabling message queue broadcaster (#6403)	2025-05-26 22:32:41 -07:00
Chang Su	ae33584235	[Bugfix]: Fix call for function_call_parser.multi_format_detector in adapter.py (#6650)	2025-05-26 21:57:10 -07:00
Lifu Huang	477a101cbd	Refactor LoRA handling to support adapter tensors in fused format (#6585)	2025-05-26 21:51:54 -07:00
fzyzcjy	1a8f5f6836	Super tiny rename environment variable (#6648)	2025-05-26 21:01:16 -07:00
fzyzcjy	32cd707002	Support TP in attention for two batch overlap (#6634)	2025-05-26 20:28:12 -07:00
fzyzcjy	ebd1ed49d4	Tiny refactor communicator (#6646)	2025-05-26 20:24:17 -07:00
Xinyuan Tong	d6864ce6d6	[New Model] Devstral support (#6547) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-05-26 19:27:48 -07:00
Shi Shuai	755a36614b	fix: added "\n" to qwen25 tool parser structural tags (#6631)	2025-05-26 19:25:45 -07:00
Lifu Huang	79a39ac0cc	follow-up: move Idefics2 to a shared location to eliminate unexpected dependency. (#6603)	2025-05-26 19:23:59 -07:00
shangmingc	3ce94f71f9	[PD] Handle P/D failure and reconnect without affecting other instances (#6263) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-05-26 19:21:01 -07:00
fzyzcjy	ca95556c76	Tiny fix sampler error when prob is not contiguous (#6639)	2025-05-26 19:19:08 -07:00
fzyzcjy	0ca3e56802	Tiny fix missing expert location dispatch info (#6620)	2025-05-26 08:58:31 -07:00
fzyzcjy	5c7aa00976	Fix EPLB algorithm fail to run when using 3 nodes for prefill (#6629)	2025-05-26 08:43:24 -07:00
fzyzcjy	fe386acae6	Automatically configure for EPLB-related args (#6628)	2025-05-26 08:42:49 -07:00
Yi Zhang	14d1075f2c	fix qwen3moe eplb prefill bug (#6617)	2025-05-26 02:15:21 -07:00
Lifu Huang	0d503090aa	Supported precomputed feature for Kimi VL (#6599)	2025-05-26 01:24:13 -07:00
fzyzcjy	501efc3d36	Tiny fix CI (#6611)	2025-05-25 23:36:34 -07:00
Yi Zhang	f9bab3d591	qwen3moe support two batch overlap (#6598)	2025-05-25 23:08:16 -07:00
Chang Su	16f69b1f65	feat: Improve Mistral and Qwen25 function call parsing (#6597)	2025-05-25 23:07:23 -07:00
Yi Zhang	65f091310c	refactor qwen moe code, use communicator to support tp+dp (#6581)	2025-05-25 23:01:10 -07:00
Yineng Zhang	7eb9d8e594	chore: upgrade transformers 4.52.3 (#6575) Co-authored-by: Mick <mickjagger19@icloud.com>	2025-05-25 22:49:58 -07:00
fzyzcjy	6bebef60a7	Support accurate length control for bench serving (#6594)	2025-05-25 22:46:23 -07:00
fzyzcjy	25be63d0b2	Auto handle PD disaggregation in bench_serving (#6587) Co-authored-by: yizhang2077 <1109276519@qq.com>	2025-05-25 22:41:27 -07:00
fzyzcjy	93e53f6e0b	Logging and minor fixes to two batch overlap and EPLB (#6595)	2025-05-25 22:36:40 -07:00
fzyzcjy	a191a0e47c	Improve performance of two batch overlap in some imbalanced cases (#6593)	2025-05-25 22:36:18 -07:00

2382 Commits