sglang

Author	SHA1	Message	Date
Lianmin Zheng	43ad05907c	[Auto Sync] Update scheduler.py, server_args.py (20251020) (#11875) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Kan Wu <wukanustc@gmail.com>	2025-10-20 17:41:19 -07:00
fzyzcjy	0917c5da8c	Support mixing cutedsl and deepgemm backend (#11807)	2025-10-21 07:38:35 +08:00
penguin_wwy	184a4df697	Replace function call with set literal (#11867)	2025-10-21 01:39:16 +08:00
Qiaolin Yu	f7b1d8c5ab	Fix acc len and gen throughput metrics when enabling overlap-spec (#11823) Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>	2025-10-21 01:34:38 +08:00
Cheng Wan	bfc3b3f786	[9/N] MoE Refactor: cleanup dispatcher interfaces (#11847)	2025-10-20 10:11:46 -07:00
Liangsheng Yin	da5bde4d16	Tiny fix main lint (#11862)	2025-10-20 19:57:24 +08:00
DarkSharpness	276e7b3e4e	[Feature] New structural tag support (#10691)	2025-10-20 18:25:58 +08:00
ishandhanani	296f689242	fix(server_args): handle tokenizer init conflicts (#11776)	2025-10-20 00:27:19 -07:00
Shane A	d383e6616e	[Model] Add Olmo 3 model support (#11396)	2025-10-19 23:59:16 -07:00
Shangming Cai	a2ba0bc3df	Tiny clean up for PD module and doc (#11747) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2025-10-20 11:52:42 +08:00
Ziming Huang	6d2d0ce285	[PD] Improve eagle acceptance rate by transferring draft model hidden states (#10801) Co-authored-by: Shangming Cai <csmthu@gmail.com>	2025-10-20 11:52:18 +08:00
Yuan Luo	271d3d0d50	Support mrope triton kernel and add unit test (#11722) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>	2025-10-20 11:51:07 +08:00
ykcombat	c4e81e64fb	[Feature] Use current greenctx stream to communicate in PD-Multiplexing. (#11594)	2025-10-20 10:58:20 +08:00
harrisonlimh	c726d44cc7	Recapture cuda graph after model weight update to resolve IMA error (#11780)	2025-10-20 10:50:03 +08:00
huangtingwei	cae3956585	check master server for mooncake store (#10510)	2025-10-20 09:37:09 +08:00
Liu-congo	be0058bc05	[BugFix] replace the input_to_float8 used in dsv2 (#11612) Signed-off-by: Liu-congo <1502632128@qq.com>	2025-10-19 19:34:13 -05:00
fzyzcjy	a8ba32798e	Fix triton_kernels import error on some hardwares (#11831)	2025-10-20 08:14:47 +08:00
Johnny	252dc4e112	[NVIDIA] FA3/FA4 Fix (#11606) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-10-19 17:10:10 -07:00
Baizhou Zhang	cbb5fc2edc	[CI] Add CI test for DeepSeek V3.2 MTP (#11835)	2025-10-19 17:00:25 -07:00
Stefan He	4fff1ec1d9	Deterministic Mode: Add 1-stage triton kernel for prefill (#11147) Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com> Co-authored-by: Binyao Jiang <bijiang@linkedin.com>	2025-10-20 01:47:36 +08:00
Liangsheng Yin	7a020e0f3b	[Test] Add basic matched stop for beta eagle (#11833)	2025-10-20 01:17:00 +08:00
Liangsheng Yin	48738af7f9	[CI] always print back trace in `retry()` (#11834)	2025-10-20 01:12:49 +08:00
Paiiii	efa473348b	[Spec Decoding] Support MTP for dsv3.2 (#11652) Co-authored-by: Paiiiiiiiiiiiiii <zengpai@baidu.com>	2025-10-19 23:44:22 +08:00
Liangsheng Yin	d658f0497e	[overlap-spec] fix stop condition and trimming (#11819)	2025-10-19 22:00:20 +08:00
Liangsheng Yin	57e25de756	Revert "Fix: Dynamic RoPE Cache Expansion to Prevent Position-ID Out-of-Bounds in EAGLE + Long-Sequence Workloads" (#11827)	2025-10-19 19:44:06 +08:00
fzyzcjy	12eb02e982	Change bf16 to fp8 for some gemms in attention for DeepSeek ckpt v2 (#11805)	2025-10-19 16:15:13 +08:00
fzyzcjy	002d037359	Avoid generation gets hanging when user specifies multiple event loops (#5162) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-10-19 16:12:49 +08:00
fzyzcjy	ce399e154c	Make single-batch overlap compatible with NextN (#11804)	2025-10-19 16:10:44 +08:00
fzyzcjy	ea6275dfbc	Tiny add hints when users send requests to wrong place (#11808)	2025-10-19 16:10:20 +08:00
narutolhy	eb7318f1c2	support tokenized batch request (#11091)	2025-10-19 07:05:02 +00:00
YAMY	80407b0493	Fix: Dynamic RoPE Cache Expansion to Prevent Position-ID Out-of-Bounds in EAGLE + Long-Sequence Workloads (#10788)	2025-10-19 11:37:43 +08:00
Liangsheng Yin	b288f4f440	Improve `send_sone` script (#11817)	2025-10-19 11:28:16 +08:00
tazjin	6d6ea5af0c	fix: do not wrap invalid grammar objects during constrained generation (#11328)	2025-10-19 10:54:33 +08:00
Marin	1dacedd2db	make sure logit bias is applied during eagle spec decoding verification (#11555)	2025-10-19 10:53:33 +08:00
ybyang	b5e14b2b78	[1/2][feature] support openai like classification api (#11618)	2025-10-18 19:32:48 -07:00
Qiaolin Yu	ebda73dc72	Use cutlass fp4 gemm by default (#11813)	2025-10-18 14:10:15 -07:00
b8zhong	f9a7d9b3dc	support server arg override KV cache to bf16 to avoid slow cases (#11749)	2025-10-19 02:49:48 +08:00
Liangsheng Yin	a93f10a722	[overlap-spec] support page size > 1 (#11772)	2025-10-19 02:09:13 +08:00
Teng Ma	585e1223f0	[HiCache] feat: add more eviction policy (#11506)	2025-10-18 15:49:45 +00:00
fzyzcjy	a7043c6f0d	Bump torch_memory_saver to avoid installing pre-release versions (#11797)	2025-10-18 01:20:42 -07:00
Lianmin Zheng	67e34c56d7	Fix install instructions and pyproject.tomls (#11781)	2025-10-18 01:08:01 -07:00
Yuwei An	1d726528f7	Eager Compiler for Torch Compile (#11803) Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>	2025-10-18 15:18:52 +08:00
Minglei Zhu	f4488e9dd9	set default attention backend for deterministic inference (#11801)	2025-10-18 00:01:24 -07:00
Zilin Zhu	e68a2b5b2f	[RL] use cpu group to prepare_mlp_sync_batch_raw when the server is offloaded (#10152)	2025-10-18 14:29:35 +08:00
Zilin Zhu	31b9f19e54	[RL] support weight update with DP attention (#11669)	2025-10-18 14:26:19 +08:00
Jimmy	f7ab955455	fix(glm45): disable reduce scatter (#11665) Co-authored-by: Shangming Cai <csmthu@gmail.com>	2025-10-18 12:19:20 +08:00
Chang Su	ca240eefb4	[router][grpc] Support parallel queue puts in grpc_request_manager and remove mutex for grpc_client (#11798)	2025-10-17 20:49:43 -07:00
Cheng Wan	5b214b50b6	[Refactor] move `deep_gemm_wrapper` out of `quantization` (#11784)	2025-10-17 18:57:54 -07:00
Minglei Zhu	13219e1e48	completely remove mixed mode deterministic test as prefix mode could cover it (#11783) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-10-17 17:46:03 -07:00
fzyzcjy	33e9bbec35	Make single-batch overlap compatible with offloading (#11614)	2025-10-18 08:45:54 +08:00

4073 Commits