sglang

Author	SHA1	Message	Date
ykcombat	c4e81e64fb	[Feature] Use current greenctx stream to communicate in PD-Multiplexing. (#11594 )	2025-10-20 10:58:20 +08:00
harrisonlimh	c726d44cc7	Recapture cuda graph after model weight update to resolve IMA error (#11780 )	2025-10-20 10:50:03 +08:00
sglang-bot	283c8ba031	chore: bump sgl-kernel version to 0.3.16.post3 (#11733 )	2025-10-19 21:44:15 -05:00
huangtingwei	cae3956585	check master server for mooncake store (#10510 )	2025-10-20 09:37:09 +08:00
Kangyan-Zhou	27a223aba4	Improve Kernel Build Time (#11508 )	2025-10-19 18:11:48 -07:00
Kangyan-Zhou	53529f46cc	Fix version bump script to handle TOML files with outdated versions (#11787 ) Co-authored-by: Claude <noreply@anthropic.com>	2025-10-19 18:10:26 -07:00
Xiaoyu Zhang	24ed3f32c0	fix(ci): Fix CI Monitor limit parameter and add CI Analysis to summary (#11832 )	2025-10-19 18:08:34 -07:00
Baizhou Zhang	44f0ece9fc	[Doc] Update documents for FA4 (#11778 )	2025-10-19 17:40:38 -07:00
Liu-congo	be0058bc05	[BugFix] replace the input_to_float8 used in dsv2 (#11612 ) Signed-off-by: Liu-congo <1502632128@qq.com>	2025-10-19 19:34:13 -05:00
fzyzcjy	9e3be1fa2a	Tiny bump DeepEP version in ARM blackwell (#11810 )	2025-10-20 08:15:14 +08:00
fzyzcjy	a8ba32798e	Fix triton_kernels import error on some hardwares (#11831 )	2025-10-20 08:14:47 +08:00
hlu1	3b80232d06	[DeepseekV32] Add fast_topk_transform_ragged_fused kernel (#11815 ) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>	2025-10-19 17:13:39 -07:00
Johnny	252dc4e112	[NVIDIA] FA3/FA4 Fix (#11606 ) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-10-19 17:10:10 -07:00
Baizhou Zhang	cbb5fc2edc	[CI] Add CI test for DeepSeek V3.2 MTP (#11835 )	2025-10-19 17:00:25 -07:00
Night	53fb229f53	[logprobs] Enable local deterministic logrprobs testing with strict threshold (#10994 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-19 13:30:39 -07:00
Stefan He	4fff1ec1d9	Deterministic Mode: Add 1-stage triton kernel for prefill (#11147 ) Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com> Co-authored-by: Binyao Jiang <bijiang@linkedin.com>	2025-10-20 01:47:36 +08:00
Liangsheng Yin	7a020e0f3b	[Test] Add basic matched stop for beta eagle (#11833 )	2025-10-20 01:17:00 +08:00
Liangsheng Yin	48738af7f9	[CI] always print back trace in `retry()` (#11834 )	2025-10-20 01:12:49 +08:00
Paiiii	efa473348b	[Spec Decoding] Support MTP for dsv3.2 (#11652 ) Co-authored-by: Paiiiiiiiiiiiiii <zengpai@baidu.com>	2025-10-19 23:44:22 +08:00
Liangsheng Yin	d658f0497e	[overlap-spec] fix stop condition and trimming (#11819 )	2025-10-19 22:00:20 +08:00
Liangsheng Yin	57e25de756	Revert "Fix: Dynamic RoPE Cache Expansion to Prevent Position-ID Out-of-Bounds in EAGLE + Long-Sequence Workloads" (#11827 )	2025-10-19 19:44:06 +08:00
fzyzcjy	12eb02e982	Change bf16 to fp8 for some gemms in attention for DeepSeek ckpt v2 (#11805 )	2025-10-19 16:15:13 +08:00
fzyzcjy	002d037359	Avoid generation gets hanging when user specifies multiple event loops (#5162 ) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-10-19 16:12:49 +08:00
fzyzcjy	a27825ae01	Support not officially supported high sgl-kernel version with low srt version (#11786 )	2025-10-19 16:11:59 +08:00
fzyzcjy	ce399e154c	Make single-batch overlap compatible with NextN (#11804 )	2025-10-19 16:10:44 +08:00
fzyzcjy	ea6275dfbc	Tiny add hints when users send requests to wrong place (#11808 )	2025-10-19 16:10:20 +08:00
narutolhy	eb7318f1c2	support tokenized batch request (#11091 )	2025-10-19 07:05:02 +00:00
Lianmin Zheng	6058fb520c	Update CODEOWNERS for layer quantization path (#11818 )	2025-10-18 21:17:17 -07:00
YAMY	80407b0493	Fix: Dynamic RoPE Cache Expansion to Prevent Position-ID Out-of-Bounds in EAGLE + Long-Sequence Workloads (#10788 )	2025-10-19 11:37:43 +08:00
Liangsheng Yin	b288f4f440	Improve `send_sone` script (#11817 )	2025-10-19 11:28:16 +08:00
tazjin	6d6ea5af0c	fix: do not wrap invalid grammar objects during constrained generation (#11328 )	2025-10-19 10:54:33 +08:00
Marin	1dacedd2db	make sure logit bias is applied during eagle spec decoding verification (#11555 )	2025-10-19 10:53:33 +08:00
ybyang	b5e14b2b78	[1/2][feature] support openai like classification api (#11618 )	2025-10-18 19:32:48 -07:00
ybyang	d513ee93ef	[2/2] [feature] support openai like classification api in router (#11670 )	2025-10-18 19:31:08 -07:00
Simo Lin	a7ae61ed77	[router] Add Configurable L0 and L1 Tokenizer Caching (#11688 )	2025-10-18 18:33:53 -07:00
kyleliang-nv	fda0cb2a30	Fix Dockerfile not installing correct version of DeepEP for arm build (#11773 )	2025-10-18 15:06:05 -07:00
Qiaolin Yu	ebda73dc72	Use cutlass fp4 gemm by default (#11813 )	2025-10-18 14:10:15 -07:00
b8zhong	f4f8a1b4d8	ci: update `lmms-eval` to speed up multimodal CI (#11000 )	2025-10-19 02:51:19 +08:00
Kindyaa	c44e985dc2	feat(example/fastapi): support --startup-timeout using Qwen3-Next-80B-A3B-Instruct as example (#11710 ) Co-authored-by: chenan01 <chenan01@cheche-MacBook-Pro.local> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-19 02:50:34 +08:00
b8zhong	f9a7d9b3dc	support server arg override KV cache to bf16 to avoid slow cases (#11749 )	2025-10-19 02:49:48 +08:00
Liangsheng Yin	a93f10a722	[overlap-spec] support page size > 1 (#11772 )	2025-10-19 02:09:13 +08:00
Teng Ma	585e1223f0	[HiCache] feat: add more eviction policy (#11506 )	2025-10-18 15:49:45 +00:00
fzyzcjy	a7043c6f0d	Bump torch_memory_saver to avoid installing pre-release versions (#11797 )	2025-10-18 01:20:42 -07:00
Lianmin Zheng	67e34c56d7	Fix install instructions and pyproject.tomls (#11781 )	2025-10-18 01:08:01 -07:00
Yuwei An	1d726528f7	Eager Compiler for Torch Compile (#11803 ) Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>	2025-10-18 15:18:52 +08:00
Minglei Zhu	f4488e9dd9	set default attention backend for deterministic inference (#11801 )	2025-10-18 00:01:24 -07:00
Zilin Zhu	e68a2b5b2f	[RL] use cpu group to prepare_mlp_sync_batch_raw when the server is offloaded (#10152 )	2025-10-18 14:29:35 +08:00
Zilin Zhu	31b9f19e54	[RL] support weight update with DP attention (#11669 )	2025-10-18 14:26:19 +08:00
Qiaolin Yu	547003bdd0	fix command line usage of profiling (#11793 )	2025-10-18 12:54:36 +08:00
Jimmy	f7ab955455	fix(glm45): disable reduce scatter (#11665 ) Co-authored-by: Shangming Cai <csmthu@gmail.com>	2025-10-18 12:19:20 +08:00

6074 Commits