sglang

Author	SHA1	Message	Date
Liangsheng Yin	cde5a6e30f	Abstraction for spec worker and code cleanup (#11643 )	2025-10-17 23:31:36 +08:00
Mick	3e4c7da2f5	ci: reduce and refactor vlm ut and combine test files (#11062 )	2025-10-17 15:24:50 +00:00
Liangsheng Yin	d88ac9bc9a	[overlap-spec] Make plan stream an option (#11724 )	2025-10-17 15:48:57 +08:00
Liangsheng Yin	ce11dd82dc	[CI] Try fix broken event loop init (#11746 )	2025-10-17 13:30:17 +08:00
StonyPort	fd389df96e	Reduce the image processing latency in VLM (#11541 ) Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>	2025-10-16 15:00:03 -07:00
Baizhou Zhang	b0d1d717e1	Revert "make radix cache deterministic" (#11728 )	2025-10-16 14:36:15 -07:00
Simo Lin	4f24ab1718	[router][grpc] add dissag info to warm up in grpc server (#11727 )	2025-10-16 14:19:55 -07:00
Mick	86b04d25b3	model: qwen3-omni (thinker-only) (#10911 ) Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-10-16 13:20:38 -07:00
sglang-bot	85ebeecf06	chore: bump SGLang version to 0.5.3.post3 (#11693 ) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2025-10-16 13:14:55 -07:00
Hank Han	0dd6cf16ba	[ci]use H20 to run disaggregation test (#11543 )	2025-10-16 11:42:42 -07:00
Even Zhou	3cceaa381a	[Bugfix] Fix Qwen3/DSV3/DSV3.2 model support (#11510 )	2025-10-16 15:14:09 +08:00
Lifu Huang	b0d20cdec7	Set csgmv as default lora backend. (#11488 )	2025-10-15 23:53:24 -05:00
YanbingJiang	cbac499750	Split test_intel_amx_attention_backend.py to pass CI of timeout (#11370 ) Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2025-10-15 19:22:32 -07:00
Shangming Cai	476c67d7fc	Fix missing a2a backend init of GLM4.5 MoE Block (#11692 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2025-10-15 19:13:08 -07:00
Shangming Cai	868403f642	[PD] Add PD support for hybrid model (Qwen3-Next, DeepSeek V3.2 Exp) (#10912 ) Signed-off-by: Shangming Cai <csmthu@gmail.com> Co-authored-by: hzh0425 <hzh0425@apache.org> Co-authored-by: ZeldaHuang <hzm414167@alibaba-inc.com>	2025-10-15 18:59:14 -07:00
Hanming Lu	97d857c096	[Mamba] Increase default mamba_full_memory_ratio to 0.9 (#11679 )	2025-10-16 09:56:43 +08:00
Lianmin Zheng	cd7e1bd591	Sync code and test CI; rename some env vars (#11686 )	2025-10-15 18:37:03 -07:00
Huaiyu, Zheng	729b7edf72	enable rmsnorm on XPU (#10248 )	2025-10-15 17:54:18 -07:00
DiweiSun	4c03dbaaef	[CI][XPU]enable sglang CI on Intel XPU (#9493 ) Co-authored-by: huaiyuzh <huaiyu.zheng@intel.com> Co-authored-by: Ma Mingfei <mingfei.ma@intel.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-10-15 17:13:19 -07:00
sglang-bot	baf277a9bf	chore: bump SGLang version to 0.5.3.post2 (#11680 ) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2025-10-15 16:49:14 -07:00
Chang Su	f226d3da2a	Fix missing json imports in serving_responses.py (#11681 )	2025-10-15 13:01:55 -07:00
Chang Su	30ea4c462b	[tool call] Fix prev_tool_call_arr management in base_format_detector.py (#11367 ) Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>	2025-10-15 09:51:51 -07:00
Shangming Cai	6d0364681c	Fix 1-step draft model forward (#11653 ) Signed-off-by: Shangming Cai <csmthu@gmail.com> Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>	2025-10-15 19:11:33 +08:00
Liangsheng Yin	8221f9ae8b	Tiny cleanup some eagle unused codes (#11660 )	2025-10-15 17:24:08 +08:00
Stefan He	6b143d62a2	Clean up some Qwen3-Next and deterministic code (#11585 )	2025-10-15 15:19:37 +08:00
Zheng Wengang	b2c8566920	[BugFix][Qwen3-VL]: fix cu_seqlens in qwen3-vl (#11458 )	2025-10-14 22:16:49 -07:00
Yineng Zhang	91fc5bb5a9	feat: add add_chunked_prefix_cache_attention_backend (#11636 )	2025-10-14 21:48:13 -07:00
Lifu Huang	780fbf2f38	[Fix] Fix accuracy bug in CSGMV kernel caching key. (#11579 )	2025-10-14 20:25:56 -07:00
Jinwu	825432fce6	[1/N]Support DeepSeek-R1 w4a8 normal deepep (#8247 ) Co-authored-by: Hank Han <hanhan7630@outlook.com>	2025-10-14 20:10:53 -07:00
Xun Sun	a40229f6f8	[1/N] Introduce Mooncake Backend and Mooncake EP to Support Elastic EP (#10423 ) Co-authored-by: Hank Han <hanhan7630@outlook.com> Co-authored-by: Shangming Cai <csmthu@gmail.com>	2025-10-14 19:40:54 -07:00
Sahithi Chigurupati	e9e120ac7a	fix: upgrade transformers to 4.57.1 (#11628 ) Signed-off-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com> Co-authored-by: zhyncs <me@zhyncs.com>	2025-10-14 18:35:05 -07:00
cctry	1d7f783501	Refactor kv cache free (#11351 )	2025-10-14 17:45:19 -07:00
Simo Lin	325951460f	[router][grpc] add warm up to grpc server (#11627 ) Co-authored-by: Chang Su <chang.s.su@oracle.com>	2025-10-14 16:11:16 -07:00
DarkSharpness	e28c9e526f	[Minor] Update xgrammar dependency (#11622 )	2025-10-14 13:46:50 -07:00
Lianmin Zheng	b98cf39866	[Auto Sync] Update collector.py (20251014) (#11625 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>	2025-10-14 13:34:33 -07:00
Lianmin Zheng	27d710457c	[Auto Sync] Update scheduler.py, server_args.py (20251014) (#11623 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Stefan He <hebiaobuaa@gmail.com>	2025-10-14 13:20:03 -07:00
Baizhou Zhang	c224a4c6cc	Fix log for chunked prefix cache (#11624 )	2025-10-14 11:49:33 -07:00
strgrb	94d26d850d	use non_blocking h2d in ForwardBatch.prepare_mlp_sync_batch. (#11605 )	2025-10-14 11:30:59 -07:00
Liangsheng Yin	5ea96ac7cc	Reduce one step decode for draft model. (#11561 )	2025-10-14 23:52:04 +08:00
yinghui	56222658ec	move eagle draft post process to cuda graph (#11434 ) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2025-10-14 22:50:53 +08:00
Alex Chi Z	dc965db0e0	make radix cache deterministic (#10721 ) Signed-off-by: Alex Chi Z <iskyzh@gmail.com>	2025-10-14 21:01:52 +08:00
Scott Lee	817e46f412	Refactor spec decoding metrics calculation into separate `TokenizerManager` utility function (#11586 )	2025-10-14 20:45:49 +08:00
Liangsheng Yin	5a33c3aae7	Optimize Triton Draft Backend (#11556 )	2025-10-14 20:08:32 +08:00
Qiaolin Yu	e4358a4585	Add fused_moe_triton config: triton_3_4_0/E=256,N=256,device_name=NVIDIA_B200.json (#11587 )	2025-10-14 13:24:43 +08:00
Lianmin Zheng	ba2ce28fe9	[Auto Sync] Update model_config.py (20251014) (#11580 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>	2025-10-13 22:16:34 -07:00
Chenxi Li	28f80b1244	Implement LRU eviction policy for LoRA adapters (#11041 )	2025-10-13 20:18:25 -07:00
Xiaoyu Zhang	88a6f9dab5	bench_serving support PD Disaggregation (#11542 )	2025-10-13 19:43:26 -07:00
fzyzcjy	cb8ed2c09a	Make DeepEP combine recv do not overlap (#11535 )	2025-10-13 18:40:42 -07:00
Trevor Morris	384733639a	[DSv32] Use torch.compile for _get_logits_head_gate (#11565 )	2025-10-13 18:38:39 -07:00
Neelabh Sinha	aaf7af1b17	[FEATURE] Add Profile Trace Merger for Distributed Traces (#11413 )	2025-10-14 09:20:17 +08:00

4007 Commits