sglang

Author	SHA1	Message	Date
Zilin Zhu	e68a2b5b2f	[RL] use cpu group to prepare_mlp_sync_batch_raw when the server is offloaded (#10152)	2025-10-18 14:29:35 +08:00
Zilin Zhu	31b9f19e54	[RL] support weight update with DP attention (#11669)	2025-10-18 14:26:19 +08:00
Jimmy	f7ab955455	fix(glm45): disable reduce scatter (#11665) Co-authored-by: Shangming Cai <csmthu@gmail.com>	2025-10-18 12:19:20 +08:00
Chang Su	ca240eefb4	[router][grpc] Support parallel queue puts in grpc_request_manager and remove mutex for grpc_client (#11798)	2025-10-17 20:49:43 -07:00
Cheng Wan	5b214b50b6	[Refactor] move `deep_gemm_wrapper` out of `quantization` (#11784)	2025-10-17 18:57:54 -07:00
Minglei Zhu	13219e1e48	completely remove mixed mode deterministic test as prefix mode could cover it (#11783) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-10-17 17:46:03 -07:00
fzyzcjy	33e9bbec35	Make single-batch overlap compatible with offloading (#11614)	2025-10-18 08:45:54 +08:00
fzyzcjy	dcb8f090ad	Super tiny fix CI (#11788)	2025-10-17 17:41:58 -07:00
Lianmin Zheng	9eefe2c0b7	Set CUDA_VISIBLE_DEVICES to achieve one GPU per process (#9170) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: Cheng Wan <cwan@x.ai> Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>	2025-10-17 17:30:06 -07:00
Zilin Zhu	69fe3c9726	Manually flip deepep_mode for cuda_graph (#11666)	2025-10-18 08:05:48 +08:00
fzyzcjy	8af8491298	Support casting bf16 NextN moe to fp8 (#11613)	2025-10-18 08:02:15 +08:00
fzyzcjy	505329cab0	Support shared experts overlap in cutlass moe (#11611)	2025-10-18 07:59:40 +08:00
fzyzcjy	8a382fd399	Super tiny fix missing input throughput (#11607)	2025-10-18 07:58:48 +08:00
Chang Su	627974405d	[Lint] Add `python/sglang` to ruff F401 checks and remove unused imports in files (#11685)	2025-10-17 16:49:46 -07:00
Antonin Vidon	2614adf9ca	[Fix] Skip visual layers when applying LoRA to Qwen2VL modules (#11519)	2025-10-17 17:39:57 -05:00
Lianmin Zheng	fdd7c69d65	[Auto Sync] Update common.py (20251017) (#11782) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>	2025-10-17 15:03:42 -07:00
Lianmin Zheng	b9a54e0968	[minor] sync code on python/sglang/test/test_deterministic.py and improve ci tests (#11777) Co-authored-by: Stefan He <hebiaobuaa@gmail.com> Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>	2025-10-17 14:25:22 -07:00
Baizhou Zhang	20b8d2306c	Cleaning indexer for DeepSeek V3.2 (#11682)	2025-10-17 13:47:21 -07:00
Yineng Zhang	b79f75fd53	[Auto Sync] Update scheduler.py (20251017) (#11738)	2025-10-17 12:36:07 -07:00
Chunyuan WU	8fcc69e7c4	Turn on shm_allreduce and shm_allgather for fp16 (#10725)	2025-10-17 12:35:20 -07:00
ykcombat	f440baa136	[Feature] Reuse flashinfer workspace for PD-Multiplexing. (#11540)	2025-10-18 02:35:06 +08:00
Yineng Zhang	da681f35d3	Revert "Set csgmv as default lora backend. (#11488)" (#11735)	2025-10-17 12:01:36 -05:00
pdasgup	9b0f725b1d	add tuned fuse moe kernel for qwen3 235b fp8 on h200 (#11730) Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>	2025-10-17 09:55:09 -07:00
Liangsheng Yin	cde5a6e30f	Abstraction for spec worker and code cleanup (#11643)	2025-10-17 23:31:36 +08:00
Mick	3e4c7da2f5	ci: reduce and refactor vlm ut and combine test files (#11062)	2025-10-17 15:24:50 +00:00
Liangsheng Yin	d88ac9bc9a	[overlap-spec] Make plan stream an option (#11724)	2025-10-17 15:48:57 +08:00
Liangsheng Yin	ce11dd82dc	[CI] Try fix broken event loop init (#11746)	2025-10-17 13:30:17 +08:00
StonyPort	fd389df96e	Reduce the image processing latency in VLM (#11541) Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>	2025-10-16 15:00:03 -07:00
Baizhou Zhang	b0d1d717e1	Revert "make radix cache deterministic" (#11728)	2025-10-16 14:36:15 -07:00
Simo Lin	4f24ab1718	[router][grpc] add dissag info to warm up in grpc server (#11727)	2025-10-16 14:19:55 -07:00
Mick	86b04d25b3	model: qwen3-omni (thinker-only) (#10911) Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-10-16 13:20:38 -07:00
sglang-bot	85ebeecf06	chore: bump SGLang version to 0.5.3.post3 (#11693) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2025-10-16 13:14:55 -07:00
Hank Han	0dd6cf16ba	[ci]use H20 to run disaggregation test (#11543)	2025-10-16 11:42:42 -07:00
Even Zhou	3cceaa381a	[Bugfix] Fix Qwen3/DSV3/DSV3.2 model support (#11510)	2025-10-16 15:14:09 +08:00
Lifu Huang	b0d20cdec7	Set csgmv as default lora backend. (#11488)	2025-10-15 23:53:24 -05:00
YanbingJiang	cbac499750	Split test_intel_amx_attention_backend.py to pass CI of timeout (#11370) Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2025-10-15 19:22:32 -07:00
Shangming Cai	476c67d7fc	Fix missing a2a backend init of GLM4.5 MoE Block (#11692) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2025-10-15 19:13:08 -07:00
Shangming Cai	868403f642	[PD] Add PD support for hybrid model (Qwen3-Next, DeepSeek V3.2 Exp) (#10912) Signed-off-by: Shangming Cai <csmthu@gmail.com> Co-authored-by: hzh0425 <hzh0425@apache.org> Co-authored-by: ZeldaHuang <hzm414167@alibaba-inc.com>	2025-10-15 18:59:14 -07:00
Hanming Lu	97d857c096	[Mamba] Increase default mamba_full_memory_ratio to 0.9 (#11679)	2025-10-16 09:56:43 +08:00
Lianmin Zheng	cd7e1bd591	Sync code and test CI; rename some env vars (#11686)	2025-10-15 18:37:03 -07:00
Huaiyu, Zheng	729b7edf72	enable rmsnorm on XPU (#10248)	2025-10-15 17:54:18 -07:00
DiweiSun	4c03dbaaef	[CI][XPU]enable sglang CI on Intel XPU (#9493) Co-authored-by: huaiyuzh <huaiyu.zheng@intel.com> Co-authored-by: Ma Mingfei <mingfei.ma@intel.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-10-15 17:13:19 -07:00
sglang-bot	baf277a9bf	chore: bump SGLang version to 0.5.3.post2 (#11680) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2025-10-15 16:49:14 -07:00
Chang Su	f226d3da2a	Fix missing json imports in serving_responses.py (#11681)	2025-10-15 13:01:55 -07:00
Chang Su	30ea4c462b	[tool call] Fix prev_tool_call_arr management in base_format_detector.py (#11367) Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>	2025-10-15 09:51:51 -07:00
Shangming Cai	6d0364681c	Fix 1-step draft model forward (#11653) Signed-off-by: Shangming Cai <csmthu@gmail.com> Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>	2025-10-15 19:11:33 +08:00
Liangsheng Yin	8221f9ae8b	Tiny cleanup some eagle unused codes (#11660)	2025-10-15 17:24:08 +08:00
Stefan He	6b143d62a2	Clean up some Qwen3-Next and deterministic code (#11585)	2025-10-15 15:19:37 +08:00
Zheng Wengang	b2c8566920	[BugFix][Qwen3-VL]: fix cu_seqlens in qwen3-vl (#11458)	2025-10-14 22:16:49 -07:00
Yineng Zhang	91fc5bb5a9	feat: add add_chunked_prefix_cache_attention_backend (#11636)	2025-10-14 21:48:13 -07:00

4030 Commits