sglang

Author	SHA1	Message	Date
Simo Lin	4b62af92ef	[router] change worker api to async instead of sync (#11566)	2025-10-14 00:32:21 -07:00
Simo Lin	0b9915c132	[router] update generate spec to align with sgl io struct (#11591)	2025-10-14 02:51:33 -04:00
Chang Su	27ef1459e6	[router][protocols] Add Axum validate extractor and use it for `/v1/chat/completions` endpoint (#11588)	2025-10-13 22:51:15 -07:00
Qiaolin Yu	e4358a4585	Add fused_moe_triton config: triton_3_4_0/E=256,N=256,device_name=NVIDIA_B200.json (#11587)	2025-10-14 13:24:43 +08:00
Lianmin Zheng	ba2ce28fe9	[Auto Sync] Update model_config.py (20251014) (#11580) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>	2025-10-13 22:16:34 -07:00
sglang-bot	98923880bc	chore: bump sgl-kernel version to 0.3.16.post2 (#11583) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2025-10-13 20:52:38 -07:00
Yineng Zhang	f792e3c561	Revert "[NVIDIA] BUMP FA3 (#11444)" (#11582)	2025-10-13 20:51:45 -07:00
Chenxi Li	28f80b1244	Implement LRU eviction policy for LoRA adapters (#11041)	2025-10-13 20:18:25 -07:00
Xiaoyu Zhang	88a6f9dab5	bench_serving support PD Disaggregation (#11542)	2025-10-13 19:43:26 -07:00
fzyzcjy	cb8ed2c09a	Make DeepEP combine recv do not overlap (#11535)	2025-10-13 18:40:42 -07:00
Trevor Morris	384733639a	[DSv32] Use torch.compile for _get_logits_head_gate (#11565)	2025-10-13 18:38:39 -07:00
Neelabh Sinha	aaf7af1b17	[FEATURE] Add Profile Trace Merger for Distributed Traces (#11413)	2025-10-14 09:20:17 +08:00
Yuwei An	932e263725	Compilation Folder Reset (#11539) Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>	2025-10-14 09:19:12 +08:00
Qiaolin Yu	43f80884c5	Fix accept rate in speculative decoding metrics (#11572)	2025-10-13 16:35:50 -07:00
sglang-bot	60b0503227	chore: bump sgl-kernel version to 0.3.16.post1 (#11573) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2025-10-13 16:26:18 -07:00
Qi Yuhang	dc48c4c0e3	[sgl-kernel][2/N]Support Expert Specialization Grouped GEMM (#11534)	2025-10-13 16:24:48 -07:00
Arthur Cheng	6dc9ca8c85	[router] Add BRANCH_TYPE=local support to Dockerfile.router for local builds (#11571)	2025-10-13 16:10:51 -07:00
Chang Su	887c2b4575	[router][grpc] Add `serve_grpc` to `launch_server` and log id for HealthCheck (#11564)	2025-10-13 16:07:19 -07:00
fzyzcjy	065ce81574	Tiny cleanup fp4 gemm calls (#11537)	2025-10-13 14:48:22 -07:00
Xiaoyu Zhang	8e51049f56	[CI Monitor] Ci monitor only deal with main branch in default (#11538)	2025-10-13 13:50:04 -07:00
Johnny	cb8f3d90d3	[NVIDIA] update pyproject.toml to support cu130 option (#11521)	2025-10-13 13:03:31 -07:00
Chang Su	4b694e7d5a	[router][grpc] Add error handling to `generate_tool_constraints` (#11562)	2025-10-13 12:26:09 -07:00
Baizhou Zhang	9f1f699a7a	[CI] Add Basic Test for DeepSeek V3.2 (#11308)	2025-10-13 11:41:02 -07:00
Trevor Morris	c9cff2b984	Fix DeepSeek-v3.2 default config (ValueError: not enough values to unpack (expected 4, got 3)) (#11557)	2025-10-13 11:27:40 -07:00
Scott Lee	b6fb5d7666	Add metrics for speculative decoding (acceptance rate, average acceptance length) (#11441)	2025-10-13 11:24:27 -07:00
Jonah Bernard	f4aa78801e	[router] Add Rust CLI flags for queue size, timeout, and rate limit for token bucket rate limiter (#11483) Co-authored-by: Simo Lin <linsimo.mark@gmail.com>	2025-10-13 11:08:48 -07:00
Lianmin Zheng	5e3f7e7fa9	Minor: improve sampler & remove unused fields from model_config.py (#11531)	2025-10-13 11:04:44 -07:00
Simo Lin	728af88781	[router] allow user to specify chat template path (#11549)	2025-10-13 10:47:57 -07:00
Chang Su	7b59b0b8b0	[router][grpc] Further delegate non-stream processing to `processing.rs` (#11553)	2025-10-13 10:36:27 -07:00
Liangsheng Yin	acc2327bbd	Move deep gemm related arguments to `sglang.srt.environ` (#11547)	2025-10-14 00:34:35 +08:00
Liangsheng Yin	bfadb5ea5f	Adjust overlap event loop (#11507)	2025-10-14 00:33:19 +08:00
ai-jz	9cc1e065f1	[router][Fix] Include grpc reflection runtime dependency (#11419) Co-authored-by: Chang Su <chang.s.su@oracle.com>	2025-10-13 09:32:42 -07:00
Johnny	b8c430f1ce	[NVIDIA] BUMP FA3 (#11444) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>	2025-10-13 09:30:57 -07:00
Mick	f35f120d70	fix: fix video input for qwen3-vl (#11442)	2025-10-13 09:30:43 -07:00
Liangsheng Yin	54a46a264d	Remove `tp_worker.worker` (#11548)	2025-10-13 22:38:48 +08:00
Simo Lin	7c94eaeeb0	[router] allow tokenizer path to be dir (#11530)	2025-10-13 09:30:09 -04:00
Simo Lin	13d596c93e	[router][ci] Add Nightly Release Workflow for SGLang Router (#11527)	2025-10-13 09:28:55 -04:00
Mohammad Miadh Angkad	c7867b6702	[Fix] Add per_channel_quant parameter to MoE config functions (#11201)	2025-10-13 21:26:06 +08:00
Liangsheng Yin	516738b096	Depreate `global_server_args_dict` (#11528)	2025-10-13 19:34:43 +08:00
Yuan Luo	0b6f535f66	[Reland] perf: optimize qwen-vl with symm mem allreduce (#11457) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-10-13 17:51:25 +08:00
Shangming Cai	c5fe3c0b75	Tiny fix test run estimated time (#11544) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2025-10-13 02:23:13 -07:00
hzh0425	318424e2c8	[HICache]: Support 3FS-Store with page_first_direct layout (#11460)	2025-10-13 15:47:22 +08:00
Xiaoyu Zhang	6806c4e63e	[CI monitor] Improve CI analyzer: fix job failure tracking and add CUDA-focused filtering (#11505)	2025-10-13 13:31:09 +08:00
Mick	0c0779d667	ci: improve nightly-ci (#11385)	2025-10-12 21:19:34 -07:00
Yi Zhang	a55cf5304a	[Feature] Support mamba radix cache v0 (#11214) Co-authored-by: hanming-lu <hanming@x.ai> Co-authored-by: hzh0425 <hzh0425@apache.org> Co-authored-by: thalahors <ericalcaide1@gmail.com>	2025-10-12 20:57:15 -07:00
Yuanhang Sun	19ba16aa3d	[Fix]: add missing device attribute to ChunkCache (#11493)	2025-10-12 20:49:59 -07:00
Qiaolin Yu	a2b3d9b90b	Update DeepSeek-R1-FP4 default config on blackwell (#11512)	2025-10-12 20:32:11 -07:00
Qi Yuhang	9a30914e94	[sgl-kernel][1/N]Support Expert Specialization Grouped GEMM (#11432) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: PGFLMG <1106310035@qq.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2025-10-12 20:19:21 -07:00
Jonah Bernard	8e776c78a1	docs(router): add token-bucket rate limiting to the docs (#11485)	2025-10-12 20:03:27 -07:00
Keyang Ru	63e84352b7	[router] openai router: support grok model (#11511)	2025-10-12 22:44:43 -04:00

5920 Commits