Simo Lin
|
4b62af92ef
|
[router] change worker api to async instead of sync (#11566)
|
2025-10-14 00:32:21 -07:00 |
|
Simo Lin
|
0b9915c132
|
[router] update generate spec to align with sgl io struct (#11591)
|
2025-10-14 02:51:33 -04:00 |
|
Chang Su
|
27ef1459e6
|
[router][protocols] Add Axum validate extractor and use it for /v1/chat/completions endpoint (#11588)
|
2025-10-13 22:51:15 -07:00 |
|
Qiaolin Yu
|
e4358a4585
|
Add fused_moe_triton config: triton_3_4_0/E=256,N=256,device_name=NVIDIA_B200.json (#11587)
|
2025-10-14 13:24:43 +08:00 |
|
Lianmin Zheng
|
ba2ce28fe9
|
[Auto Sync] Update model_config.py (20251014) (#11580)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
|
2025-10-13 22:16:34 -07:00 |
|
sglang-bot
|
98923880bc
|
chore: bump sgl-kernel version to 0.3.16.post2 (#11583)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2025-10-13 20:52:38 -07:00 |
|
Yineng Zhang
|
f792e3c561
|
Revert "[NVIDIA] BUMP FA3 (#11444)" (#11582)
|
2025-10-13 20:51:45 -07:00 |
|
Chenxi Li
|
28f80b1244
|
Implement LRU eviction policy for LoRA adapters (#11041)
|
2025-10-13 20:18:25 -07:00 |
|
Xiaoyu Zhang
|
88a6f9dab5
|
bench_serving support PD Disaggregation (#11542)
|
2025-10-13 19:43:26 -07:00 |
|
fzyzcjy
|
cb8ed2c09a
|
Make DeepEP combine recv do not overlap (#11535)
|
2025-10-13 18:40:42 -07:00 |
|
Trevor Morris
|
384733639a
|
[DSv32] Use torch.compile for _get_logits_head_gate (#11565)
|
2025-10-13 18:38:39 -07:00 |
|
Neelabh Sinha
|
aaf7af1b17
|
[FEATURE] Add Profile Trace Merger for Distributed Traces (#11413)
|
2025-10-14 09:20:17 +08:00 |
|
Yuwei An
|
932e263725
|
Compilation Folder Reset (#11539)
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>
|
2025-10-14 09:19:12 +08:00 |
|
Qiaolin Yu
|
43f80884c5
|
Fix accept rate in speculative decoding metrics (#11572)
|
2025-10-13 16:35:50 -07:00 |
|
sglang-bot
|
60b0503227
|
chore: bump sgl-kernel version to 0.3.16.post1 (#11573)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2025-10-13 16:26:18 -07:00 |
|
Qi Yuhang
|
dc48c4c0e3
|
[sgl-kernel][2/N]Support Expert Specialization Grouped GEMM (#11534)
|
2025-10-13 16:24:48 -07:00 |
|
Arthur Cheng
|
6dc9ca8c85
|
[router] Add BRANCH_TYPE=local support to Dockerfile.router for local builds (#11571)
|
2025-10-13 16:10:51 -07:00 |
|
Chang Su
|
887c2b4575
|
[router][grpc] Add serve_grpc to launch_server and log id for HealthCheck (#11564)
|
2025-10-13 16:07:19 -07:00 |
|
fzyzcjy
|
065ce81574
|
Tiny cleanup fp4 gemm calls (#11537)
|
2025-10-13 14:48:22 -07:00 |
|
Xiaoyu Zhang
|
8e51049f56
|
[CI Monitor] Ci monitor only deal with main branch in default (#11538)
|
2025-10-13 13:50:04 -07:00 |
|
Johnny
|
cb8f3d90d3
|
[NVIDIA] update pyproject.toml to support cu130 option (#11521)
|
2025-10-13 13:03:31 -07:00 |
|
Chang Su
|
4b694e7d5a
|
[router][grpc] Add error handling to generate_tool_constraints (#11562)
|
2025-10-13 12:26:09 -07:00 |
|
Baizhou Zhang
|
9f1f699a7a
|
[CI] Add Basic Test for DeepSeek V3.2 (#11308)
|
2025-10-13 11:41:02 -07:00 |
|
Trevor Morris
|
c9cff2b984
|
Fix DeepSeek-v3.2 default config (ValueError: not enough values to unpack (expected 4, got 3)) (#11557)
|
2025-10-13 11:27:40 -07:00 |
|
Scott Lee
|
b6fb5d7666
|
Add metrics for speculative decoding (acceptance rate, average acceptance length) (#11441)
|
2025-10-13 11:24:27 -07:00 |
|
Jonah Bernard
|
f4aa78801e
|
[router] Add Rust CLI flags for queue size, timeout, and rate limit for token bucket rate limiter (#11483)
Co-authored-by: Simo Lin <linsimo.mark@gmail.com>
|
2025-10-13 11:08:48 -07:00 |
|
Lianmin Zheng
|
5e3f7e7fa9
|
Minor: improve sampler & remove unused fields from model_config.py (#11531)
|
2025-10-13 11:04:44 -07:00 |
|
Simo Lin
|
728af88781
|
[router] allow user to specify chat template path (#11549)
|
2025-10-13 10:47:57 -07:00 |
|
Chang Su
|
7b59b0b8b0
|
[router][grpc] Further delegate non-stream processing to processing.rs (#11553)
|
2025-10-13 10:36:27 -07:00 |
|
Liangsheng Yin
|
acc2327bbd
|
Move deep gemm related arguments to sglang.srt.environ (#11547)
|
2025-10-14 00:34:35 +08:00 |
|
Liangsheng Yin
|
bfadb5ea5f
|
Adjust overlap event loop (#11507)
|
2025-10-14 00:33:19 +08:00 |
|
ai-jz
|
9cc1e065f1
|
[router][Fix] Include grpc reflection runtime dependency (#11419)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
|
2025-10-13 09:32:42 -07:00 |
|
Johnny
|
b8c430f1ce
|
[NVIDIA] BUMP FA3 (#11444)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
|
2025-10-13 09:30:57 -07:00 |
|
Mick
|
f35f120d70
|
fix: fix video input for qwen3-vl (#11442)
|
2025-10-13 09:30:43 -07:00 |
|
Liangsheng Yin
|
54a46a264d
|
Remove tp_worker.worker (#11548)
|
2025-10-13 22:38:48 +08:00 |
|
Simo Lin
|
7c94eaeeb0
|
[router] allow tokenizer path to be dir (#11530)
|
2025-10-13 09:30:09 -04:00 |
|
Simo Lin
|
13d596c93e
|
[router][ci] Add Nightly Release Workflow for SGLang Router (#11527)
|
2025-10-13 09:28:55 -04:00 |
|
Mohammad Miadh Angkad
|
c7867b6702
|
[Fix] Add per_channel_quant parameter to MoE config functions (#11201)
|
2025-10-13 21:26:06 +08:00 |
|
Liangsheng Yin
|
516738b096
|
Deprecate global_server_args_dict (#11528)
|
2025-10-13 19:34:43 +08:00 |
|
Yuan Luo
|
0b6f535f66
|
[Reland] perf: optimize qwen-vl with symm mem allreduce (#11457)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2025-10-13 17:51:25 +08:00 |
|
Shangming Cai
|
c5fe3c0b75
|
Tiny fix test run estimated time (#11544)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2025-10-13 02:23:13 -07:00 |
|
hzh0425
|
318424e2c8
|
[HICache]: Support 3FS-Store with page_first_direct layout (#11460)
|
2025-10-13 15:47:22 +08:00 |
|
Xiaoyu Zhang
|
6806c4e63e
|
[CI monitor] Improve CI analyzer: fix job failure tracking and add CUDA-focused filtering (#11505)
|
2025-10-13 13:31:09 +08:00 |
|
Mick
|
0c0779d667
|
ci: improve nightly-ci (#11385)
|
2025-10-12 21:19:34 -07:00 |
|
Yi Zhang
|
a55cf5304a
|
[Feature] Support mamba radix cache v0 (#11214)
Co-authored-by: hanming-lu <hanming@x.ai>
Co-authored-by: hzh0425 <hzh0425@apache.org>
Co-authored-by: thalahors <ericalcaide1@gmail.com>
|
2025-10-12 20:57:15 -07:00 |
|
Yuanhang Sun
|
19ba16aa3d
|
[Fix]: add missing device attribute to ChunkCache (#11493)
|
2025-10-12 20:49:59 -07:00 |
|
Qiaolin Yu
|
a2b3d9b90b
|
Update DeepSeek-R1-FP4 default config on blackwell (#11512)
|
2025-10-12 20:32:11 -07:00 |
|
Qi Yuhang
|
9a30914e94
|
[sgl-kernel][1/N]Support Expert Specialization Grouped GEMM (#11432)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: PGFLMG <1106310035@qq.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
|
2025-10-12 20:19:21 -07:00 |
|
Jonah Bernard
|
8e776c78a1
|
docs(router): add token-bucket rate limiting to the docs (#11485)
|
2025-10-12 20:03:27 -07:00 |
|
Keyang Ru
|
63e84352b7
|
[router] openai router: support grok model (#11511)
|
2025-10-12 22:44:43 -04:00 |
|