Commit Graph

5901 Commits

Author SHA1 Message Date
Xiaoyu Zhang
8e51049f56 [CI Monitor] Ci monitor only deal with main branch in default (#11538) 2025-10-13 13:50:04 -07:00
Johnny
cb8f3d90d3 [NVIDIA] update pyproject.toml to support cu130 option (#11521) 2025-10-13 13:03:31 -07:00
Chang Su
4b694e7d5a [router][grpc] Add error handling to generate_tool_constraints (#11562) 2025-10-13 12:26:09 -07:00
Baizhou Zhang
9f1f699a7a [CI] Add Basic Test for DeepSeek V3.2 (#11308) 2025-10-13 11:41:02 -07:00
Trevor Morris
c9cff2b984 Fix DeepSeek-v3.2 default config (ValueError: not enough values to unpack (expected 4, got 3)) (#11557) 2025-10-13 11:27:40 -07:00
Scott Lee
b6fb5d7666 Add metrics for speculative decoding (acceptance rate, average acceptance length) (#11441) 2025-10-13 11:24:27 -07:00
Jonah Bernard
f4aa78801e [router] Add Rust CLI flags for queue size, timeout, and rate limit for token bucket rate limiter (#11483)
Co-authored-by: Simo Lin <linsimo.mark@gmail.com>
2025-10-13 11:08:48 -07:00
Lianmin Zheng
5e3f7e7fa9 Minor: improve sampler & remove unused fields from model_config.py (#11531) 2025-10-13 11:04:44 -07:00
Simo Lin
728af88781 [router] allow user to specify chat template path (#11549) 2025-10-13 10:47:57 -07:00
Chang Su
7b59b0b8b0 [router][grpc] Further delegate non-stream processing to processing.rs (#11553) 2025-10-13 10:36:27 -07:00
Liangsheng Yin
acc2327bbd Move deep gemm related arguments to sglang.srt.environ (#11547) 2025-10-14 00:34:35 +08:00
Liangsheng Yin
bfadb5ea5f Adjust overlap event loop (#11507) 2025-10-14 00:33:19 +08:00
ai-jz
9cc1e065f1 [router][Fix] Include grpc reflection runtime dependency (#11419)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
2025-10-13 09:32:42 -07:00
Johnny
b8c430f1ce [NVIDIA] BUMP FA3 (#11444)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
2025-10-13 09:30:57 -07:00
Mick
f35f120d70 fix: fix video input for qwen3-vl (#11442) 2025-10-13 09:30:43 -07:00
Liangsheng Yin
54a46a264d Remove tp_worker.worker (#11548) 2025-10-13 22:38:48 +08:00
Simo Lin
7c94eaeeb0 [router] allow tokenizer path to be dir (#11530) 2025-10-13 09:30:09 -04:00
Simo Lin
13d596c93e [router][ci] Add Nightly Release Workflow for SGLang Router (#11527) 2025-10-13 09:28:55 -04:00
Mohammad Miadh Angkad
c7867b6702 [Fix] Add per_channel_quant parameter to MoE config functions (#11201) 2025-10-13 21:26:06 +08:00
Liangsheng Yin
516738b096 Deprecate global_server_args_dict (#11528) 2025-10-13 19:34:43 +08:00
Yuan Luo
0b6f535f66 [Reland] perf: optimize qwen-vl with symm mem allreduce (#11457)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2025-10-13 17:51:25 +08:00
Shangming Cai
c5fe3c0b75 Tiny fix test run estimated time (#11544)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2025-10-13 02:23:13 -07:00
hzh0425
318424e2c8 [HICache]: Support 3FS-Store with page_first_direct layout (#11460) 2025-10-13 15:47:22 +08:00
Xiaoyu Zhang
6806c4e63e [CI monitor] Improve CI analyzer: fix job failure tracking and add CUDA-focused filtering (#11505) 2025-10-13 13:31:09 +08:00
Mick
0c0779d667 ci: improve nightly-ci (#11385) 2025-10-12 21:19:34 -07:00
Yi Zhang
a55cf5304a [Feature] Support mamba radix cache v0 (#11214)
Co-authored-by: hanming-lu <hanming@x.ai>
Co-authored-by: hzh0425 <hzh0425@apache.org>
Co-authored-by: thalahors <ericalcaide1@gmail.com>
2025-10-12 20:57:15 -07:00
Yuanhang Sun
19ba16aa3d [Fix]: add missing device attribute to ChunkCache (#11493) 2025-10-12 20:49:59 -07:00
Qiaolin Yu
a2b3d9b90b Update DeepSeek-R1-FP4 default config on blackwell (#11512) 2025-10-12 20:32:11 -07:00
Qi Yuhang
9a30914e94 [sgl-kernel][1/N]Support Expert Specialization Grouped GEMM (#11432)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: PGFLMG <1106310035@qq.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
2025-10-12 20:19:21 -07:00
Jonah Bernard
8e776c78a1 docs(router): add token-bucket rate limiting to the docs (#11485) 2025-10-12 20:03:27 -07:00
Keyang Ru
63e84352b7 [router] openai router: support grok model (#11511) 2025-10-12 22:44:43 -04:00
Yongtong Wu
a20e7df8d0 Improve dp attention port assignment scheme (#5889)
Co-authored-by: Cheng Wan <cwan@x.ai>
2025-10-12 17:55:59 -07:00
Cheng Wan
1bdd010291 Revert "Deprecate global_server_args_dict" (#11520) 2025-10-12 17:40:40 -07:00
Cheng Wan
6cd296940a [lint] Fix the lint issue (#11516) 2025-10-12 16:22:46 -07:00
Lianmin Zheng
2ac46e94ef Sync changes on io_struct.py and deterministic ops (#11498) 2025-10-12 16:03:10 -07:00
Binyao Jiang
0aa65f94f1 [Fix] Improve longbench prompt and other logics (#11474) 2025-10-12 15:04:28 -07:00
Yineng Zhang
0ecb42613d fix: revert temporarily remove b200 tests (#11515) 2025-10-12 15:02:37 -07:00
Yineng Zhang
05f015f65f chore: remove flashinfer cleanup cache (#11514) 2025-10-12 14:56:33 -07:00
Liangsheng Yin
1083e7e3df Deprecate global_server_args_dict (#11331) 2025-10-13 01:20:47 +08:00
Liangsheng Yin
2157d12ae8 [CI] fix lint (#11509) 2025-10-13 01:07:21 +08:00
Mick
9f2b457cbe doc: add doc for adding new models into nightly-ci (#11443)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2025-10-12 08:35:10 -07:00
hzh0425
f5b34a510c Bugfix: Fix Type consistency for KV indices in SWARadixCache (#11452)
Co-authored-by: yizhang2077 <1109276519@qq.com>
2025-10-12 23:19:44 +08:00
Lianmin Zheng
5a6ec8f999 Fix unit tests (#11503) 2025-10-12 07:45:57 -07:00
Lianmin Zheng
6a653bb11b temporarily remove b200 tests (#11502) 2025-10-12 06:48:49 -07:00
Lianmin Zheng
548a57b1f3 Fix port conflicts in CI (#11497) 2025-10-12 06:46:36 -07:00
Lianmin Zheng
88e73ed048 Temporarily remove b200 tests (#11501) 2025-10-12 06:41:37 -07:00
Yi Zhang
4b15fa00f0 move fla env check position (#11500) 2025-10-12 06:40:45 -07:00
Liangsheng Yin
f49419061d Move args from global_config to environ (#11332) 2025-10-12 21:29:31 +08:00
Liangsheng Yin
01e59e8247 Fix CI break by express-laned PRs. (#11499) 2025-10-12 21:06:06 +08:00
Mike Qiu
99a0704a36 bailingMoE: Fix Key error of deepep_mode (#11465)
Signed-off-by: Michael Qiu <qiudayu.qdy@antgroup.com>
Co-authored-by: Mike_Qiu <qiudayu.qdy@antgroup.com>
2025-10-12 20:42:59 +08:00