5871 Commits

Author SHA1 Message Date
Keyang Ru
63e84352b7 [router] openai router: support grok model (#11511) 2025-10-12 22:44:43 -04:00
Yongtong Wu
a20e7df8d0 Improve dp attention port assignment scheme (#5889)
Co-authored-by: Cheng Wan <cwan@x.ai>
2025-10-12 17:55:59 -07:00
Cheng Wan
1bdd010291 Revert "Deprecate global_server_args_dict" (#11520) 2025-10-12 17:40:40 -07:00
Cheng Wan
6cd296940a [lint] Fix the lint issue (#11516) 2025-10-12 16:22:46 -07:00
Lianmin Zheng
2ac46e94ef Sync changes on io_struct.py and deterministic ops (#11498) 2025-10-12 16:03:10 -07:00
Binyao Jiang
0aa65f94f1 [Fix] Improve longbench prompt and other logics (#11474) 2025-10-12 15:04:28 -07:00
Yineng Zhang
0ecb42613d fix: revert temporarily remove b200 tests (#11515) 2025-10-12 15:02:37 -07:00
Yineng Zhang
05f015f65f chore: remove flashinfer cleanup cache (#11514) 2025-10-12 14:56:33 -07:00
Liangsheng Yin
1083e7e3df Deprecate global_server_args_dict (#11331) 2025-10-13 01:20:47 +08:00
Liangsheng Yin
2157d12ae8 [CI] fix lint (#11509) 2025-10-13 01:07:21 +08:00
Mick
9f2b457cbe doc: add doc for adding new models into nightly-ci (#11443)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2025-10-12 08:35:10 -07:00
hzh0425
f5b34a510c Bugfix: Fix Type consistency for KV indices in SWARadixCache (#11452)
Co-authored-by: yizhang2077 <1109276519@qq.com>
2025-10-12 23:19:44 +08:00
Lianmin Zheng
5a6ec8f999 Fix unit tests (#11503) 2025-10-12 07:45:57 -07:00
Lianmin Zheng
6a653bb11b temporarily remove b200 tests (#11502) 2025-10-12 06:48:49 -07:00
Lianmin Zheng
548a57b1f3 Fix port conflicts in CI (#11497) 2025-10-12 06:46:36 -07:00
Lianmin Zheng
88e73ed048 Temporarily remove b200 tests (#11501) 2025-10-12 06:41:37 -07:00
Yi Zhang
4b15fa00f0 move fla env check position (#11500) 2025-10-12 06:40:45 -07:00
Liangsheng Yin
f49419061d Move args from global_config to environ (#11332) 2025-10-12 21:29:31 +08:00
Liangsheng Yin
01e59e8247 Fix CI break by express-laned PRs. (#11499) 2025-10-12 21:06:06 +08:00
Mike Qiu
99a0704a36 bailingMoE: Fix Key error of deepep_mode (#11465)
Signed-off-by: Michael Qiu <qiudayu.qdy@antgroup.com>
Co-authored-by: Mike_Qiu <qiudayu.qdy@antgroup.com>
2025-10-12 20:42:59 +08:00
Antoine Roux
ec1cd90ac9 Fix the GPT function calling regex to allow dash in the name (#10577) 2025-10-12 20:34:58 +08:00
Kai-Hsun Chen
1103dc6204 [chore][2/N] Avoid using default mutable parameters (#11479)
Signed-off-by: Kai-Hsun Chen <khchen@x.ai>
2025-10-12 20:34:04 +08:00
Vincent Zhong
a220536f40 [ perf ] Replace json-> orjson in hot path (#11221)
Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
2025-10-12 20:30:58 +08:00
Mahmoud Ashraf
7b064f04f8 [bugfix]: use correct causality condition for flashattention, flashinfer, and triton backends (#10172) 2025-10-12 20:28:16 +08:00
Kai-Hsun Chen
43190becfa [chore][1/N] Avoid using default mutable parameters (#11478)
Signed-off-by: Kai-Hsun Chen <khchen@x.ai>
2025-10-12 20:26:39 +08:00
Vincent Zhong
be740acdb0 [smol] [perf] Qwen3-VL in place op. (#11481)
Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
2025-10-12 20:25:30 +08:00
sglang-bot
2db2cddd12 chore: bump sgl-kernel version to 0.3.16 (#11476) 2025-10-11 22:04:49 -07:00
Wenyi Xu
9b5efe3464 [Router]: Small Typo in a comment within tree.rs (#11489) 2025-10-11 21:59:48 -07:00
Yuwei An
4ac8e09df0 Piecewise CUDA Graph Support & Torch Compile Backend (#10062)
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>
2025-10-12 11:55:57 +08:00
Liangsheng Yin
20a6c0a63d Beta spec-overlap for EAGLE (#11398)
Co-authored-by: Lianmin Zheng <15100009+merrymercy@users.noreply.github.com>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
2025-10-12 11:02:22 +08:00
Glen Liu
47c606d3dc [Feature] support regex strings as a stopping condition (#10635) 2025-10-12 10:53:15 +08:00
Sahithi Chigurupati
9fcf73069f [CI] Add nightly builds to dockerhub (#9804)
Signed-off-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com>
2025-10-11 18:27:46 -07:00
Zaili Wang
0a304870e8 fix Xeon CI (#11454) 2025-10-11 14:08:28 -07:00
PGFLMG
8fdcd98efe [7/n] decouple quantization impl from vllm dependency - gguf kernel (#11019) 2025-10-11 14:04:57 -07:00
Lorenzo Lu
b5dcfd4154 Add option to disable any_whitespace for xgrammar and llguidance backends. (#8919)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
2025-10-11 22:24:58 +08:00
ybyang
5061b8fd3e fix stop when stream (#11462)
Signed-off-by: ybyang <ybyang7@iflytek.com>
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
2025-10-11 22:06:31 +08:00
ykcombat
c8452551ce [Fix] Fix split prefill with fa3. (#11428) 2025-10-11 22:03:28 +08:00
fzyzcjy
bf3e7149be Fix enable_v2 in int8 quant (#11470) 2025-10-11 21:56:30 +08:00
ykcombat
f5754d1256 [Documentation][Configuration] Server args and documentation of PD-Multiplexing. (#11427) 2025-10-11 21:36:07 +08:00
Liangsheng Yin
739daa63e4 Adjust logits metada init for target verify (#11467) 2025-10-11 21:17:04 +08:00
fzyzcjy
d957177a22 Super tiny delete unused openai router in sgl-router (#11448) 2025-10-11 15:59:30 +08:00
fzyzcjy
21337b22b9 Reland [1/2] Optimizations and refactors about quant kernel (#10312)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-10-11 15:59:03 +08:00
Zhiyu
129d299278 Enable native ModelOpt quantization support (2/3) (#9991)
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
2025-10-11 07:48:14 +00:00
Baizhou Zhang
8b85926a6e Remove tilelang dependency in Dockerfile (#11455) 2025-10-10 23:17:53 -07:00
Binyao Jiang
451d15c44b [DPSKv3.2] Rewrite nsa tilelang act_quant kernel to triton (#11450) 2025-10-10 23:13:46 -07:00
Liu-congo
c80a96dae9 [BugFix] test_mla_fp8.py fails on Cublas 12.9 (#11360)
Signed-off-by: Liu-congo <1502632128@qq.com>
2025-10-10 21:14:24 -07:00
Stefan He
eae9a9fb9d Fix batch invariant ops (#11368) 2025-10-10 20:49:08 -07:00
wxsm
2674c1d280 fix: Change dsv32 hack temporary path to use system temp directory (#11445) 2025-10-10 19:59:41 -07:00
Lianmin Zheng
61055cb309 Reorder PD disagg CI tests (#11438) 2025-10-10 17:56:49 -07:00
Chang Su
92777135a0 [router][grpc] Consolidate parser checks for chat completions (#11439) 2025-10-10 20:44:29 -04:00