Commit Graph

4887 Commits

Author
SHA1 Message Date
Liangsheng Yin
9b5f0f64f5 Fix tiny misalign with previous truncation setting in tokenizer_manager (#9430) 2025-08-21 14:05:35 +08:00
Azure
70bb066ee4 Fix FP4 inference corruption issue in glm4.5-air model (#9346) 2025-08-20 22:13:47 -07:00
VDV1985
2c4b4b786b [feature] Ascend NPU graph support (#9399)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: yezhifeng (D) <y00897525@china.huawei.com>
Co-authored-by: anon189Ty <Stari_Falcon@outlook.com>
Co-authored-by: Maksim <makcum888e@mail.ru>
Co-authored-by: ssshinigami <44640852+ssshinigami@users.noreply.github.com>
2025-08-20 21:13:27 -07:00
Martin Vit
7cd2ee06d7 feat: Add Triton fallback option and SM120 MoE configs for FP8 models (#9251) 2025-08-20 19:33:15 -07:00
Liangsheng Yin
eb19ccadae [bug] fix errors related to context length in SD (#9388) 2025-08-21 10:32:34 +08:00
Shangming Cai
25ef53f05f [PD] Fix nvlink transport accuracy through transferring metadata with tcp (#9261)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2025-08-20 19:29:10 -07:00
Cao E
c674bf9c6b Fix biased_grouped_topk_cpu (#9420) 2025-08-20 19:18:48 -07:00
Qiaolin Yu
af1973b871 Fix max_seq_len_k in trtllm_mha attention backend (#9416) 2025-08-20 19:17:13 -07:00
Chang Su
5cfbb4c136 [router] add glm and step3 reasoning parser (#9415) 2025-08-20 18:33:10 -07:00
Chang Su
e65231022f [router] add tokenizer integration test with real mini tokenizer (#9413) 2025-08-20 17:56:23 -07:00
Keyang Ru
3828db4309 [router] Add IGW (Inference Gateway) Feature Flag (#9371)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-08-20 17:38:57 -07:00
strgrb
88fbc31b50 Support trtllm_allreduce_fusion in flashinfer for cuda<12.8 (#9339)
Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>
2025-08-20 16:54:30 -07:00
nathan
8f5b9910c1 Add support for Qwen3-seq-cls (#9357) 2025-08-20 16:51:56 -07:00
Mick
ef3004d90a misc: parse bench_serving result as markdown table (#9377) 2025-08-20 16:44:20 -07:00
Xinyuan Tong
84719b527a fix: InternS1 don't recognize image, updates image token for InternVL processor (#9381)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-08-20 16:43:03 -07:00
jiapingW
e99729c9f3 Fixed the issue where eagle3 TPOT was not as good as without eagle3. (#9404) 2025-08-20 16:42:01 -07:00
Nicolas Castet
c10b8e6a0f Support DP attention with GPT-OSS (#9359) 2025-08-20 16:36:31 -07:00
Lifu Huang
d4bce29721 Fix incorrect logic in chat template handling. (#9336)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-08-20 16:25:36 -07:00
Lifu Huang
b0980af89f Support pinning adapter via server args. (#9249) 2025-08-20 16:25:01 -07:00
Nathan Wang
24eaebeb4b Fix FlashInfer GPU <-> CPU sync (#9409) 2025-08-20 15:26:12 -07:00
Trevor Morris
a91e90d9a3 [2/2] Fuse routed scaling factor into select_experts (#8690) 2025-08-20 15:10:16 -07:00
Xiaoyu Zhang
f96413c444 Refactor allreduce add rmsnorm pattern (#9278) 2025-08-20 02:03:08 -07:00
Liangsheng Yin
08ebdf79d0 Fix the --allow-auto-truncate argument in tokenizer manager. (#9391)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-20 16:56:47 +08:00
fzyzcjy
42c8704560 Add PDL support for quant kernel and rope kernel (#9106) 2025-08-20 01:56:29 -07:00
Yichen Yan
c9bf3877a0 Reduce overhead for fa by not calling heavy CUDA property check (#7375) 2025-08-20 16:26:28 +08:00
Even Zhou
de2dd73831 Revert "[feature] Rework Ascend NPU graph support" (#9385) 2025-08-20 00:35:10 -07:00
Lianmin Zheng
1ec9769753 [Docs] Update contribution guide (#9383) 2025-08-19 23:37:45 -07:00
Shangming Cai
d8ed60f254 [CI] Fix disaggregation failure tolerance CI (#9378)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2025-08-19 23:31:08 -07:00
Mingyi
f1b0eda55c [readme] Add SGLang x AMD SF meetup information (#9380) 2025-08-19 22:25:09 -07:00
Lianmin Zheng
f20b6a3f2b [minor] Sync style changes (#9376) 2025-08-19 21:35:01 -07:00
Even Zhou
3680d6f88b [feature] Rework Ascend NPU graph support (#9350)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: yezhifeng (D) <y00897525@china.huawei.com>
Co-authored-by: anon189Ty <Stari_Falcon@outlook.com>
Co-authored-by: Maksim <makcum888e@mail.ru>
Co-authored-by: ssshinigami <44640852+ssshinigami@users.noreply.github.com>
2025-08-19 20:32:27 -07:00
Keyang Ru
f515449582 Fix gpt-oss response api streaming issue (#9368) 2025-08-19 20:19:42 -07:00
Ke Bao
e0ce171d79 Fix triton backend eagle illegal memory access (#9344) 2025-08-19 20:16:26 -07:00
fzyzcjy
fe43e889f8 Fix mini lb timeout issue (#9369) 2025-08-19 20:15:16 -07:00
Keyang Ru
5ae5ecaa15 [router] Implement OpenAI Responses API specification (#9367) 2025-08-19 20:14:47 -07:00
Simo Lin
5fbad308cd [router] add tokenizer chat template support (#9370)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
2025-08-19 20:14:02 -07:00
Chang Su
7638f5e44e [router] Implement gRPC SGLangSchedulerClient (#9364) 2025-08-19 16:44:11 -07:00
Simo Lin
b45f753cba [router] adds reasoning parser pooling and thread-safe (#9360) 2025-08-19 13:35:39 -07:00
Keyang Ru
c5057262fa [Router] Add validation module for API parameters (#9335) 2025-08-19 13:25:53 -07:00
Chang Su
46fe8b8cb2 [CI] Fix lint issues (#9361) 2025-08-19 13:05:36 -07:00
Simo Lin
0b95a01a8f [router] add tiktokenizer and sequence in router (#9354)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
2025-08-19 10:46:28 -07:00
mpashkovskiy
a3b810ebdb fix: enable multi-GPU Triton fused MoE tuning (#6295) 2025-08-19 10:16:58 -07:00
Simo Lin
94959237bf [router] add dsr1, kimi, and qwen reasoning parser (#9353) 2025-08-19 10:15:24 -07:00
Even Zhou
f4fafacc5d Revert "[feature] Ascend NPU graph support (#8027)" (#9348) 2025-08-19 10:11:23 -07:00
chenxu140
01d47a27b6 [Bugfix] fix kv buffer register & dp attention & deepepmoe (#9327) 2025-08-19 10:09:48 -07:00
Lianmin Zheng
ecc9f3e47a [Minor] Fix the style of sgl-kernel (#9332) 2025-08-18 23:45:00 -07:00
Yineng Zhang
7e8187e004 docs: fix spec (#9326) 2025-08-18 19:35:46 -07:00
Enrique Shockwave
e483ab6d20 enable marlin fp8 blockwise (#8990) 2025-08-18 18:53:15 -07:00
EduardDurech
720cd308ba Add CMakeLists.txt binary_dir (#7019) 2025-08-18 18:36:33 -07:00
Keyang Ru
ce67b2d586 [router]restructure protocol modules for better organization (#9321) 2025-08-19 01:07:58 +00:00