Commit Graph

5994 Commits

Author SHA1 Message Date
Chang Su
9e87b60f37 [router][CI] Clean up deprecated fields in pr-test-pd-router.yml (#11739) 2025-10-16 19:01:00 -07:00
Keyang Ru
7780230a15 Revert "[router] fix get_models endpoint for openai router (#11687)" (#11740) 2025-10-16 18:36:53 -07:00
Chang Su
dc01313da1 [router] Add rustfmt and set group imports by default (#11732) 2025-10-16 17:33:29 -07:00
Keyang Ru
7a7f99beb7 [router] add spec.rs to enables tests under spec folder (#11734) 2025-10-16 16:07:26 -07:00
StonyPort
fd389df96e Reduce the image processing latency in VLM (#11541)
Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>
2025-10-16 15:00:03 -07:00
Baizhou Zhang
b0d1d717e1 Revert "make radix cache deterministic" (#11728) 2025-10-16 14:36:15 -07:00
Chang Su
c7962868c1 [router] Fix tool_choice normalization in ChatCompletionRequest and fix ut (#11731) 2025-10-16 14:20:13 -07:00
Simo Lin
4f24ab1718 [router][grpc] add dissag info to warm up in grpc server (#11727) 2025-10-16 14:19:55 -07:00
Simo Lin
64affab495 [router] fix p and d worker filtering and bootstrap port handling (#11729) 2025-10-16 14:19:39 -07:00
Keyang Ru
4c9bcb9d56 [Router] Refactor protocol definitions: split spec.rs into modular files (#11677)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
2025-10-16 13:44:44 -07:00
Mick
86b04d25b3 model: qwen3-omni (thinker-only) (#10911)
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-10-16 13:20:38 -07:00
sglang-bot
85ebeecf06 chore: bump SGLang version to 0.5.3.post3 (#11693)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2025-10-16 13:14:55 -07:00
Hank Han
0dd6cf16ba [ci]use H20 to run disaggregation test (#11543) 2025-10-16 11:42:42 -07:00
Keyang Ru
0975ba99bc [router] fix get_models endpoint for openai router (#11687) 2025-10-16 09:00:08 -07:00
Shangming Cai
1de3924b18 [CI] Add GLM4MoE model test (#11706)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2025-10-16 16:25:58 +08:00
Even Zhou
3cceaa381a [Bugfix] Fix Qwen3/DSV3/DSV3.2 model support (#11510) 2025-10-16 15:14:09 +08:00
Lifu Huang
b0d20cdec7 Set csgmv as default lora backend. (#11488) 2025-10-15 23:53:24 -05:00
YanbingJiang
cbac499750 Split test_intel_amx_attention_backend.py to pass CI of timeout (#11370)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
2025-10-15 19:22:32 -07:00
Shangming Cai
476c67d7fc Fix missing a2a backend init of GLM4.5 MoE Block (#11692)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2025-10-15 19:13:08 -07:00
Fan Yin
3289da5b41 [sgl-kernel] support hadamard (#11663) 2025-10-15 19:00:44 -07:00
Shangming Cai
868403f642 [PD] Add PD support for hybrid model (Qwen3-Next, DeepSeek V3.2 Exp) (#10912)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: hzh0425 <hzh0425@apache.org>
Co-authored-by: ZeldaHuang <hzm414167@alibaba-inc.com>
2025-10-15 18:59:14 -07:00
Hanming Lu
97d857c096 [Mamba] Increase default mamba_full_memory_ratio to 0.9 (#11679) 2025-10-16 09:56:43 +08:00
Yineng Zhang
52a54a26b2 docs: Add Contributor Covenant Code of Conduct (#11689) 2025-10-15 18:50:26 -07:00
Lianmin Zheng
cd7e1bd591 Sync code and test CI; rename some env vars (#11686) 2025-10-15 18:37:03 -07:00
Huaiyu, Zheng
729b7edf72 enable rmsnorm on XPU (#10248) 2025-10-15 17:54:18 -07:00
DiweiSun
4c03dbaaef [CI][XPU]enable sglang CI on Intel XPU (#9493)
Co-authored-by: huaiyuzh <huaiyu.zheng@intel.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-10-15 17:13:19 -07:00
sglang-bot
baf277a9bf chore: bump SGLang version to 0.5.3.post2 (#11680)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2025-10-15 16:49:14 -07:00
Simo Lin
f5d30dae89 [router] Refactor StopSequenceDecoder to Use Sequence for Incremental Decoding (#11676) 2025-10-15 16:31:03 -07:00
Chang Su
2479b89405 [router][grpc] Simplify model_id determination (#11684) 2025-10-15 15:56:58 -07:00
Fan Yin
5464457251 [sgl-kernel] Optimize gguf test (#11667) 2025-10-15 15:45:53 -07:00
Qi Yuhang
6c01844f45 [sgl-kernel][3/N]Support Expert Specialization Grouped GEMM (#11674) 2025-10-15 13:39:31 -07:00
Chang Su
f226d3da2a Fix missing json imports in serving_responses.py (#11681) 2025-10-15 13:01:55 -07:00
Keyang Ru
d2478cd4ff [router] Fix response api related spec (#11621) 2025-10-15 09:59:38 -07:00
Chang Su
30ea4c462b [tool call] Fix prev_tool_call_arr management in base_format_detector.py (#11367)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
2025-10-15 09:51:51 -07:00
Shangming Cai
6d0364681c Fix 1-step draft model forward (#11653)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
2025-10-15 19:11:33 +08:00
Liangsheng Yin
8221f9ae8b Tiny cleanup some eagle unused codes (#11660) 2025-10-15 17:24:08 +08:00
Yineng Zhang
ab9187a20b docs: update sglang installation guide (#11659) 2025-10-15 00:35:48 -07:00
Stefan He
6b143d62a2 Clean up some Qwen3-Next and deterministic code (#11585) 2025-10-15 15:19:37 +08:00
b8zhong
6bc503af73 [Doc] Update support matrix for attn and hybrid attn (#11293) 2025-10-14 22:43:11 -07:00
Zheng Wengang
b2c8566920 [BugFix][Qwen3-VL]: fix cu_seqlens in qwen3-vl (#11458) 2025-10-14 22:16:49 -07:00
fzyzcjy
32803fb279 Super tiny improve FA3 import error message (#11590) 2025-10-14 22:06:31 -07:00
Yineng Zhang
91fc5bb5a9 feat: add add_chunked_prefix_cache_attention_backend (#11636) 2025-10-14 21:48:13 -07:00
Lifu Huang
780fbf2f38 [Fix] Fix accuracy bug in CSGMV kernel caching key. (#11579) 2025-10-14 20:25:56 -07:00
Jinwu
825432fce6 [1/N]Support DeepSeek-R1 w4a8 normal deepep (#8247)
Co-authored-by: Hank Han <hanhan7630@outlook.com>
2025-10-14 20:10:53 -07:00
Xun Sun
a40229f6f8 [1/N] Introduce Mooncake Backend and Mooncake EP to Support Elastic EP (#10423)
Co-authored-by: Hank Han <hanhan7630@outlook.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
2025-10-14 19:40:54 -07:00
Simo Lin
74737b2863 [router] upgrade to 0.2.0 (#11642) 2025-10-14 22:10:30 -04:00
Simo Lin
40e0082d8d [router] add worker self discovery for metadata (#11638) 2025-10-14 22:07:25 -04:00
Sahithi Chigurupati
e9e120ac7a fix: upgrade transformers to 4.57.1 (#11628)
Signed-off-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com>
Co-authored-by: zhyncs <me@zhyncs.com>
2025-10-14 18:35:05 -07:00
Simo Lin
e0c2af2ac2 [router] update router doc to latest features (#11639) 2025-10-14 18:32:30 -07:00
cctry
1d7f783501 Refactor kv cache free (#11351) 2025-10-14 17:45:19 -07:00