Lianmin Zheng
|
9eefe2c0b7
|
Set CUDA_VISIBLE_DEVICES to achieve one GPU per process (#9170)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Cheng Wan <cwan@x.ai>
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
|
2025-10-17 17:30:06 -07:00 |
|
Zilin Zhu
|
69fe3c9726
|
Manually flip deepep_mode for cuda_graph (#11666)
|
2025-10-18 08:05:48 +08:00 |
|
fzyzcjy
|
8af8491298
|
Support casting bf16 NextN moe to fp8 (#11613)
|
2025-10-18 08:02:15 +08:00 |
|
fzyzcjy
|
505329cab0
|
Support shared experts overlap in cutlass moe (#11611)
|
2025-10-18 07:59:40 +08:00 |
|
fzyzcjy
|
8a382fd399
|
Super tiny fix missing input throughput (#11607)
|
2025-10-18 07:58:48 +08:00 |
|
Chang Su
|
627974405d
|
[Lint] Add python/sglang to ruff F401 checks and remove unused imports in files (#11685)
|
2025-10-17 16:49:46 -07:00 |
|
Antonin Vidon
|
2614adf9ca
|
[Fix] Skip visual layers when applying LoRA to Qwen2VL modules (#11519)
|
2025-10-17 17:39:57 -05:00 |
|
Lianmin Zheng
|
fdd7c69d65
|
[Auto Sync] Update common.py (20251017) (#11782)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
|
2025-10-17 15:03:42 -07:00 |
|
Lianmin Zheng
|
b9a54e0968
|
[minor] sync code on python/sglang/test/test_deterministic.py and improve ci tests (#11777)
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
|
2025-10-17 14:25:22 -07:00 |
|
Baizhou Zhang
|
20b8d2306c
|
Cleaning indexer for DeepSeek V3.2 (#11682)
|
2025-10-17 13:47:21 -07:00 |
|
Chang Su
|
d1984e218c
|
[router][grpc] Remove timeout for connections and remove max_tokens deprecation warning log (#11775)
|
2025-10-17 12:36:36 -07:00 |
|
Yineng Zhang
|
b79f75fd53
|
[Auto Sync] Update scheduler.py (20251017) (#11738)
|
2025-10-17 12:36:07 -07:00 |
|
Chunyuan WU
|
8fcc69e7c4
|
Turn on shm_allreduce and shm_allgather for fp16 (#10725)
|
2025-10-17 12:35:20 -07:00 |
|
ykcombat
|
f440baa136
|
[Feature] Reuse flashinfer workspace for PD-Multiplexing. (#11540)
|
2025-10-18 02:35:06 +08:00 |
|
Keyang Ru
|
2bc3fcd420
|
[doc] update router document (#11767)
|
2025-10-17 10:26:54 -07:00 |
|
Simo Lin
|
a5978a20f0
|
[router] fix grpc client time out to 1h (#11768)
|
2025-10-17 10:26:12 -07:00 |
|
Simo Lin
|
e483c1eae5
|
[router] Fix UTF-8 Boundary Panic in Stop Sequence Decoder (#11766)
|
2025-10-17 10:21:00 -07:00 |
|
Yineng Zhang
|
da681f35d3
|
Revert "Set csgmv as default lora backend. (#11488)" (#11735)
|
2025-10-17 12:01:36 -05:00 |
|
pdasgup
|
9b0f725b1d
|
add tuned fuse moe kernel for qwen3 235b fp8 on h200 (#11730)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
|
2025-10-17 09:55:09 -07:00 |
|
Liangsheng Yin
|
cde5a6e30f
|
Abstraction for spec worker and code cleanup (#11643)
|
2025-10-17 23:31:36 +08:00 |
|
Mick
|
3e4c7da2f5
|
ci: reduce and refactor vlm ut and combine test files (#11062)
|
2025-10-17 15:24:50 +00:00 |
|
Liangsheng Yin
|
d88ac9bc9a
|
[overlap-spec] Make plan stream an option (#11724)
|
2025-10-17 15:48:57 +08:00 |
|
Liangsheng Yin
|
ce11dd82dc
|
[CI] Try fix broken event loop init (#11746)
|
2025-10-17 13:30:17 +08:00 |
|
Chang Su
|
9e87b60f37
|
[router][CI] Clean up deprecated fields in pr-test-pd-router.yml (#11739)
|
2025-10-16 19:01:00 -07:00 |
|
Keyang Ru
|
7780230a15
|
Revert "[router] fix get_models endpoint for openai router (#11687)" (#11740)
|
2025-10-16 18:36:53 -07:00 |
|
Chang Su
|
dc01313da1
|
[router] Add rustfmt and set group imports by default (#11732)
|
2025-10-16 17:33:29 -07:00 |
|
Keyang Ru
|
7a7f99beb7
|
[router] add spec.rs to enables tests under spec folder (#11734)
|
2025-10-16 16:07:26 -07:00 |
|
StonyPort
|
fd389df96e
|
Reduce the image processing latency in VLM (#11541)
Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>
|
2025-10-16 15:00:03 -07:00 |
|
Baizhou Zhang
|
b0d1d717e1
|
Revert "make radix cache deterministic" (#11728)
|
2025-10-16 14:36:15 -07:00 |
|
Chang Su
|
c7962868c1
|
[router] Fix tool_choice normalization in ChatCompletionRequest and fix ut (#11731)
|
2025-10-16 14:20:13 -07:00 |
|
Simo Lin
|
4f24ab1718
|
[router][grpc] add dissag info to warm up in grpc server (#11727)
|
2025-10-16 14:19:55 -07:00 |
|
Simo Lin
|
64affab495
|
[router] fix p and d worker filtering and bootstrap port handling (#11729)
|
2025-10-16 14:19:39 -07:00 |
|
Keyang Ru
|
4c9bcb9d56
|
[Router] Refactor protocol definitions: split spec.rs into modular files (#11677)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
|
2025-10-16 13:44:44 -07:00 |
|
Mick
|
86b04d25b3
|
model: qwen3-omni (thinker-only) (#10911)
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-10-16 13:20:38 -07:00 |
|
sglang-bot
|
85ebeecf06
|
chore: bump SGLang version to 0.5.3.post3 (#11693)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2025-10-16 13:14:55 -07:00 |
|
Hank Han
|
0dd6cf16ba
|
[ci]use H20 to run disaggregation test (#11543)
|
2025-10-16 11:42:42 -07:00 |
|
Keyang Ru
|
0975ba99bc
|
[router] fix get_models endpoint for openai router (#11687)
|
2025-10-16 09:00:08 -07:00 |
|
Shangming Cai
|
1de3924b18
|
[CI] Add GLM4MoE model test (#11706)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2025-10-16 16:25:58 +08:00 |
|
Even Zhou
|
3cceaa381a
|
[Bugfix] Fix Qwen3/DSV3/DSV3.2 model support (#11510)
|
2025-10-16 15:14:09 +08:00 |
|
Lifu Huang
|
b0d20cdec7
|
Set csgmv as default lora backend. (#11488)
|
2025-10-15 23:53:24 -05:00 |
|
YanbingJiang
|
cbac499750
|
Split test_intel_amx_attention_backend.py to pass CI of timeout (#11370)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2025-10-15 19:22:32 -07:00 |
|
Shangming Cai
|
476c67d7fc
|
Fix missing a2a backend init of GLM4.5 MoE Block (#11692)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2025-10-15 19:13:08 -07:00 |
|
Fan Yin
|
3289da5b41
|
[sgl-kernel] support hadamard (#11663)
|
2025-10-15 19:00:44 -07:00 |
|
Shangming Cai
|
868403f642
|
[PD] Add PD support for hybrid model (Qwen3-Next, DeepSeek V3.2 Exp) (#10912)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: hzh0425 <hzh0425@apache.org>
Co-authored-by: ZeldaHuang <hzm414167@alibaba-inc.com>
|
2025-10-15 18:59:14 -07:00 |
|
Hanming Lu
|
97d857c096
|
[Mamba] Increase default mamba_full_memory_ratio to 0.9 (#11679)
|
2025-10-16 09:56:43 +08:00 |
|
Yineng Zhang
|
52a54a26b2
|
docs: Add Contributor Covenant Code of Conduct (#11689)
|
2025-10-15 18:50:26 -07:00 |
|
Lianmin Zheng
|
cd7e1bd591
|
Sync code and test CI; rename some env vars (#11686)
|
2025-10-15 18:37:03 -07:00 |
|
Huaiyu, Zheng
|
729b7edf72
|
enable rmsnorm on XPU (#10248)
|
2025-10-15 17:54:18 -07:00 |
|
DiweiSun
|
4c03dbaaef
|
[CI][XPU]enable sglang CI on Intel XPU (#9493)
Co-authored-by: huaiyuzh <huaiyu.zheng@intel.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
|
2025-10-15 17:13:19 -07:00 |
|
sglang-bot
|
baf277a9bf
|
chore: bump SGLang version to 0.5.3.post2 (#11680)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2025-10-15 16:49:14 -07:00 |
|