Netanel Haber | a98496834b | 2025-10-23 11:16:46 -07:00
Feature/nano v2 offline modelopt fp8 and nvfp4 (#12018)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

cctry | b0b4f71679 | 2025-10-23 22:59:23 +08:00
[Fix] memory leak by overlap + retract (#11981)
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>

Liangsheng Yin | 6c18addb6f | 2025-10-23 21:27:58 +08:00
Revert "Support nvidia/NVIDIA-Nemotron-Nano-9B-v2-FP8/NVFP4" (#12015)

Netanel Haber | d6fee73d1f | 2025-10-23 17:29:02 +08:00
Support nvidia/NVIDIA-Nemotron-Nano-9B-v2-FP8/NVFP4 (#11866)

Zhiyu | 80b2b3207a | 2025-10-21 21:44:29 -07:00
Enable native ModelOpt quantization support (3/3) (#10154)
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

Shangming Cai | 05d3667ab9 | 2025-10-21 18:38:35 +08:00
[CI] disable glm4.1v and fix the flashinfer installation (#11902)
Signed-off-by: Shangming Cai <csmthu@gmail.com>

Neelabh Sinha | 852c0578fd | 2025-10-21 15:44:33 +08:00
[FEATURE] Add OpenAI-Compatible LoRA Adapter Selection (#11570)

Yuan Luo | 271d3d0d50 | 2025-10-20 11:51:07 +08:00
Support mrope triton kernel and add unit test (#11722)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>

Johnny | 252dc4e112 | 2025-10-19 17:10:10 -07:00
[NVIDIA] FA3/FA4 Fix (#11606)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>

Baizhou Zhang | cbb5fc2edc | 2025-10-19 17:00:25 -07:00
[CI] Add CI test for DeepSeek V3.2 MTP (#11835)

Night | 53fb229f53 | 2025-10-19 13:30:39 -07:00
[logprobs] Enable local deterministic logrprobs testing with strict threshold (#10994)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Minglei Zhu | 13219e1e48 | 2025-10-17 17:46:03 -07:00
completely remove mixed mode deterministic test as prefix mode could cover it (#11783)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>

Lianmin Zheng | b9a54e0968 | 2025-10-17 14:25:22 -07:00
[minor] sync code on python/sglang/test/test_deterministic.py and improve ci tests (#11777)
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>

Mick | 3e4c7da2f5 | 2025-10-17 15:24:50 +00:00
ci: reduce and refactor vlm ut and combine test files (#11062)

Hank Han | 0dd6cf16ba | 2025-10-16 11:42:42 -07:00
[ci]use H20 to run disaggregation test (#11543)

Shangming Cai | 1de3924b18 | 2025-10-16 16:25:58 +08:00
[CI] Add GLM4MoE model test (#11706)
Signed-off-by: Shangming Cai <csmthu@gmail.com>

YanbingJiang | cbac499750 | 2025-10-15 19:22:32 -07:00
Split test_intel_amx_attention_backend.py to pass CI of timeout (#11370)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>

Shangming Cai | 868403f642 | 2025-10-15 18:59:14 -07:00
[PD] Add PD support for hybrid model (Qwen3-Next, DeepSeek V3.2 Exp) (#10912)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: hzh0425 <hzh0425@apache.org>
Co-authored-by: ZeldaHuang <hzm414167@alibaba-inc.com>

DiweiSun | 4c03dbaaef | 2025-10-15 17:13:19 -07:00
[CI][XPU]enable sglang CI on Intel XPU (#9493)
Co-authored-by: huaiyuzh <huaiyu.zheng@intel.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Xun Sun | a40229f6f8 | 2025-10-14 19:40:54 -07:00
[1/N] Introduce Mooncake Backend and Mooncake EP to Support Elastic EP (#10423)
Co-authored-by: Hank Han <hanhan7630@outlook.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>

yinghui | 56222658ec | 2025-10-14 22:50:53 +08:00
move eagle draft post process to cuda graph (#11434)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>

Chenxi Li | 28f80b1244 | 2025-10-13 20:18:25 -07:00
Implement LRU eviction policy for LoRA adapters (#11041)

Neelabh Sinha | aaf7af1b17 | 2025-10-14 09:20:17 +08:00
[FEATURE] Add Profile Trace Merger for Distributed Traces (#11413)

Baizhou Zhang | 9f1f699a7a | 2025-10-13 11:41:02 -07:00
[CI] Add Basic Test for DeepSeek V3.2 (#11308)

Shangming Cai | c5fe3c0b75 | 2025-10-13 02:23:13 -07:00
Tiny fix test run estimated time (#11544)
Signed-off-by: Shangming Cai <csmthu@gmail.com>

Yi Zhang | a55cf5304a | 2025-10-12 20:57:15 -07:00
[Feature] Support mamba radix cache v0 (#11214)
Co-authored-by: hanming-lu <hanming@x.ai>
Co-authored-by: hzh0425 <hzh0425@apache.org>
Co-authored-by: thalahors <ericalcaide1@gmail.com>

Lianmin Zheng | 5a6ec8f999 | 2025-10-12 07:45:57 -07:00
Fix unit tests (#11503)

Lianmin Zheng | 548a57b1f3 | 2025-10-12 06:46:36 -07:00
Fix port conflicts in CI (#11497)

Liangsheng Yin | 20a6c0a63d | 2025-10-12 11:02:22 +08:00
Beta spec-overlap for EAGLE (#11398)
Co-authored-by: Lianmin Zheng <15100009+merrymercy@users.noreply.github.com>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>

Stefan He | eae9a9fb9d | 2025-10-10 20:49:08 -07:00
Fix batch invariant ops (#11368)

Lianmin Zheng | 61055cb309 | 2025-10-10 17:56:49 -07:00
Reorder PD disagg CI tests (#11438)

Shangming Cai | 70fbb3adf6 | 2025-10-09 18:50:39 -07:00
[CI] Refactor PD disaggregation test suite (#11363)
Signed-off-by: Shangming Cai <csmthu@gmail.com>

Netanel Haber | d6837aea4d | 2025-10-09 00:37:38 +08:00
model: Support Hybrid Mamba2 NemotronHForCausalLM (nvidia/NVIDIA-Nemotron-Nano-9B-v2) (#10909)
Signed-off-by: Netanel Haber <nhaber@nvidia.com>

Liangsheng Yin | c882b5ae75 | 2025-10-08 21:40:56 +08:00
[CI] improve disaggregation CI. (#11264)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>

Cheng Wan | 3c06b673af | 2025-10-07 21:51:41 -07:00
[8/N] MoE Refactor: deprecate EPMoE (#11211)

Ke Bao | 24bc3fb0f9 | 2025-10-07 18:21:37 +08:00
EAGLE cache fix for SWARadixCache (#11231)
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>

Alex Chi Z | 9b4c449735 | 2025-10-06 20:33:11 -07:00
convert test_deterministic into unit tests (#11095)
Signed-off-by: Alex Chi Z <iskyzh@gmail.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>

sunxxuns | a57f0e3d56 | 2025-10-06 19:27:57 -04:00
reverse the amd ci test back to 1200s and split the 8-gpu deepseek job into two. (#11238)
Co-authored-by: root <root@smci350-zts-gtu-e17-15.zts-gtu.dcgpu>

Zhiyu | 155cbb51f0 | 2025-10-06 13:24:15 -07:00
Enable native ModelOpt quantization support (1/3) (#7149)
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

Lianmin Zheng | d645ae90a3 | 2025-10-05 18:05:41 -07:00
Rename runner labels (#11228)

sunxxuns | 5e142484e2 | 2025-10-05 19:03:53 -04:00
[Fix AMD CI] VRAM cleanup (#11174)
Co-authored-by: root <root@smci350-zts-gtu-e17-15.zts-gtu.dcgpu>

Shangming Cai | c560410da7 | 2025-10-05 14:08:52 -07:00
Refactor and optimize mooncake CI (#11162)
Signed-off-by: Shangming Cai <csmthu@gmail.com>

Ke Bao | 31b49c0b51 | 2025-10-04 16:53:53 -07:00
EAGLE cache fix for HiCache (#11215)

hzh0425 | c70e58e837 | 2025-10-03 20:51:56 -04:00
[HICache]: Refactor HiCache CI (#11011)
Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>

fzyzcjy | 6794d21051 | 2025-10-03 14:15:46 +08:00
Tiny add PD disaggregation + DP attention test (#11167)

ilyasch2 | 083629c235 | 2025-10-02 19:15:36 +08:00
[model] Add mamba2 and Falcon-H1 support. (#10988)
Co-authored-by: Younes Belkada <younes.belkada@tii.ae>
Co-authored-by: Younes B <49240599+younesbelkada@users.noreply.github.com>

Sai Enduri | 195a59fe23 | 2025-10-01 01:12:28 -07:00
Refactor AMD CI. (#11128)

narutolhy | d17986f8c6 | 2025-09-29 20:45:17 -07:00
Enable optional FP32 compute for LM Head (#10729)
Thanks to MiniMax Team and Chenyang Zhao's support.

Lianmin Zheng | a17e70f5cc | 2025-09-29 10:11:03 -07:00
Use more general heuristics to set the default value of --mem-fraction-static (#10975)
Co-authored-by: sglang-bot <sglangbot@gmail.com>

Zhihao Zhang | 24f7cb1ece | 2025-09-28 21:06:59 -07:00
[speculative decoding] rename lookahead to ngram (#11010)
Co-authored-by: a4zhangfei <a4zhangfei@qq.com>