Commit Graph

1100 Commits

Author SHA1 Message Date
b8zhong
8ae9d4bb41 Revert "[ROCm] Remove vLLM rope dependency & use AITER impl" (#12028) 2025-10-23 12:42:59 -07:00
Mick
770529a731 model: support deepseek-ocr (#11891)
Co-authored-by: yhyang201 <47235274+yhyang201@users.noreply.github.com>
Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: Shi Shuai <126407087+shuaills@users.noreply.github.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-10-24 03:15:17 +08:00
Mick
8bd26dd4e6 ci: fix night-ci with push retry mechanism (#11765) 2025-10-23 11:31:05 -07:00
Netanel Haber
a98496834b Feature/nano v2 offline modelopt fp8 and nvfp4 (#12018)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2025-10-23 11:16:46 -07:00
cctry
b0b4f71679 [Fix] memory leak by overlap + retract (#11981)
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
2025-10-23 22:59:23 +08:00
Liangsheng Yin
6c18addb6f Revert "Support nvidia/NVIDIA-Nemotron-Nano-9B-v2-FP8/NVFP4" (#12015) 2025-10-23 21:27:58 +08:00
Netanel Haber
d6fee73d1f Support nvidia/NVIDIA-Nemotron-Nano-9B-v2-FP8/NVFP4 (#11866) 2025-10-23 17:29:02 +08:00
blzheng
13fb8b5489 [CPU] Optimize FP16 decode_attention_cpu (#10652) 2025-10-22 21:39:51 -07:00
b8zhong
4d4feccbb2 [ROCm] Remove vLLM rope dependency & use AITER impl (#11322) 2025-10-22 19:17:34 -07:00
Shangming Cai
7fceeef599 Fix flaky hicache test with mooncake backend (#11953)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2025-10-22 21:00:47 +08:00
Hank Han
904655c5fd [2/N] Added the core structure of elastic EP and the eplb algorithm with faulty rank (#10606)
Co-authored-by: Xun Sun <UNIDY2002@outlook.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
2025-10-22 01:13:31 -07:00
Zhiyu
80b2b3207a Enable native ModelOpt quantization support (3/3) (#10154)
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
2025-10-21 21:44:29 -07:00
b8zhong
d0a64c7e2c vlm: enforce pybase64 for image and str encode/decode (#10700) 2025-10-21 19:05:32 +08:00
Shangming Cai
05d3667ab9 [CI] disable glm4.1v and fix the flashinfer installation (#11902)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2025-10-21 18:38:35 +08:00
ybyang
dbb16bedd5 Support Thinking Budget (via custom_logit_processor for OpenAI API) [Fix #6572] (#11416)
Signed-off-by: ybyang <ybyang7@iflytek.com>
Co-authored-by: YorkSu <york_su@qq.com>
2025-10-21 16:27:56 +08:00
Neelabh Sinha
852c0578fd [FEATURE] Add OpenAI-Compatible LoRA Adapter Selection (#11570) 2025-10-21 15:44:33 +08:00
Meng, Hengyu
b113c72e7a Init attention backend for Intel XPU (#10656)
Co-authored-by: guangyey <guangye.yu@intel.com>
Co-authored-by: DiweiSun <105627594+DiweiSun@users.noreply.github.com>
2025-10-21 11:41:28 +08:00
Xiaoyu Zhang
8374a96e49 piecewise cuda graph support qwen3-moe (#11845) 2025-10-21 10:55:49 +08:00
DarkSharpness
276e7b3e4e [Feature] New structural tag support (#10691) 2025-10-20 18:25:58 +08:00
Shane A
d383e6616e [Model] Add Olmo 3 model support (#11396) 2025-10-19 23:59:16 -07:00
Yuan Luo
271d3d0d50 Support mrope triton kernel and add unit test (#11722)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>
2025-10-20 11:51:07 +08:00
Johnny
252dc4e112 [NVIDIA] FA3/FA4 Fix (#11606)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2025-10-19 17:10:10 -07:00
Baizhou Zhang
cbb5fc2edc [CI] Add CI test for DeepSeek V3.2 MTP (#11835) 2025-10-19 17:00:25 -07:00
Night
53fb229f53 [logprobs] Enable local deterministic logrprobs testing with strict threshold (#10994)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-19 13:30:39 -07:00
Stefan He
4fff1ec1d9 Deterministic Mode: Add 1-stage triton kernel for prefill (#11147)
Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com>
Co-authored-by: Binyao Jiang <bijiang@linkedin.com>
2025-10-20 01:47:36 +08:00
Liangsheng Yin
7a020e0f3b [Test] Add basic matched stop for beta eagle (#11833) 2025-10-20 01:17:00 +08:00
b8zhong
f4f8a1b4d8 ci: update lmms-eval to speed up multimodal CI (#11000) 2025-10-19 02:51:19 +08:00
Minglei Zhu
13219e1e48 completely remove mixed mode deterministic test as prefix mode could cover it (#11783)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2025-10-17 17:46:03 -07:00
Lianmin Zheng
b9a54e0968 [minor] sync code on python/sglang/test/test_deterministic.py and improve ci tests (#11777)
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
2025-10-17 14:25:22 -07:00
Chunyuan WU
8fcc69e7c4 Turn on shm_allreduce and shm_allgather for fp16 (#10725) 2025-10-17 12:35:20 -07:00
Yineng Zhang
da681f35d3 Revert "Set csgmv as default lora backend. (#11488)" (#11735) 2025-10-17 12:01:36 -05:00
Mick
3e4c7da2f5 ci: reduce and refactor vlm ut and combine test files (#11062) 2025-10-17 15:24:50 +00:00
Mick
86b04d25b3 model: qwen3-omni (thinker-only) (#10911)
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-10-16 13:20:38 -07:00
Hank Han
0dd6cf16ba [ci]use H20 to run disaggregation test (#11543) 2025-10-16 11:42:42 -07:00
Shangming Cai
1de3924b18 [CI] Add GLM4MoE model test (#11706)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2025-10-16 16:25:58 +08:00
Even Zhou
3cceaa381a [Bugfix] Fix Qwen3/DSV3/DSV3.2 model support (#11510) 2025-10-16 15:14:09 +08:00
Lifu Huang
b0d20cdec7 Set csgmv as default lora backend. (#11488) 2025-10-15 23:53:24 -05:00
YanbingJiang
cbac499750 Split test_intel_amx_attention_backend.py to pass CI of timeout (#11370)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
2025-10-15 19:22:32 -07:00
Shangming Cai
868403f642 [PD] Add PD support for hybrid model (Qwen3-Next, DeepSeek V3.2 Exp) (#10912)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: hzh0425 <hzh0425@apache.org>
Co-authored-by: ZeldaHuang <hzm414167@alibaba-inc.com>
2025-10-15 18:59:14 -07:00
Lianmin Zheng
cd7e1bd591 Sync code and test CI; rename some env vars (#11686) 2025-10-15 18:37:03 -07:00
DiweiSun
4c03dbaaef [CI][XPU]enable sglang CI on Intel XPU (#9493)
Co-authored-by: huaiyuzh <huaiyu.zheng@intel.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-10-15 17:13:19 -07:00
Jinwu
825432fce6 [1/N]Support DeepSeek-R1 w4a8 normal deepep (#8247)
Co-authored-by: Hank Han <hanhan7630@outlook.com>
2025-10-14 20:10:53 -07:00
Xun Sun
a40229f6f8 [1/N] Introduce Mooncake Backend and Mooncake EP to Support Elastic EP (#10423)
Co-authored-by: Hank Han <hanhan7630@outlook.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
2025-10-14 19:40:54 -07:00
yinghui
56222658ec move eagle draft post process to cuda graph (#11434)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2025-10-14 22:50:53 +08:00
Chenxi Li
28f80b1244 Implement LRU eviction policy for LoRA adapters (#11041) 2025-10-13 20:18:25 -07:00
Neelabh Sinha
aaf7af1b17 [FEATURE] Add Profile Trace Merger for Distributed Traces (#11413) 2025-10-14 09:20:17 +08:00
Baizhou Zhang
9f1f699a7a [CI] Add Basic Test for DeepSeek V3.2 (#11308) 2025-10-13 11:41:02 -07:00
Liangsheng Yin
acc2327bbd Move deep gemm related arguments to sglang.srt.environ (#11547) 2025-10-14 00:34:35 +08:00
Mick
f35f120d70 fix: fix video input for qwen3-vl (#11442) 2025-10-13 09:30:43 -07:00
Liangsheng Yin
516738b096 Depreate global_server_args_dict (#11528) 2025-10-13 19:34:43 +08:00