Lifu Huang
|
b0d20cdec7
|
Set csgmv as default lora backend. (#11488)
|
2025-10-15 23:53:24 -05:00 |
|
YanbingJiang
|
cbac499750
|
Split test_intel_amx_attention_backend.py to pass CI of timeout (#11370)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2025-10-15 19:22:32 -07:00 |
|
Shangming Cai
|
868403f642
|
[PD] Add PD support for hybrid model (Qwen3-Next, DeepSeek V3.2 Exp) (#10912)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: hzh0425 <hzh0425@apache.org>
Co-authored-by: ZeldaHuang <hzm414167@alibaba-inc.com>
|
2025-10-15 18:59:14 -07:00 |
|
Lianmin Zheng
|
cd7e1bd591
|
Sync code and test CI; rename some env vars (#11686)
|
2025-10-15 18:37:03 -07:00 |
|
DiweiSun
|
4c03dbaaef
|
[CI][XPU]enable sglang CI on Intel XPU (#9493)
Co-authored-by: huaiyuzh <huaiyu.zheng@intel.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
|
2025-10-15 17:13:19 -07:00 |
|
Jinwu
|
825432fce6
|
[1/N]Support DeepSeek-R1 w4a8 normal deepep (#8247)
Co-authored-by: Hank Han <hanhan7630@outlook.com>
|
2025-10-14 20:10:53 -07:00 |
|
Xun Sun
|
a40229f6f8
|
[1/N] Introduce Mooncake Backend and Mooncake EP to Support Elastic EP (#10423)
Co-authored-by: Hank Han <hanhan7630@outlook.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2025-10-14 19:40:54 -07:00 |
|
yinghui
|
56222658ec
|
move eagle draft post process to cuda graph (#11434)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
|
2025-10-14 22:50:53 +08:00 |
|
Chenxi Li
|
28f80b1244
|
Implement LRU eviction policy for LoRA adapters (#11041)
|
2025-10-13 20:18:25 -07:00 |
|
Neelabh Sinha
|
aaf7af1b17
|
[FEATURE] Add Profile Trace Merger for Distributed Traces (#11413)
|
2025-10-14 09:20:17 +08:00 |
|
Baizhou Zhang
|
9f1f699a7a
|
[CI] Add Basic Test for DeepSeek V3.2 (#11308)
|
2025-10-13 11:41:02 -07:00 |
|
Liangsheng Yin
|
acc2327bbd
|
Move deep gemm related arguments to sglang.srt.environ (#11547)
|
2025-10-14 00:34:35 +08:00 |
|
Mick
|
f35f120d70
|
fix: fix video input for qwen3-vl (#11442)
|
2025-10-13 09:30:43 -07:00 |
|
Liangsheng Yin
|
516738b096
|
Deprecate global_server_args_dict (#11528)
|
2025-10-13 19:34:43 +08:00 |
|
Shangming Cai
|
c5fe3c0b75
|
Tiny fix test run estimated time (#11544)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2025-10-13 02:23:13 -07:00 |
|
hzh0425
|
318424e2c8
|
[HICache]: Support 3FS-Store with page_first_direct layout (#11460)
|
2025-10-13 15:47:22 +08:00 |
|
Mick
|
0c0779d667
|
ci: improve nightly-ci (#11385)
|
2025-10-12 21:19:34 -07:00 |
|
Yi Zhang
|
a55cf5304a
|
[Feature] Support mamba radix cache v0 (#11214)
Co-authored-by: hanming-lu <hanming@x.ai>
Co-authored-by: hzh0425 <hzh0425@apache.org>
Co-authored-by: thalahors <ericalcaide1@gmail.com>
|
2025-10-12 20:57:15 -07:00 |
|
Yongtong Wu
|
a20e7df8d0
|
Improve dp attention port assignment scheme (#5889)
Co-authored-by: Cheng Wan <cwan@x.ai>
|
2025-10-12 17:55:59 -07:00 |
|
Cheng Wan
|
1bdd010291
|
Revert "Deprecate global_server_args_dict" (#11520)
|
2025-10-12 17:40:40 -07:00 |
|
Liangsheng Yin
|
1083e7e3df
|
Deprecate global_server_args_dict (#11331)
|
2025-10-13 01:20:47 +08:00 |
|
Liangsheng Yin
|
2157d12ae8
|
[CI] fix lint (#11509)
|
2025-10-13 01:07:21 +08:00 |
|
Mick
|
9f2b457cbe
|
doc: add doc for adding new models into nightly-ci (#11443)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
|
2025-10-12 08:35:10 -07:00 |
|
Lianmin Zheng
|
5a6ec8f999
|
Fix unit tests (#11503)
|
2025-10-12 07:45:57 -07:00 |
|
Lianmin Zheng
|
548a57b1f3
|
Fix port conflicts in CI (#11497)
|
2025-10-12 06:46:36 -07:00 |
|
Yuwei An
|
4ac8e09df0
|
Piecewise CUDA Graph Support & Torch Compile Backend (#10062)
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>
|
2025-10-12 11:55:57 +08:00 |
|
Liangsheng Yin
|
20a6c0a63d
|
Beta spec-overlap for EAGLE (#11398)
Co-authored-by: Lianmin Zheng <15100009+merrymercy@users.noreply.github.com>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
|
2025-10-12 11:02:22 +08:00 |
|
Glen Liu
|
47c606d3dc
|
[Feature] support regex strings as a stopping condition (#10635)
|
2025-10-12 10:53:15 +08:00 |
|
Binyao Jiang
|
451d15c44b
|
[DPSKv3.2] Rewrite nsa tilelang act_quant kernel to triton (#11450)
|
2025-10-10 23:13:46 -07:00 |
|
Stefan He
|
eae9a9fb9d
|
Fix batch invariant ops (#11368)
|
2025-10-10 20:49:08 -07:00 |
|
Lianmin Zheng
|
61055cb309
|
Reorder PD disagg CI tests (#11438)
|
2025-10-10 17:56:49 -07:00 |
|
cctry
|
b36afed4a7
|
Separate allocation logic from scheduler (#11313)
|
2025-10-10 17:38:54 -07:00 |
|
Lianmin Zheng
|
b4408e6098
|
Revert "fix: fix video input for qwen3-vl" (#11437)
|
2025-10-10 12:44:40 -07:00 |
|
Mick
|
a1a20b4c7c
|
fix: fix video input for qwen3-vl (#11361)
|
2025-10-10 04:35:35 -07:00 |
|
hzh0425
|
ee3bd8a1c8
|
feat(hicache): Support passing prefix keys for l3 store. (#9045)
Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
2025-10-10 00:22:05 -07:00 |
|
Shangming Cai
|
70fbb3adf6
|
[CI] Refactor PD disaggregation test suite (#11363)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2025-10-09 18:50:39 -07:00 |
|
Glen Liu
|
9a7e7a6576
|
[Bug Fix] prevent lora adapter from being loaded into LoRAManager if it is already loaded (#11365)
|
2025-10-09 18:43:03 -07:00 |
|
Sundara Raman Ramachandran
|
53bd00d975
|
[Generative Score API] Multi-Item scoring with custom attention mask. (#10979)
|
2025-10-08 18:47:32 -07:00 |
|
Netanel Haber
|
d6837aea4d
|
model: Support Hybrid Mamba2 NemotronHForCausalLM (nvidia/NVIDIA-Nemotron-Nano-9B-v2) (#10909)
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
|
2025-10-09 00:37:38 +08:00 |
|
Liangsheng Yin
|
c882b5ae75
|
[CI] improve disaggregation CI. (#11264)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2025-10-08 21:40:56 +08:00 |
|
Cheng Wan
|
3c06b673af
|
[8/N] MoE Refactor: deprecate EPMoE (#11211)
|
2025-10-07 21:51:41 -07:00 |
|
Adarsh Shirawalmath
|
7c3f07dbcb
|
[Feature] Add /tokenize and /detokenize OpenAI compatible endpoints (#9545)
|
2025-10-08 12:38:48 +08:00 |
|
Mick
|
64d1505c0a
|
ci: unify the model launch method of nightly ci (#11230)
|
2025-10-07 18:13:14 -07:00 |
|
cctry
|
f3764c26a3
|
Clean match_prefix and prepare_for_extend for mem cache V2 (#11200)
|
2025-10-07 17:54:18 -07:00 |
|
Chang Su
|
7ba3de0e92
|
[oai serving chat] Add argument --sampling-defaults and fix ChatCompletionRequest defaults (#11304)
|
2025-10-08 00:36:05 +00:00 |
|
Ke Bao
|
24bc3fb0f9
|
EAGLE cache fix for SWARadixCache (#11231)
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
|
2025-10-07 18:21:37 +08:00 |
|
Liangsheng Yin
|
8a8a608af9
|
[ci] fix pp test (#11294)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-07 14:20:04 +08:00 |
|
Alex Chi Z
|
9b4c449735
|
convert test_deterministic into unit tests (#11095)
Signed-off-by: Alex Chi Z <iskyzh@gmail.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2025-10-06 20:33:11 -07:00 |
|
sunxxuns
|
a57f0e3d56
|
reverse the amd ci test back to 1200s and split the 8-gpu deepseek job into two. (#11238)
Co-authored-by: root <root@smci350-zts-gtu-e17-15.zts-gtu.dcgpu>
|
2025-10-06 19:27:57 -04:00 |
|
Zhiyu
|
155cbb51f0
|
Enable native ModelOpt quantization support (1/3) (#7149)
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
|
2025-10-06 13:24:15 -07:00 |
|