Commit Graph

1030 Commits

Author SHA1 Message Date
hzh0425
ee3bd8a1c8 feat(hicache): Support passing prefix keys for l3 store. (#9045)
Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-10-10 00:22:05 -07:00
Shangming Cai
70fbb3adf6 [CI] Refactor PD disaggregation test suite (#11363)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2025-10-09 18:50:39 -07:00
Glen Liu
9a7e7a6576 [Bug Fix] prevent lora adapter from being loaded into LoRAManager if it is already loaded (#11365) 2025-10-09 18:43:03 -07:00
Sundara Raman Ramachandran
53bd00d975 [Generative Score API] Multi-Item scoring with custom attention mask. (#10979) 2025-10-08 18:47:32 -07:00
Netanel Haber
d6837aea4d model: Support Hybrid Mamba2 NemotronHForCausalLM (nvidia/NVIDIA-Nemotron-Nano-9B-v2) (#10909)
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
2025-10-09 00:37:38 +08:00
Liangsheng Yin
c882b5ae75 [CI] improve disaggregation CI. (#11264)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
2025-10-08 21:40:56 +08:00
Cheng Wan
3c06b673af [8/N] MoE Refactor: deprecate EPMoE (#11211) 2025-10-07 21:51:41 -07:00
Adarsh Shirawalmath
7c3f07dbcb [Feature] Add /tokenize and /detokenize OpenAI compatible endpoints (#9545) 2025-10-08 12:38:48 +08:00
Mick
64d1505c0a ci: unify the model launch method of nightly ci (#11230) 2025-10-07 18:13:14 -07:00
cctry
f3764c26a3 Clean match_prefix and prepare_for_extend for mem cache V2 (#11200) 2025-10-07 17:54:18 -07:00
Chang Su
7ba3de0e92 [oai serving chat] Add argument --sampling-defaults and fix ChatCompletionRequest defaults (#11304) 2025-10-08 00:36:05 +00:00
Ke Bao
24bc3fb0f9 EAGLE cache fix for SWARadixCache (#11231)
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
2025-10-07 18:21:37 +08:00
Liangsheng Yin
8a8a608af9 [ci] fix pp test (#11294)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-07 14:20:04 +08:00
Alex Chi Z
9b4c449735 convert test_deterministic into unit tests (#11095)
Signed-off-by: Alex Chi Z <iskyzh@gmail.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2025-10-06 20:33:11 -07:00
sunxxuns
a57f0e3d56 reverse the amd ci test back to 1200s and split the 8-gpu deepseek job into two. (#11238)
Co-authored-by: root <root@smci350-zts-gtu-e17-15.zts-gtu.dcgpu>
2025-10-06 19:27:57 -04:00
Zhiyu
155cbb51f0 Enable native ModelOpt quantization support (1/3) (#7149)
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
2025-10-06 13:24:15 -07:00
Lianmin Zheng
d645ae90a3 Rename runner labels (#11228) 2025-10-05 18:05:41 -07:00
Xinyuan Tong
652c24a653 Update transformers package version to 4.57.0 (#11222)
Co-authored-by: yhyang201 <yhyang201@gmail.com>
2025-10-05 23:45:14 +00:00
sunxxuns
5e142484e2 [Fix AMD CI] VRAM cleanup (#11174)
Co-authored-by: root <root@smci350-zts-gtu-e17-15.zts-gtu.dcgpu>
2025-10-05 19:03:53 -04:00
Shangming Cai
c560410da7 Refactor and optimize mooncake CI (#11162)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2025-10-05 14:08:52 -07:00
Vincent Zhong
36a6b8dbfc Update v1/responses to be more OpenAI-compatible. (#9624) 2025-10-05 18:47:46 +00:00
Ke Bao
31b49c0b51 EAGLE cache fix for HiCache (#11215) 2025-10-04 16:53:53 -07:00
Hank Han
666da3d59f [fix]enable flashmla when using draft model P/D attention select (#11012) 2025-10-04 20:59:34 +08:00
hzh0425
c70e58e837 [HICache]: Refactor HiCache CI (#11011)
Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-10-03 20:51:56 -04:00
Liangsheng Yin
4726c9197f [minor] fix the lint (#11198) 2025-10-04 01:04:58 +08:00
vikram singh shekhawat
586e81a28a [Test] Initialize mem_fraction_static in setUpClass to fix pytest VLM test crashes. (#10859)
Co-authored-by: svc_repro_tool <svc_repro_tool@habana.ai>
2025-10-04 00:14:48 +08:00
shubham singhal
03def5e3b1 Fix [test]: Env:SGLANG_TORCH_PROFILER_DIR for pytest. (#10780) 2025-10-03 22:59:32 +08:00
fzyzcjy
fdc4e1e570 Tiny move files to utils folder (#11166) 2025-10-03 22:40:06 +08:00
Matt Nappo
8c57490210 [Feature] Option to save model weights to CPU when memory saver mode is enabled (#10873)
Co-authored-by: molocule <34072934+molocule@users.noreply.github.com>
2025-10-03 16:48:19 +08:00
fzyzcjy
6794d21051 Tiny add PD disaggregation + DP attention test (#11167) 2025-10-03 14:15:46 +08:00
Vedant V Jhaveri
7e61737d3f [Generative Scores API] add performance tests to CICD (#10830) 2025-10-02 19:57:55 -07:00
Liangsheng Yin
7ff740a6ce Remove dp balance metadata and minimum token balance. (#11170) 2025-10-03 01:48:15 +08:00
ilyasch2
083629c235 [model] Add mamba2 and Falcon-H1 support. (#10988)
Co-authored-by: Younes Belkada <younes.belkada@tii.ae>
Co-authored-by: Younes B <49240599+younesbelkada@users.noreply.github.com>
2025-10-02 19:15:36 +08:00
Liangsheng Yin
25e7dbe8af Fix ngram spec with page size > 1 (#11135) 2025-10-02 12:34:23 +08:00
Sai Enduri
195a59fe23 Refactor AMD CI. (#11128) 2025-10-01 01:12:28 -07:00
Liangsheng Yin
73d4a5f879 Organize spec-related data structures (#10735) 2025-10-01 09:45:30 +08:00
Ke Bao
91847e382a Fix eagle radix cache (#10846) 2025-09-30 22:59:20 +08:00
narutolhy
d17986f8c6 Enable optional FP32 compute for LM Head (#10729)
Thanks to MiniMax Team and Chenyang Zhao's support.
2025-09-29 20:45:17 -07:00
Lianmin Zheng
dda34c2f93 Fix mem fraction static for nightly tests (#11076) 2025-09-29 12:57:41 -07:00
Lianmin Zheng
a17e70f5cc Use more general heuristics to set the default value of --mem-fraction-static (#10975)
Co-authored-by: sglang-bot <sglangbot@gmail.com>
2025-09-29 10:11:03 -07:00
Zhihao Zhang
24f7cb1ece [speculative decoding] rename lookahead to ngram (#11010)
Co-authored-by: a4zhangfei <a4zhangfei@qq.com>
2025-09-28 21:06:59 -07:00
huangtingwei
e05555fad8 [HiCacheStorage] mooncake store support page_first_direct layout (#10591) 2025-09-28 20:45:48 -07:00
Mick
2e7633982c fix: show failed models in nightly ci (#10986) 2025-09-28 12:38:29 -07:00
Tejesh Anand
8cc27fdc46 Use jsonschema to constrain required or specific tool choice (#10550) 2025-09-27 13:18:50 -04:00
Mick
777eb53897 ci: refactor nightly test (#10495) 2025-09-26 15:24:30 -07:00
Mick
fff7fbabe6 ci: fix rate-limit of huggingface with hf auth login (#10947) 2025-09-26 11:02:44 -07:00
hzh0425
7ec5b4e89c [PD-HiCache]: Support Async Offloading KVCache In Decode Side (#10192)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
2025-09-25 23:20:49 -07:00
eraser00
0ac6114694 Replace the Kimi-K2 generated tool call idx with history tool call count (#10612)
Co-authored-by: eraser00 <eraser00@github.com>
2025-09-25 18:47:40 -07:00
Lianmin Zheng
f68dd998b9 Rename customer label -> custom label (#10899)
Co-authored-by: Yingchun Lai <laiyingchun@apache.org>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-25 16:19:53 -07:00
Lianmin Zheng
35ec2a45a8 [minor] Remove deprecated function get_ip (#10883) 2025-09-25 16:18:04 -07:00