1005 Commits

Author SHA1 Message Date
vikram singh shekhawat
586e81a28a [Test] Initialize mem_fraction_static in setUpClass to fix pytest VLM test crashes. (#10859)
Co-authored-by: svc_repro_tool <svc_repro_tool@habana.ai>
2025-10-04 00:14:48 +08:00
shubham singhal
03def5e3b1 Fix [test]: Env:SGLANG_TORCH_PROFILER_DIR for pytest. (#10780) 2025-10-03 22:59:32 +08:00
fzyzcjy
fdc4e1e570 Tiny move files to utils folder (#11166) 2025-10-03 22:40:06 +08:00
Matt Nappo
8c57490210 [Feature] Option to save model weights to CPU when memory saver mode is enabled (#10873)
Co-authored-by: molocule <34072934+molocule@users.noreply.github.com>
2025-10-03 16:48:19 +08:00
fzyzcjy
6794d21051 Tiny add PD disaggregation + DP attention test (#11167) 2025-10-03 14:15:46 +08:00
Vedant V Jhaveri
7e61737d3f [Generative Scores API] add performance tests to CICD (#10830) 2025-10-02 19:57:55 -07:00
Liangsheng Yin
7ff740a6ce Remove dp balance metadata and minimul token balance. (#11170) 2025-10-03 01:48:15 +08:00
ilyasch2
083629c235 [model] Add mamba2 and Falcon-H1 support. (#10988)
Co-authored-by: Younes Belkada <younes.belkada@tii.ae>
Co-authored-by: Younes B <49240599+younesbelkada@users.noreply.github.com>
2025-10-02 19:15:36 +08:00
Liangsheng Yin
25e7dbe8af Fix ngram spec with page size > 1 (#11135) 2025-10-02 12:34:23 +08:00
Sai Enduri
195a59fe23 Refactor AMD CI. (#11128) 2025-10-01 01:12:28 -07:00
Liangsheng Yin
73d4a5f879 Organize spec-related data structures (#10735) 2025-10-01 09:45:30 +08:00
Ke Bao
91847e382a Fix eagle radix cache (#10846) 2025-09-30 22:59:20 +08:00
narutolhy
d17986f8c6 Enable optional FP32 compute for LM Head (#10729)
Thanks to MiniMax Team and Chenyang Zhao's support.
2025-09-29 20:45:17 -07:00
Lianmin Zheng
dda34c2f93 Fix mem fraction static for nightly tests (#11076) 2025-09-29 12:57:41 -07:00
Lianmin Zheng
a17e70f5cc Use more general heuristics to set the default value of --mem-fraction-static (#10975)
Co-authored-by: sglang-bot <sglangbot@gmail.com>
2025-09-29 10:11:03 -07:00
Zhihao Zhang
24f7cb1ece [speculative decoding] rename lookahead to ngram (#11010)
Co-authored-by: a4zhangfei <a4zhangfei@qq.com>
2025-09-28 21:06:59 -07:00
huangtingwei
e05555fad8 [HiCacheStorage] mooncake store support page_first_direct layout (#10591) 2025-09-28 20:45:48 -07:00
Mick
2e7633982c fix: show failed models in nightly ci (#10986) 2025-09-28 12:38:29 -07:00
Tejesh Anand
8cc27fdc46 Use jsonschema to constrain required or specific tool choice (#10550) 2025-09-27 13:18:50 -04:00
Mick
777eb53897 ci: refactor nightly test (#10495) 2025-09-26 15:24:30 -07:00
Mick
fff7fbabe6 ci: fix rate-limit of huggingface with hf auth login (#10947) 2025-09-26 11:02:44 -07:00
hzh0425
7ec5b4e89c [PD-HiCache]: Support Async Offloading KVCache In Decode Side (#10192)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
2025-09-25 23:20:49 -07:00
eraser00
0ac6114694 Replace the Kimi-K2 generated tool call idx with history tool call count (#10612)
Co-authored-by: eraser00 <eraser00@github.com>
2025-09-25 18:47:40 -07:00
Lianmin Zheng
f68dd998b9 Rename customer label -> custom label (#10899)
Co-authored-by: Yingchun Lai <laiyingchun@apache.org>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-25 16:19:53 -07:00
Lianmin Zheng
35ec2a45a8 [minor] Remove deprecated function get_ip (#10883) 2025-09-25 16:18:04 -07:00
kushanam
d7b20dd65d chore: Initial support for input config files (#10534)
Co-authored-by: root <root@umbriel-b200-017.ipp4a1.colossus.nvidia.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-09-24 14:45:52 -07:00
Xinyuan Tong
71f24ef8f6 feat: add cache_salt support to request (#10718)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-09-23 23:30:25 -07:00
Lianmin Zheng
b1f0fc1c0b Add CI timeout guidelines (#10829) 2025-09-23 22:08:02 -07:00
Shangming Cai
23632d350c Fix latest main ci (#10799)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2025-09-23 12:46:13 -07:00
Shangming Cai
d21c35224d Fix hicache mooncake backend CI (#10792)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2025-09-23 02:04:44 -07:00
Even Zhou
d27a6f7092 [Feature] Add MLAProcess for DeepSeek MLA on NPU (#10130) 2025-09-22 17:17:48 -07:00
Vedant Jhaveri
2f555c4cee [Generative Score API] Added test_scores_api.py to github CICD to run per commit (#10755)
Co-authored-by: Vedant Jhaveri <vjhaveri@linkedin.com>
Co-authored-by: Sundara Raman Ramachandran <sundar24295@gmail.com>
2025-09-22 14:41:57 -07:00
Lifu Huang
2101d93b4f Fix CI TestChunkedSGMV (#10737) 2025-09-22 16:09:58 +08:00
Shangming Cai
70e4b21853 Fix flaky logprobs test (#10728)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2025-09-22 00:46:26 -07:00
Yineng Zhang
2f18602f13 fix: disable gpt-oss b200 ut (#10716) 2025-09-21 17:02:25 -07:00
Xinyuan Tong
12d6cf18f0 Refactors radix cache for extra key support (#10317)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-09-22 02:16:16 +08:00
Lifu Huang
08ecd0aa2a [3/4] Speed up CSGMV backend perf by 10% through dynamic chunking + kernel optimization (#10592) 2025-09-20 22:47:48 -07:00
Yineng Zhang
ba94b82986 fix: update run_suite (#10685) 2025-09-20 01:22:06 -07:00
huangtingwei
7f399e4bce [HiCacheStorage]support page_first_direct layout for generic set&get (#10522) 2025-09-19 05:47:16 -07:00
Zhihao Zhang
e7bc600304 [Feature] Speculative decoding support lookahead (#9873)
Co-authored-by: a4zhangfei <a4zhangfei@qq.com>
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
2025-09-18 16:42:41 -07:00
yuk.igalaxy
9a5c42f9ad feat: Add FlexAttention Backend for Efficient Sparse Attention (#9947)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2025-09-18 11:49:17 -07:00
penguin_wwy
93f75778be [RL] Add destroy process group api (#9979) 2025-09-19 00:31:56 +08:00
Yineng Zhang
564050766d fix: update dsv3 fp4 ut (#10584) 2025-09-17 14:34:58 -07:00
Teng Ma
77098aea7b [HiCache] Add tests for hicache storage mooncake backend (#10171)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: hzh0425 <hzh0425@apache.org>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
2025-09-18 01:07:16 +08:00
harrisonlimh
14fdd52740 feat: add priority based scheduling with priority based request acceptance and preemption (#8746) 2025-09-16 17:10:10 -07:00
Night
f1c692f6f8 Add Logprobs unit test with a loose threshold (#10230)
Co-authored-by: Yusheng Su <yushengsu.thu@gmail.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Ryan <ryan@ryanmini.mynetworksettings.com>
2025-09-16 13:04:40 +08:00
Lifu Huang
3f41b48c40 [2/2] Introduce Chunked-SGMV kernels and corresponding LoRA backend for improved performance (#10286) 2025-09-15 16:04:03 -07:00
fzyzcjy
3b25dc127a [1/2] Speed up trtllm_mla attention backend (>10% e2e) (#10473) 2025-09-15 11:53:21 -07:00
Praneth Paruchuri
a45d9a4ee8 model: support solar (#8189) 2025-09-16 02:21:13 +08:00
Lianmin Zheng
50dc0c1e9c Run tests based on labels (#10456) 2025-09-15 00:29:20 -07:00