Lianmin Zheng
|
d645ae90a3
|
Rename runner labels (#11228)
|
2025-10-05 18:05:41 -07:00 |
|
Xinyuan Tong
|
652c24a653
|
Update transformers package version to 4.57.0 (#11222)
Co-authored-by: yhyang201 <yhyang201@gmail.com>
|
2025-10-05 23:45:14 +00:00 |
|
sunxxuns
|
5e142484e2
|
[Fix AMD CI] VRAM cleanup (#11174)
Co-authored-by: root <root@smci350-zts-gtu-e17-15.zts-gtu.dcgpu>
|
2025-10-05 19:03:53 -04:00 |
|
Shangming Cai
|
c560410da7
|
Refactor and optimize mooncake CI (#11162)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2025-10-05 14:08:52 -07:00 |
|
Vincent Zhong
|
36a6b8dbfc
|
Update v1/responses to be more OpenAI-compatible. (#9624)
|
2025-10-05 18:47:46 +00:00 |
|
Ke Bao
|
31b49c0b51
|
EAGLE cache fix for HiCache (#11215)
|
2025-10-04 16:53:53 -07:00 |
|
Hank Han
|
666da3d59f
|
[fix]enable flashmla when using draft model P/D attention select (#11012)
|
2025-10-04 20:59:34 +08:00 |
|
hzh0425
|
c70e58e837
|
[HICache]: Refactor HiCache CI (#11011)
Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
2025-10-03 20:51:56 -04:00 |
|
Liangsheng Yin
|
4726c9197f
|
[minor] fix the lint (#11198)
|
2025-10-04 01:04:58 +08:00 |
|
vikram singh shekhawat
|
586e81a28a
|
[Test] Initialize mem_fraction_static in setUpClass to fix pytest VLM test crashes. (#10859)
Co-authored-by: svc_repro_tool <svc_repro_tool@habana.ai>
|
2025-10-04 00:14:48 +08:00 |
|
shubham singhal
|
03def5e3b1
|
Fix [test]: Env:SGLANG_TORCH_PROFILER_DIR for pytest. (#10780)
|
2025-10-03 22:59:32 +08:00 |
|
fzyzcjy
|
fdc4e1e570
|
Tiny move files to utils folder (#11166)
|
2025-10-03 22:40:06 +08:00 |
|
Matt Nappo
|
8c57490210
|
[Feature] Option to save model weights to CPU when memory saver mode is enabled (#10873)
Co-authored-by: molocule <34072934+molocule@users.noreply.github.com>
|
2025-10-03 16:48:19 +08:00 |
|
fzyzcjy
|
6794d21051
|
Tiny add PD disaggregation + DP attention test (#11167)
|
2025-10-03 14:15:46 +08:00 |
|
Vedant V Jhaveri
|
7e61737d3f
|
[Generative Scores API] add performance tests to CICD (#10830)
|
2025-10-02 19:57:55 -07:00 |
|
Liangsheng Yin
|
7ff740a6ce
|
Remove dp balance metadata and minimum token balance. (#11170)
|
2025-10-03 01:48:15 +08:00 |
|
ilyasch2
|
083629c235
|
[model] Add mamba2 and Falcon-H1 support. (#10988)
Co-authored-by: Younes Belkada <younes.belkada@tii.ae>
Co-authored-by: Younes B <49240599+younesbelkada@users.noreply.github.com>
|
2025-10-02 19:15:36 +08:00 |
|
Liangsheng Yin
|
25e7dbe8af
|
Fix ngram spec with page size > 1 (#11135)
|
2025-10-02 12:34:23 +08:00 |
|
Sai Enduri
|
195a59fe23
|
Refactor AMD CI. (#11128)
|
2025-10-01 01:12:28 -07:00 |
|
Liangsheng Yin
|
73d4a5f879
|
Organize spec-related data structures (#10735)
|
2025-10-01 09:45:30 +08:00 |
|
Ke Bao
|
91847e382a
|
Fix eagle radix cache (#10846)
|
2025-09-30 22:59:20 +08:00 |
|
narutolhy
|
d17986f8c6
|
Enable optional FP32 compute for LM Head (#10729)
Thanks to MiniMax Team and Chenyang Zhao's support.
|
2025-09-29 20:45:17 -07:00 |
|
Lianmin Zheng
|
dda34c2f93
|
Fix mem fraction static for nightly tests (#11076)
|
2025-09-29 12:57:41 -07:00 |
|
Lianmin Zheng
|
a17e70f5cc
|
Use more general heuristics to set the default value of --mem-fraction-static (#10975)
Co-authored-by: sglang-bot <sglangbot@gmail.com>
|
2025-09-29 10:11:03 -07:00 |
|
Zhihao Zhang
|
24f7cb1ece
|
[speculative decoding] rename lookahead to ngram (#11010)
Co-authored-by: a4zhangfei <a4zhangfei@qq.com>
|
2025-09-28 21:06:59 -07:00 |
|
huangtingwei
|
e05555fad8
|
[HiCacheStorage] mooncake store support page_first_direct layout (#10591)
|
2025-09-28 20:45:48 -07:00 |
|
Mick
|
2e7633982c
|
fix: show failed models in nightly ci (#10986)
|
2025-09-28 12:38:29 -07:00 |
|
Tejesh Anand
|
8cc27fdc46
|
Use jsonschema to constrain required or specific tool choice (#10550)
|
2025-09-27 13:18:50 -04:00 |
|
Mick
|
777eb53897
|
ci: refactor nightly test (#10495)
|
2025-09-26 15:24:30 -07:00 |
|
Mick
|
fff7fbabe6
|
ci: fix rate-limit of huggingface with hf auth login (#10947)
|
2025-09-26 11:02:44 -07:00 |
|
hzh0425
|
7ec5b4e89c
|
[PD-HiCache]: Support Async Offloading KVCache In Decode Side (#10192)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2025-09-25 23:20:49 -07:00 |
|
eraser00
|
0ac6114694
|
Replace the Kimi-K2 generated tool call idx with history tool call count (#10612)
Co-authored-by: eraser00 <eraser00@github.com>
|
2025-09-25 18:47:40 -07:00 |
|
Lianmin Zheng
|
f68dd998b9
|
Rename customer label -> custom label (#10899)
Co-authored-by: Yingchun Lai <laiyingchun@apache.org>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-25 16:19:53 -07:00 |
|
Lianmin Zheng
|
35ec2a45a8
|
[minor] Remove deprecated function get_ip (#10883)
|
2025-09-25 16:18:04 -07:00 |
|
kushanam
|
d7b20dd65d
|
chore: Initial support for input config files (#10534)
Co-authored-by: root <root@umbriel-b200-017.ipp4a1.colossus.nvidia.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
|
2025-09-24 14:45:52 -07:00 |
|
Xinyuan Tong
|
71f24ef8f6
|
feat: add cache_salt support to request (#10718)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-09-23 23:30:25 -07:00 |
|
Lianmin Zheng
|
b1f0fc1c0b
|
Add CI timeout guidelines (#10829)
|
2025-09-23 22:08:02 -07:00 |
|
Shangming Cai
|
23632d350c
|
Fix latest main ci (#10799)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2025-09-23 12:46:13 -07:00 |
|
Shangming Cai
|
d21c35224d
|
Fix hicache mooncake backend CI (#10792)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2025-09-23 02:04:44 -07:00 |
|
Even Zhou
|
d27a6f7092
|
[Feature] Add MLAProcess for DeepSeek MLA on NPU (#10130)
|
2025-09-22 17:17:48 -07:00 |
|
Vedant Jhaveri
|
2f555c4cee
|
[Generative Score API] Added test_scores_api.py to github CICD to run per commit (#10755)
Co-authored-by: Vedant Jhaveri <vjhaveri@linkedin.com>
Co-authored-by: Sundara Raman Ramachandran <sundar24295@gmail.com>
|
2025-09-22 14:41:57 -07:00 |
|
Lifu Huang
|
2101d93b4f
|
Fix CI TestChunkedSGMV (#10737)
|
2025-09-22 16:09:58 +08:00 |
|
Shangming Cai
|
70e4b21853
|
Fix flaky logprobs test (#10728)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2025-09-22 00:46:26 -07:00 |
|
Yineng Zhang
|
2f18602f13
|
fix: disable gpt-oss b200 ut (#10716)
|
2025-09-21 17:02:25 -07:00 |
|
Xinyuan Tong
|
12d6cf18f0
|
Refactors radix cache for extra key support (#10317)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-09-22 02:16:16 +08:00 |
|
Lifu Huang
|
08ecd0aa2a
|
[3/4] Speed up CSGMV backend perf by 10% through dynamic chunking + kernel optimization (#10592)
|
2025-09-20 22:47:48 -07:00 |
|
Yineng Zhang
|
ba94b82986
|
fix: update run_suite (#10685)
|
2025-09-20 01:22:06 -07:00 |
|
huangtingwei
|
7f399e4bce
|
[HiCacheStorage]support page_first_direct layout for generic set&get (#10522)
|
2025-09-19 05:47:16 -07:00 |
|
Zhihao Zhang
|
e7bc600304
|
[Feature] Speculative decoding support lookahead (#9873)
Co-authored-by: a4zhangfei <a4zhangfei@qq.com>
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
|
2025-09-18 16:42:41 -07:00 |
|
yuk.igalaxy
|
9a5c42f9ad
|
feat: Add FlexAttention Backend for Efficient Sparse Attention (#9947)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2025-09-18 11:49:17 -07:00 |
|