Minglei Zhu
|
46ccbed2cd
|
update GLM nightly test threshold (#10331)
|
2025-09-11 14:54:58 -07:00 |
|
Zaili Wang
|
ef959d7b85
|
[CPU] fix OOM when mem-fraction is not set (#9090)
|
2025-09-10 23:52:22 -07:00 |
|
Even Zhou
|
5b64f006ec
|
[Feature] Support DeepEP normal & Redundant Experts on NPU (#9881)
|
2025-09-10 20:35:26 -07:00 |
|
Xinyuan Tong
|
f3b5db6ee8
|
Feat: support disable tool parser (#10184)
|
2025-09-10 14:03:55 -07:00 |
|
Hubert Lu
|
91b3555d2d
|
Add tests to AMD CI for MI35x (#9662)
Co-authored-by: Sai Enduri <saimanas.enduri@amd.com>
|
2025-09-10 12:50:05 -07:00 |
|
Lifu Huang
|
e903f695c8
|
Fix potential flakiness in test_lora_qwen3 (#10250)
|
2025-09-10 08:04:39 +00:00 |
|
ryang
|
dccf52f9c8
|
[UT for RL] Add UT to cover release/resume memory case for moe model (#8803)
|
2025-09-09 19:25:12 -07:00 |
|
blzheng
|
d1d4074c4e
|
[CPU] Add gelu_and_mul kernel in sgl-kernel and add ut (#9300)
|
2025-09-08 23:23:13 -07:00 |
|
wenhuipeng
|
16ff3d4b05
|
Support opt model (#10165)
|
2025-09-09 12:45:00 +08:00 |
|
Baizhou Zhang
|
8ad700f735
|
Cleaning codes for speculative attention mode (#10149)
|
2025-09-08 17:38:06 -07:00 |
|
LukasBluebaum
|
9a18aa54c2
|
[fix] Relax white space rules in EBNFComposer (#9595)
|
2025-09-08 10:47:19 -07:00 |
|
Liangsheng Yin
|
2c2b19b18b
|
[CI] fix ambiguous argument in testing hybrid attentions. (#10161)
|
2025-09-08 18:16:52 +08:00 |
|
hzh0425
|
ec99668ab7
|
[Hicache]: Add E2E CI For 3FS-KVStore (#10131)
|
2025-09-08 01:54:50 -07:00 |
|
Yineng Zhang
|
b7d1f17b8d
|
Revert "enable auto-round quantization model (#6226)" (#10148)
|
2025-09-07 22:31:11 -07:00 |
|
Weiwei
|
c8295d2353
|
enable auto-round quantization model (#6226)
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
|
2025-09-07 22:05:35 -07:00 |
|
Even Zhou
|
b67c277f86
|
[Bugfix] Qwen3MoE aclrtMemcpy failed with NPUGraph (#10013)
|
2025-09-07 21:50:49 -07:00 |
|
cicirori
|
8c5930f08a
|
Add speculator attention backend switch (#9981)
|
2025-09-07 21:44:36 -07:00 |
|
Cao E
|
7577f0e40f
|
Add graph runner support with torch compile on CPU (#7843)
|
2025-09-07 21:33:58 -07:00 |
|
Qiaolin Yu
|
8cda5a622c
|
Standalone speculative decoding (#10090)
|
2025-09-07 20:55:09 -07:00 |
|
Xinyuan Tong
|
f3440adcb5
|
vlm: enable GLM4.1V server testing & fix video processing (#10095)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Binyao Jiang <byjiang1996@gmail.com>
|
2025-09-08 03:53:08 +01:00 |
|
Shangming Cai
|
00974e4f6e
|
[CI] Refactor disaggregation tests (#10068)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2025-09-06 22:14:46 +08:00 |
|
Cheng Wan
|
21af5c0404
|
[Fix] Compatibility between DP attention and pipeline parallelism (#10100)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-06 01:34:10 -07:00 |
|
Cheng Wan
|
3fa62da78c
|
[7/N] MoE Refactor: the implementation of new framework (#9269)
|
2025-09-05 21:09:09 -07:00 |
|
gongwei-130
|
ab62b135c1
|
support Llama4 with non uniformed intermediate size across layers for… (#10047)
|
2025-09-05 17:28:15 -07:00 |
|
Xinyuan Tong
|
273b28344b
|
[Minor] Refactors KV memory pool (#9842)
|
2025-09-05 17:06:08 -07:00 |
|
DevashishLal-CB
|
13705dae06
|
[Fix] Add speculative_draft_model_revision to server_args (#5255)
Signed-off-by: Devashish Lal <devashish@rivosinc.com>
|
2025-09-05 19:45:46 +08:00 |
|
Liangsheng Yin
|
6e95f5e5bd
|
Simplify Router arguments passing and build it in docker image (#9964)
|
2025-09-05 12:13:55 +08:00 |
|
Yingchun Lai
|
b32ab0705e
|
metrics: support customer buckets for prompt/generation_tokens_histogram (#9634)
|
2025-09-04 22:22:08 +08:00 |
|
hzh0425
|
106c2b31fb
|
feat(hicache): Add generic hicache ci e2e test and benchmark test (#9846)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
2025-09-04 20:43:46 +08:00 |
|
Yineng Zhang
|
de9217334b
|
feat: add gpt oss b200 ci (#9988)
|
2025-09-03 17:26:38 -07:00 |
|
Lianmin Zheng
|
60e37f8028
|
Move parsers under a single folder (#9912)
|
2025-09-02 18:25:04 -07:00 |
|
tc-mb
|
03dbf1aa8e
|
[model] support MiniCPM-V 4.0 (#8747)
Signed-off-by: tc-mb <caitianchi@modelbest.cn>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
|
2025-09-02 15:33:03 -07:00 |
|
ybyang
|
5f77e1292d
|
Support Multi Process Tokenizer Manager(#6555) (#8964)
Signed-off-by: ybyang <ybyang7@iflytek.com>
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: Huang Long <121648372+LLLL114@users.noreply.github.com>
Co-authored-by: huanglong <huanglong@linux.alibaba.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2025-09-01 01:00:13 -07:00 |
|
Baizhou Zhang
|
7de2ce45b2
|
Disable radix cache in test_lora_update.py for better stability (#9852)
|
2025-08-31 22:28:22 -07:00 |
|
narutolhy
|
839c93bd2d
|
feat: add original logprobs to response (#8375)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: luhongyu.4869 <luhongyu.4869@bytedance.com>
|
2025-08-29 11:43:57 -07:00 |
|
gongwei-130
|
3fd1431df2
|
support enable in the reasoning field to enable thinking for thinkin… (#9715)
|
2025-08-29 10:57:32 -07:00 |
|
gongwei-130
|
9a7c8842ba
|
accommodate json schema in the "schema" field, not in "json_schema" field of response_format (#9786)
|
2025-08-28 23:51:50 -07:00 |
|
Hubert Lu
|
711390a971
|
[AMD] Support Hierarchical Caching on AMD GPUs (#8236)
|
2025-08-28 15:27:07 -07:00 |
|
Qiaolin Yu
|
4a4772ae03
|
Support speculative decoding in hybrid attention backend (#9573)
|
2025-08-28 01:11:42 -07:00 |
|
cicirori
|
b6c14ec0b4
|
add response_format support for completion API (#9665)
|
2025-08-26 15:01:29 -07:00 |
|
Xiaotong Jiang
|
0936c766ed
|
Fix kimi k2 function calling format (#9606)
|
2025-08-26 00:50:59 -07:00 |
|
Netanel Haber
|
4cd08dc592
|
model: Support nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 (#9301)
|
2025-08-26 15:33:40 +08:00 |
|
ZhengdQin
|
f92b729d52
|
[new feat] ascend backend support fia fusion kernel (#8328)
Co-authored-by: Even Zhou <even.y.zhou@outlook.com>
|
2025-08-25 23:13:08 -07:00 |
|
Jonas
|
a0a77d937b
|
Fix Harmony reasoning parser and auto-separation for gpt-oss models (#9190)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
Co-authored-by: minleminzui <2969413251@qq.com>
Co-authored-by: maocheng23 <maocheng@berkeley.edu>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-08-25 15:26:26 -07:00 |
|
Yineng Zhang
|
ebd9dbe71b
|
fix: revert #8593 (#9581)
|
2025-08-25 01:29:06 -07:00 |
|
Pavani Majety
|
3cc3d9b950
|
Add Support for Page Size greater than 1 for Flashinfer MLA Backend (#8593)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2025-08-21 18:15:06 -07:00 |
|
DiweiSun
|
029e0af31d
|
ci: enhance xeon ci (#9395)
|
2025-08-21 03:35:17 -07:00 |
|
VDV1985
|
2c4b4b786b
|
[feature] Ascend NPU graph support (#9399)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: yezhifeng (D) <y00897525@china.huawei.com>
Co-authored-by: anon189Ty <Stari_Falcon@outlook.com>
Co-authored-by: Maksim <makcum888e@mail.ru>
Co-authored-by: ssshinigami <44640852+ssshinigami@users.noreply.github.com>
|
2025-08-20 21:13:27 -07:00 |
|
Mick
|
ef3004d90a
|
misc: parse bench_serving result as markdown table (#9377)
|
2025-08-20 16:44:20 -07:00 |
|
Lifu Huang
|
b0980af89f
|
Support pinning adapter via server args. (#9249)
|
2025-08-20 16:25:01 -07:00 |
|