| Even Zhou | d27a6f7092 | [Feature] Add MLAProcess for DeepSeek MLA on NPU (#10130) | 2025-09-22 17:17:48 -07:00 |
| Vedant Jhaveri | 2f555c4cee | [Generative Score API] Added test_scores_api.py to github CICD to run per commit (#10755); Co-authored-by: Vedant Jhaveri <vjhaveri@linkedin.com>; Co-authored-by: Sundara Raman Ramachandran <sundar24295@gmail.com> | 2025-09-22 14:41:57 -07:00 |
| Lifu Huang | 2101d93b4f | Fix CI TestChunkedSGMV (#10737) | 2025-09-22 16:09:58 +08:00 |
| Shangming Cai | 70e4b21853 | Fix flaky logprobs test (#10728); Signed-off-by: Shangming Cai <csmthu@gmail.com> | 2025-09-22 00:46:26 -07:00 |
| Yineng Zhang | 2f18602f13 | fix: disable gpt-oss b200 ut (#10716) | 2025-09-21 17:02:25 -07:00 |
| Xinyuan Tong | 12d6cf18f0 | Refactors radix cache for extra key support (#10317); Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> | 2025-09-22 02:16:16 +08:00 |
| Lifu Huang | 08ecd0aa2a | [3/4] Speed up CSGMV backend perf by 10% through dynamic chunking + kernel optimization (#10592) | 2025-09-20 22:47:48 -07:00 |
| Yineng Zhang | ba94b82986 | fix: update run_suite (#10685) | 2025-09-20 01:22:06 -07:00 |
| huangtingwei | 7f399e4bce | [HiCacheStorage]support page_first_direct layout for generic set&get (#10522) | 2025-09-19 05:47:16 -07:00 |
| Zhihao Zhang | e7bc600304 | [Feature] Speculative decoding support lookahead (#9873); Co-authored-by: a4zhangfei <a4zhangfei@qq.com>; Co-authored-by: Qiaolin-Yu <liin1211@outlook.com> | 2025-09-18 16:42:41 -07:00 |
| yuk.igalaxy | 9a5c42f9ad | feat: Add FlexAttention Backend for Efficient Sparse Attention (#9947); Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com> | 2025-09-18 11:49:17 -07:00 |
| penguin_wwy | 93f75778be | [RL] Add destroy process group api (#9979) | 2025-09-19 00:31:56 +08:00 |
| Yineng Zhang | 564050766d | fix: update dsv3 fp4 ut (#10584) | 2025-09-17 14:34:58 -07:00 |
| Teng Ma | 77098aea7b | [HiCache] Add tests for hicache storage mooncake backend (#10171); Signed-off-by: Shangming Cai <csmthu@gmail.com>; Co-authored-by: hzh0425 <hzh0425@apache.org>; Co-authored-by: Shangming Cai <csmthu@gmail.com> | 2025-09-18 01:07:16 +08:00 |
| harrisonlimh | 14fdd52740 | feat: add priority based scheduling with priority based request acceptance and preemption (#8746) | 2025-09-16 17:10:10 -07:00 |
| Night | f1c692f6f8 | Add Logprobs unit test with a loose threshold (#10230); Co-authored-by: Yusheng Su <yushengsu.thu@gmail.com>; Co-authored-by: Chayenne <zhaochen20@outlook.com>; Co-authored-by: Ryan <ryan@ryanmini.mynetworksettings.com> | 2025-09-16 13:04:40 +08:00 |
| Lifu Huang | 3f41b48c40 | [2/2] Introduce Chunked-SGMV kernels and corresponding LoRA backend for improved performance (#10286) | 2025-09-15 16:04:03 -07:00 |
| fzyzcjy | 3b25dc127a | [1/2] Speed up trtllm_mla attention backend (>10% e2e) (#10473) | 2025-09-15 11:53:21 -07:00 |
| Praneth Paruchuri | a45d9a4ee8 | model: support solar (#8189) | 2025-09-16 02:21:13 +08:00 |
| Lianmin Zheng | 50dc0c1e9c | Run tests based on labels (#10456) | 2025-09-15 00:29:20 -07:00 |
| Jintao Zhang | f9ee6ae17a | [router]: Add Embedding routing logic (#10129); Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>; Co-authored-by: Waël Boukhobza <wawa_wael@live.fr> | 2025-09-14 18:44:35 -07:00 |
| Yineng Zhang | dcee42c200 | feat: add dsv3 fp4 cutlass moe etp ut (#10433) | 2025-09-14 18:44:09 -07:00 |
| Cheng Wan | 2f8ba6fe82 | [Fix] MoE: fix w8a8_fp8 MoE and add tests to cover this code path (#10429) | 2025-09-14 17:34:28 -07:00 |
| Feng Su | 4c21b09074 | [Feature] Sglang Tracing: Fine-Grained Tracking for Request Latency - Part 1 (#9962); Signed-off-by: Feng Su <sufeng@linux.alibaba.com>; Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>; Signed-off-by: Peng Wang <rocking@linux.alibaba.com> | 2025-09-15 02:08:02 +08:00 |
| Sundara Raman Ramachandran | 94d0f656fb | [Performance] Dynamic Batch Tokenizer (#9382) | 2025-09-14 01:56:04 +08:00 |
| Yineng Zhang | 9d775b1a2d | feat: add deepseek v3 fp4 ut (#10391) | 2025-09-12 15:43:29 -07:00 |
| Yi Zhang | fe6cdf8972 | add qwen3-next ut (#10355) | 2025-09-12 18:06:48 +08:00 |
| amysaq2023 | 30d20ce84f | Support loading weights from remote instance (#8215); Signed-off-by: Anqi Shen <amy.saq@antgroup.com>; Co-authored-by: Chayenne <74843776+zhaochenyang20@users.noreply.github.com> | 2025-09-12 17:40:22 +08:00 |
| EduardDurech | 46d8fb1c98 | model: support Apertus (#9774) | 2025-09-11 20:49:10 -07:00 |
| Shu Wang | 3df05f4d6a | [NVIDIA] [3/N] Nvfp4 Masked Gemm: Add flashinfer grouped_gemm_nt_masked (#9199) | 2025-09-11 20:18:43 -07:00 |
| Minglei Zhu | 46ccbed2cd | update GLM nightly test threshold (#10331) | 2025-09-11 14:54:58 -07:00 |
| Zaili Wang | ef959d7b85 | [CPU] fix OOM when mem-fraction is not set (#9090) | 2025-09-10 23:52:22 -07:00 |
| Even Zhou | 5b64f006ec | [Feature] Support DeepEP normal & Redundant Experts on NPU (#9881) | 2025-09-10 20:35:26 -07:00 |
| Xinyuan Tong | f3b5db6ee8 | Feat: support disable tool parser (#10184) | 2025-09-10 14:03:55 -07:00 |
| Hubert Lu | 91b3555d2d | Add tests to AMD CI for MI35x (#9662); Co-authored-by: Sai Enduri <saimanas.enduri@amd.com> | 2025-09-10 12:50:05 -07:00 |
| Lifu Huang | e903f695c8 | Fix potential flakiness in test_lora_qwen3 (#10250) | 2025-09-10 08:04:39 +00:00 |
| ryang | dccf52f9c8 | [UT for RL] Add UT to cover release/resume memory case for moe model (#8803) | 2025-09-09 19:25:12 -07:00 |
| blzheng | d1d4074c4e | [CPU] Add gelu_and_mul kernel in sgl-kernel and add ut (#9300) | 2025-09-08 23:23:13 -07:00 |
| wenhuipeng | 16ff3d4b05 | Support opt model (#10165) | 2025-09-09 12:45:00 +08:00 |
| Baizhou Zhang | 8ad700f735 | Cleaning codes for speculative attention mode (#10149) | 2025-09-08 17:38:06 -07:00 |
| LukasBluebaum | 9a18aa54c2 | [fix] Relax white space rules in EBNFComposer (#9595) | 2025-09-08 10:47:19 -07:00 |
| Liangsheng Yin | 2c2b19b18b | [CI] fix ambiguous argument in testing hybrid attentions. (#10161) | 2025-09-08 18:16:52 +08:00 |
| hzh0425 | ec99668ab7 | [Hicache]: Add E2E CI For 3FS-KVStore (#10131) | 2025-09-08 01:54:50 -07:00 |
| Yineng Zhang | b7d1f17b8d | Revert "enable auto-round quantization model (#6226)" (#10148) | 2025-09-07 22:31:11 -07:00 |
| Weiwei | c8295d2353 | enable auto-round quantization model (#6226); Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com> | 2025-09-07 22:05:35 -07:00 |
| Even Zhou | b67c277f86 | [Bugfix] Qwen3MoE aclrtMemcpy failed with NPUGraph (#10013) | 2025-09-07 21:50:49 -07:00 |
| cicirori | 8c5930f08a | Add speculator attention backend switch (#9981) | 2025-09-07 21:44:36 -07:00 |
| Cao E | 7577f0e40f | Add graph runner support with torch compile on CPU (#7843) | 2025-09-07 21:33:58 -07:00 |
| Qiaolin Yu | 8cda5a622c | Standalone speculative decoding (#10090) | 2025-09-07 20:55:09 -07:00 |
| Xinyuan Tong | f3440adcb5 | vlm: enable GLM4.1V server testing & fix video processing (#10095); Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>; Co-authored-by: Binyao Jiang <byjiang1996@gmail.com> | 2025-09-08 03:53:08 +01:00 |