| Author | Commit | Message | Date |
|---|---|---|---|
| Kaixi Hou | 5c34b4f1c7 | [NVIDIA] [2/N] Optimize silu_and_mul_scaled_fp4_grouped_quant perf (#9556) | 2025-08-29 17:17:03 -07:00 |
| pansicheng | 09a1df2231 | add bench_mix.py (#9788) | 2025-08-28 23:44:26 -07:00 |
| Xinyuan Tong | f84b57c80e | Move git clone command up from README (#9740) | 2025-08-28 00:27:00 -07:00 |
| Liangsheng Yin | d0934a5192 | gpt-oss blog reproduction document (#9728) | 2025-08-28 10:15:08 +08:00 |
| Yineng Zhang | bc80dc4ce0 | chore: bump v0.5.1.post3 (#9716) | 2025-08-27 15:42:42 -07:00 |
| ehuaa | 8f7b1c31e8 | Add A100 fused MoE kernel configs for Dpsk (#9677) | 2025-08-26 20:49:48 -07:00 |
| hzh0425 | c04c17edfa | refactor(hicache): Introduce generic HiCacheStorageConfig for improved configuration management (#9555); Co-authored-by: Teng Ma <805522925@qq.com> | 2025-08-26 17:55:20 -07:00 |
| Yineng Zhang | e3e97a120b | chore: bump v0.5.1.post2 (#9592) | 2025-08-25 03:45:09 -07:00 |
| Yineng Zhang | f8b757bcac | fix: resolve tuning fused moe issue (#9587) | 2025-08-25 01:41:15 -07:00 |
| Yineng Zhang | e0ab167db0 | chore: bump v0.5.1.post1 (#9558) | 2025-08-24 01:14:17 -07:00 |
| Xiaotong Jiang | 80425e59bb | [doc] deepseekv31 support (#9544) | 2025-08-23 16:54:58 -07:00 |
| Lianmin Zheng | 97a38ee85b | Release 0.5.1 (#9533) | 2025-08-23 07:09:26 -07:00 |
| hzh0425 | 83871aa12d | feat(hicache): Supports 3fs-hicache compatibility with dp-attention (#9372) | 2025-08-23 02:08:32 -07:00 |
| yuxingcyx | 4edbe0d534 | [benchmark] Add benchmark scripts for ceval and boolq (#8946); Co-authored-by: chenyuxing <2818499974@qq.com>; Co-authored-by: hanqing <huang010706@126.com>; Co-authored-by: Muggle <62579327+trawolf@users.noreply.github.com>; Co-authored-by: ronnie_zheng <zl19940307@163.com> | 2025-08-23 15:40:15 +08:00 |
| pansicheng | 70cf4abccc | 3fs zerocopy (#9109); Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu> | 2025-08-22 17:56:38 +08:00 |
| Even Zhou | de2dd73831 | Revert "[feature] Rework Ascend NPU graph support" (#9385) | 2025-08-20 00:35:10 -07:00 |
| Even Zhou | 3680d6f88b | [feature] Rework Ascend NPU graph support (#9350); Co-authored-by: ronnie_zheng <zl19940307@163.com>; Co-authored-by: yezhifeng (D) <y00897525@china.huawei.com>; Co-authored-by: anon189Ty <Stari_Falcon@outlook.com>; Co-authored-by: Maksim <makcum888e@mail.ru>; Co-authored-by: ssshinigami <44640852+ssshinigami@users.noreply.github.com> | 2025-08-19 20:32:27 -07:00 |
| Chang Su | 46fe8b8cb2 | [CI] Fix lint issues (#9361) | 2025-08-19 13:05:36 -07:00 |
| mpashkovskiy | a3b810ebdb | fix: enable multi-GPU Triton fused MoE tuning (#6295) | 2025-08-19 10:16:58 -07:00 |
| Even Zhou | f4fafacc5d | Revert "[feature] Ascend NPU graph support (#8027)" (#9348) | 2025-08-19 10:11:23 -07:00 |
| Binyao Jiang | c2fbf60f39 | [GLM4.1V and GLM4.5V] Add vision transformer num_dummy_head support: max tp=4 -> max tp=8 (#9059) | 2025-08-18 14:40:13 -07:00 |
| Yuan Luo | 968e181826 | Fix triton_fused_moe unit test and benchmark (#9276); Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> | 2025-08-18 00:54:33 -07:00 |
| VDV1985 | 94371dbbd6 | [feature] Ascend NPU graph support (#8027); Co-authored-by: ronnie_zheng <zl19940307@163.com>; Co-authored-by: yezhifeng (D) <y00897525@china.huawei.com>; Co-authored-by: anon189Ty <Stari_Falcon@outlook.com>; Co-authored-by: Maksim <makcum888e@mail.ru>; Co-authored-by: ssshinigami <44640852+ssshinigami@users.noreply.github.com> | 2025-08-16 17:25:17 -07:00 |
| Cheng Wan | 295895120d | [6/N] MoE Refactor: Cleanup MoE-related configs (#8849) | 2025-08-14 21:14:53 -07:00 |
| Yineng Zhang | fab0f6e77d | chore: bump v0.5.0rc2 (#9203) | 2025-08-14 16:11:16 -07:00 |
| Sundara Raman Ramachandran | a027a9b4b3 | [Generative Score API] Optimization to Remove Decode. (#8840) | 2025-08-14 05:12:24 +08:00 |
| Yineng Zhang | 7b56e494be | chore: bump v0.5.0rc1 (#9069) | 2025-08-13 10:44:14 -07:00 |
| Zhiqiang Xie | 0eec4cb6cc | HiCache, add bench long context plus minor fixs (#9086); Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> | 2025-08-11 16:54:52 -07:00 |
| Lianmin Zheng | b58ae7a2a0 | Simplify frontend language (#9029) | 2025-08-10 10:59:30 -07:00 |
| Binyao Jiang | f29aba8c6e | Support glm4.1v and glm4.5v (#8798); Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>; Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>; Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>; Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>; Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>; Co-authored-by: zRzRzRzRzRzRzR <2448370773@qq.com>; Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com>; Co-authored-by: Chang Su <csu272@usc.edu> | 2025-08-09 00:59:13 -07:00 |
| Yineng Zhang | 9020f7fc32 | chore: bump v0.5.0rc0 (#8959) | 2025-08-08 09:16:18 -07:00 |
| pansicheng | e2fd2b9c7e | Simple prefetch policy (#8692) | 2025-08-08 02:09:28 -07:00 |
| eigen | 9c7e392465 | bench: add attention sink op benchmark, triton and trtllm-gen [B200] (#8932); Co-authored-by: averyhuang <averyh@nvidia.com> | 2025-08-08 00:16:23 -07:00 |
| Ke Bao | 0475448ee3 | Optimize triton swa kernel by skipping computation (#8860) | 2025-08-06 21:37:50 +08:00 |
| Yineng Zhang | 8cd344586e | chore: bump v0.4.10.post2 (#8727) | 2025-08-03 03:43:29 -07:00 |
| Ke Bao | 33f0de337d | chore: bump v0.4.10.post1 (#8652) | 2025-08-01 12:07:30 +08:00 |
| Yineng Zhang | 023288645b | chore: bump v0.4.10 (#8608) | 2025-07-31 20:50:17 +08:00 |
| pansicheng | 299803343d | Add hf3fs support for hicache storage (based on #7704) (#7280); Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu> | 2025-07-30 17:42:41 -07:00 |
| Yineng Zhang | 6478831be9 | chore: bump v0.4.9.post6 (#8517) | 2025-07-29 02:30:07 -07:00 |
| Yineng Zhang | 1466c1b896 | feat: support glm4 tuning (#8473) | 2025-07-28 14:32:58 -07:00 |
| Yineng Zhang | 45bc170b36 | chore: bump v0.4.9.post5 (#8458) | 2025-07-28 02:11:06 -07:00 |
| Yuxuan Zhang | 6d6a8bc278 | GLM-4.5 Model Support (#8224); Co-authored-by: Lifu Huang <lifu.hlf@gmail.com>; Co-authored-by: Binyao Jiang <byjiang1996@gmail.com>; Co-authored-by: Stefan He <hebiaobuaa@gmail.com> | 2025-07-27 22:54:07 -07:00 |
| fzyzcjy | 62222bd27e | Minor tool for comparison of benchmark results (#7974) | 2025-07-27 00:27:50 -07:00 |
| Mick | 4fa44d63c6 | chore: improve mmmu benchmark (#7000); Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>; Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com> | 2025-07-26 16:19:45 +08:00 |
| Yineng Zhang | 2272c2a5b5 | chore: bump v0.4.9.post4 (#8305) | 2025-07-25 17:12:47 -07:00 |
| Zhiqiang Xie | ce86e201df | bug fix and tag (#8282) | 2025-07-23 16:50:31 +08:00 |
| Yineng Zhang | 01c000043c | chore: bump v0.4.9.post3 (#8265) | 2025-07-22 15:55:48 -07:00 |
| zhongwei | ff45ab7a5f | [Benchmark] add disable-auto-run param for hicache/bench_multiturn (#7822); Co-authored-by: zhongwei.ren <zhongwei.ren@bytedance.com>; Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu> | 2025-07-22 14:02:40 -07:00 |
| Cheng Wan | abda2542d5 | Fix tuning_fused_moe_triton.py (#8175) | 2025-07-19 17:33:50 -07:00 |
| Hongbo Xu | 1f76fc8747 | [3/n] chore: decouple AWQ implementation from vLLM dependency (#8113); Co-authored-by: AniZpZ <zhuangsen.zp@antgroup.com> | 2025-07-18 11:45:22 -07:00 |