Commit Graph

335 Commits

Yuan Luo
42245551ef [sgl-kernel] Optimize concat_mla_k kernel (#10543)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: PGFLMG <1106310035@qq.com>
2025-09-28 23:04:22 +08:00
lukec
77830a265e Add fuse_moe per-channel tune (#10915) 2025-09-25 21:12:09 +08:00
Xiaoyu Zhang
c4e314f986 Restruct sgl-kernel benchmark (#10861) 2025-09-25 07:45:25 +08:00
Yiakwy
984730b732 add tunning files for QWEN-3-NEXT (#10794) 2025-09-23 12:46:30 -07:00
ZhengHSI
adc24a3a0c fix ceval (#10504)
Co-authored-by: lukec <118525388+sleepcoo@users.noreply.github.com>
2025-09-24 02:35:25 +08:00
Yuan Luo
616a3e20df [sgl-kernel] Support moe_sum_reduce cuda kernel (#10321)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
2025-09-19 14:12:09 +08:00
zhannngchen
7a68b4225a [improvement] add average input/output token length for hicache benchmark stats output (#10525) 2025-09-18 00:38:03 -07:00
zhannngchen
541551cefe [bugfix]hicache bench_long_context.py run failed (#10523) 2025-09-17 11:27:06 +08:00
ykwd
4bb08f6e07 [Hicache] Evaluate Per-Round Metrics in Multiturn Bench (#10203)
Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com>
2025-09-15 19:34:40 -07:00
Yineng Zhang
86a32bb5cd chore: bump v0.5.3rc0 (#10468) 2025-09-15 03:55:18 -07:00
hzh0425
2a37b24d23 [HotFix]: Hot fix import path in 3fs_bench_client.py (#10463) 2025-09-14 23:45:46 -07:00
Vincent Zhong
1489cd6c02 [docs / oneliner] update mmmu docs instruction (#9768)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-15 11:26:39 +08:00
chenge@xiaohongshu.com
1b1701f1f7 model: support dots.vlm1 model (#8778)
Co-authored-by: weishi <bushou@xiaohongshu.com>
Co-authored-by: Ezra-Yu <1105212286@qq.com>
Co-authored-by: Jianfei Wang <905787410@qq.com>
Co-authored-by: qianwu <wangjianfei@xiaohongshu.com>
2025-09-12 17:38:38 +08:00
strgrb
fac07c9b08 Support LingV2 model (#10359)
Co-authored-by: 羽癫 <yudian.zy@antgroup.com>
Co-authored-by: guoyuhong <yuhong.gyh@antgroup.com>
2025-09-11 23:53:52 -07:00
Yineng Zhang
b0d25e72c4 chore: bump v0.5.2 (#10221) 2025-09-11 16:09:20 -07:00
Sundara Raman Ramachandran
a1d038924b [Benchmark] Prefil-only benchmark scripts (#10240) 2025-09-10 10:59:07 +08:00
Yuan Luo
cb3918a091 Optimize moe_sum_reduce_kernel (#9477)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
2025-09-07 09:16:18 +08:00
Baizhou Zhang
beac202bfd Add lora_path argument to bench_multiturn.py (#10092) 2025-09-05 19:20:42 -07:00
DevashishLal-CB
13705dae06 [Fix] Add speculative_draft_model_revision to server_args (#5255)
Signed-off-by: Devashish Lal <devashish@rivosinc.com>
2025-09-05 19:45:46 +08:00
Yineng Zhang
fa9c82d339 chore: bump v0.5.2rc2 (#10050) 2025-09-04 20:07:27 -07:00
Xiaoyu Zhang
b1fb7e458c [benchmark] add flashinfer_allreduce_fusion benchmark (#9937) 2025-09-03 16:31:01 +08:00
Yineng Zhang
18f91eb639 chore: bump v0.5.2rc1 (#9920) 2025-09-02 04:43:34 -07:00
Lifu Huang
1fbfdebe6b [chore] fix dead links in doc (#9913) 2025-09-02 00:28:26 -07:00
Yineng Zhang
16e56ea693 chore: bump v0.5.2rc0 (#9862) 2025-09-01 03:07:36 -07:00
hzh0425
8c2ffaaf0f fix(hicahce-long-bench): adjust context workload generator to use full query set (#9847)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-08-31 14:51:18 -07:00
Pawel Kowalski
20445327b2 fix inconsistent arguments for generated shared prefix bench (#9073)
Co-authored-by: Pawel Kowalski <pawel.kowalski@silo.ai>
2025-08-31 14:27:33 -07:00
Kaixi Hou
5c34b4f1c7 [NVIDIA] [2/N] Optimize silu_and_mul_scaled_fp4_grouped_quant perf (#9556) 2025-08-29 17:17:03 -07:00
pansicheng
09a1df2231 add bench_mix.py (#9788) 2025-08-28 23:44:26 -07:00
Xinyuan Tong
f84b57c80e Move git clone command up from README (#9740) 2025-08-28 00:27:00 -07:00
Liangsheng Yin
d0934a5192 gpt-oss blog reproduction document (#9728) 2025-08-28 10:15:08 +08:00
Yineng Zhang
bc80dc4ce0 chore: bump v0.5.1.post3 (#9716) 2025-08-27 15:42:42 -07:00
ehuaa
8f7b1c31e8 Add A100 fused MoE kernel configs for Dpsk (#9677) 2025-08-26 20:49:48 -07:00
hzh0425
c04c17edfa refactor(hicache): Introduce generic HiCacheStorageConfig for improved configuration management (#9555)
Co-authored-by: Teng Ma <805522925@qq.com>
2025-08-26 17:55:20 -07:00
Yineng Zhang
e3e97a120b chore: bump v0.5.1.post2 (#9592) 2025-08-25 03:45:09 -07:00
Yineng Zhang
f8b757bcac fix: resolve tuning fused moe issue (#9587) 2025-08-25 01:41:15 -07:00
Yineng Zhang
e0ab167db0 chore: bump v0.5.1.post1 (#9558) 2025-08-24 01:14:17 -07:00
Xiaotong Jiang
80425e59bb [doc] deepseekv31 support (#9544) 2025-08-23 16:54:58 -07:00
Lianmin Zheng
97a38ee85b Release 0.5.1 (#9533) 2025-08-23 07:09:26 -07:00
hzh0425
83871aa12d feat(hicache): Supports 3fs-hicache compatibility with dp-attention (#9372) 2025-08-23 02:08:32 -07:00
yuxingcyx
4edbe0d534 [benchmark] Add benchmark scripts for ceval and boolq (#8946)
Co-authored-by: chenyuxing <2818499974@qq.com>
Co-authored-by: hanqing <huang010706@126.com>
Co-authored-by: Muggle <62579327+trawolf@users.noreply.github.com>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
2025-08-23 15:40:15 +08:00
pansicheng
70cf4abccc 3fs zerocopy (#9109)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-08-22 17:56:38 +08:00
Even Zhou
de2dd73831 Revert "[feature] Rework Ascend NPU graph support" (#9385) 2025-08-20 00:35:10 -07:00
Even Zhou
3680d6f88b [feature] Rework Ascend NPU graph support (#9350)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: yezhifeng (D) <y00897525@china.huawei.com>
Co-authored-by: anon189Ty <Stari_Falcon@outlook.com>
Co-authored-by: Maksim <makcum888e@mail.ru>
Co-authored-by: ssshinigami <44640852+ssshinigami@users.noreply.github.com>
2025-08-19 20:32:27 -07:00
Chang Su
46fe8b8cb2 [CI] Fix lint issues (#9361) 2025-08-19 13:05:36 -07:00
mpashkovskiy
a3b810ebdb fix: enable multi-GPU Triton fused MoE tuning (#6295) 2025-08-19 10:16:58 -07:00
Even Zhou
f4fafacc5d Revert "[feature] Ascend NPU graph support (#8027)" (#9348) 2025-08-19 10:11:23 -07:00
Binyao Jiang
c2fbf60f39 [GLM4.1V and GLM4.5V] Add vision transformer num_dummy_head support: max tp=4 -> max tp=8 (#9059) 2025-08-18 14:40:13 -07:00
Yuan Luo
968e181826 Fix triton_fused_moe unit test and benchmark (#9276)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2025-08-18 00:54:33 -07:00
VDV1985
94371dbbd6 [feature] Ascend NPU graph support (#8027)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: yezhifeng (D) <y00897525@china.huawei.com>
Co-authored-by: anon189Ty <Stari_Falcon@outlook.com>
Co-authored-by: Maksim <makcum888e@mail.ru>
Co-authored-by: ssshinigami <44640852+ssshinigami@users.noreply.github.com>
2025-08-16 17:25:17 -07:00
Cheng Wan
295895120d [6/N] MoE Refactor: Cleanup MoE-related configs (#8849) 2025-08-14 21:14:53 -07:00
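A listing like the one above (author name, abbreviated SHA, commit subject, and date) can be reproduced with plain `git log`. The sketch below uses a throwaway repository with a single illustrative commit so it is self-contained; in practice you would run only the final `git log` command inside the actual repository clone.

```shell
# Minimal sketch: reproduce an "author / short-SHA subject date" listing
# with git log. The temp repo, author name, and commit message below are
# illustrative only, not part of the history shown above.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git -c user.name='Example Author' -c user.email='author@example.com' \
    commit -q --allow-empty -m 'Optimize concat_mla_k kernel (#10543)'
# %an = author name, %h = abbreviated SHA, %s = subject, %cd = commit date
git log --date=format:'%Y-%m-%d %H:%M:%S %z' \
    --pretty=format:'%an%n%h %s %cd'
```

The `--pretty=format:` placeholders and the `--date=format:` strftime string control the layout; add `%(trailers:key=Co-authored-by)` to the format string on newer git versions if co-author trailers should appear as they do in the listing above.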