Commit Graph

266 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Mick | 4fa44d63c6 | chore: improve mmmu benchmark (#7000) (Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>; Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>) | 2025-07-26 16:19:45 +08:00 |
| Yineng Zhang | 2272c2a5b5 | chore: bump v0.4.9.post4 (#8305) | 2025-07-25 17:12:47 -07:00 |
| Zhiqiang Xie | ce86e201df | bug fix and tag (#8282) | 2025-07-23 16:50:31 +08:00 |
| Yineng Zhang | 01c000043c | chore: bump v0.4.9.post3 (#8265) | 2025-07-22 15:55:48 -07:00 |
| zhongwei | ff45ab7a5f | [Benchmark] add disable-auto-run param for hicache/bench_multiturn (#7822) (Co-authored-by: zhongwei.ren <zhongwei.ren@bytedance.com>; Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>) | 2025-07-22 14:02:40 -07:00 |
| Cheng Wan | abda2542d5 | Fix tuning_fused_moe_triton.py (#8175) | 2025-07-19 17:33:50 -07:00 |
| Hongbo Xu | 1f76fc8747 | [3/n] chore: decouple AWQ implementation from vLLM dependency (#8113) (Co-authored-by: AniZpZ <zhuangsen.zp@antgroup.com>) | 2025-07-18 11:45:22 -07:00 |
| Yineng Zhang | eb118d88c4 | chore: bump v0.4.9.post2 (#7963) | 2025-07-11 21:11:20 -07:00 |
| Yineng Zhang | 066f4ec91f | chore: bump v0.4.9.post1 (#7882) | 2025-07-09 00:28:17 -07:00 |
| Yuan Luo | 253454de9b | Integrate triton moe kernel (#7689) (Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>) | 2025-07-06 20:05:49 -07:00 |
| Yineng Zhang | ec5f9c6269 | chore: bump v0.4.9 (#7802) | 2025-07-05 17:40:29 -07:00 |
| Yineng Zhang | 69183f8808 | chore: bump v0.4.8.post1 (#7559) | 2025-06-26 02:21:12 -07:00 |
| Xiaoyu Zhang | 8ecad0b16f | [benchmark] fbgemm benchmark support bandwidth report and support fbgemm_cutlass_gmm (#7422) | 2025-06-24 09:44:55 -07:00 |
| Yineng Zhang | 7c3a12c000 | chore: bump v0.4.8 (#7493) | 2025-06-23 23:14:22 -07:00 |
| Chang Su | 72676cd6c0 | feat(oai refactor): Replace openai_api with entrypoints/openai (#7351) (Co-authored-by: Jin Pan <jpan236@wisc.edu>) | 2025-06-21 13:21:06 -07:00 |
| Binyao Jiang | b783c1cb82 | Fix hicache benchmark script bug - some sampled input_request is [] (#7300) | 2025-06-17 23:47:11 -07:00 |
| Zhiqiang Xie | e56685ac1b | Upstreaming hicache bug fixes (#7267) | 2025-06-17 17:44:57 -07:00 |
| Yineng Zhang | f9dc9dd28b | chore: bump v0.4.7.post1 (#7248) | 2025-06-16 15:20:29 -07:00 |
| Xiaoyu Zhang | 0ae1e9a755 | refine fused_moe benchmark (#7221) | 2025-06-15 21:21:32 -07:00 |
| Lifu Huang | e07d064729 | Support LoRA in MMMU benchmark script. (#7218) | 2025-06-15 21:17:57 -07:00 |
| Quanfeng Li | ef32677444 | Fix positional argument (#7093) | 2025-06-11 18:31:13 -07:00 |
| Yineng Zhang | 4f723edd3b | chore: bump v0.4.7 (#7038) | 2025-06-10 01:56:20 -07:00 |
| Xiaoyu Zhang | 3712abfaf9 | Fuse routed scaling factor in deepseek (#6970) | 2025-06-08 15:24:24 -07:00 |
| Xiaoyu Zhang | fa3592cfeb | rebase h20 fused_moe config (#6966) | 2025-06-08 05:01:34 -07:00 |
| Yineng Zhang | 1fb76ebb93 | Revert "Fuse routed scaling factor in topk_reduce kernel (#6220)" (#6968) | 2025-06-07 21:02:49 -07:00 |
| Xiaoyu Zhang | 515ef4facb | Fuse routed scaling factor in topk_reduce kernel (#6220) | 2025-06-07 11:06:50 -07:00 |
| Xiaoyu Zhang | bae4fdc7ab | add fbgemm moe grouped gemm kernel benchmark (#6924) | 2025-06-07 02:57:30 -07:00 |
| zyksir | 8e3797be1c | support 1 shot allreduce in 1-node and 2-node using mscclpp (#6277) | 2025-06-04 22:11:24 -07:00 |
| Cheng Wan | 81964328b7 | Set num_fused_shared_experts as num_shared_experts when shared_experts fusion is not disabled (#6736) | 2025-06-04 15:53:22 -07:00 |
| Cheng Wan | 8a5480528d | [Refactor] Rename n_share_experts_fusion as num_fused_shared_experts (#6735) | 2025-06-03 17:48:24 -07:00 |
| JieXin Liang | d9d35def3d | [test] add ut and bm for get_last_loc (#6746) | 2025-05-29 11:47:21 -07:00 |
| fzyzcjy | 6df81e8a39 | Support tuning DeepEP configs (#6742) | 2025-05-29 08:12:22 -07:00 |
| ChangyiYang | 485a023bd8 | refactor apply_w8a8_block_fp8_linear in fp (#6545) | 2025-05-29 00:15:11 -07:00 |
| Wenxuan Tan | 844a8f42c7 | Fix LoRA bench (#6719) | 2025-05-28 16:38:55 -07:00 |
| Xiaoyu Zhang | 076103535c | fix log_info_on_rank0 error when run benchmark (#6260) | 2025-05-28 00:20:01 -07:00 |
| Yuan Luo | c087ddd686 | Refine pre_reorder_triton_kernel slightly to improve performance (#6627) (Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>) | 2025-05-28 00:15:23 -07:00 |
| Yineng Zhang | 7e257cd666 | chore: bump v0.4.6.post5 (#6566) | 2025-05-24 00:48:05 -07:00 |
| Qiaolin Yu | cd8d4b9dfc | Fix lora bench (#6302) | 2025-05-15 10:09:55 -07:00 |
| Yineng Zhang | 16267d4fa7 | chore: bump v0.4.6.post4 (#6245) | 2025-05-13 01:57:51 -07:00 |
| fzyzcjy | ef8ec07b2c | Support tuning moe for llama 4 model (#6042) | 2025-05-12 15:47:01 -07:00 |
| Lianmin Zheng | e8e18dcdcc | Revert "fix some typos" (#6244) | 2025-05-12 12:53:26 -07:00 |
| applesaucethebun | d738ab52f8 | fix some typos (#6209) (Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>) | 2025-05-13 01:42:38 +08:00 |
| Lifu Huang | 6e2da51561 | Replace time.time() to time.perf_counter() for benchmarking. (#6178) (Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>) | 2025-05-11 14:32:49 -07:00 |
| applesaucethebun | 2ce8793519 | Add typo checker in pre-commit (#6179) (Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>) | 2025-05-11 12:55:00 +08:00 |
| XinyuanTong | 9d8ec2e67e | Fix and Clean up chat-template requirement for VLM (#6114) (Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>) | 2025-05-11 00:14:09 +08:00 |
| Yineng Zhang | 678d8cc987 | chore: bump v0.4.6.post3 (#6165) | 2025-05-09 15:38:47 -07:00 |
| XinyuanTong | 6ea1e6ac6e | Support MMMU benchmark for InternVL (#5968) | 2025-05-02 00:17:21 -07:00 |
| XinyuanTong | c5645e928f | feat: add concurrency evaluation logic in mmmu benchmark (#5782) | 2025-05-01 18:20:08 -07:00 |
| Yineng Zhang | 9858113c33 | chore: bump v0.4.6.post2 (#5939) | 2025-04-30 22:04:40 -07:00 |
| Yi Zhang | d50e36a79d | support vlm benchmark profile (#5905) | 2025-04-29 23:48:27 -07:00 |