Commit Graph

236 Commits

| Author | SHA1 | Message | Date |
|--------|------|---------|------|
| JieXin Liang | d9d35def3d | [test] add ut and bm for get_last_loc (#6746) | 2025-05-29 11:47:21 -07:00 |
| fzyzcjy | 6df81e8a39 | Support tuning DeepEP configs (#6742) | 2025-05-29 08:12:22 -07:00 |
| ChangyiYang | 485a023bd8 | refactor apply_w8a8_block_fp8_linear in fp (#6545) | 2025-05-29 00:15:11 -07:00 |
| Wenxuan Tan | 844a8f42c7 | Fix LoRA bench (#6719) | 2025-05-28 16:38:55 -07:00 |
| Xiaoyu Zhang | 076103535c | fix log_info_on_rank0 error when run benchmark (#6260) | 2025-05-28 00:20:01 -07:00 |
| Yuan Luo | c087ddd686 | Refine pre_reorder_triton_kernel slightly to improve performance (#6627) (Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>) | 2025-05-28 00:15:23 -07:00 |
| Yineng Zhang | 7e257cd666 | chore: bump v0.4.6.post5 (#6566) | 2025-05-24 00:48:05 -07:00 |
| Qiaolin Yu | cd8d4b9dfc | Fix lora bench (#6302) | 2025-05-15 10:09:55 -07:00 |
| Yineng Zhang | 16267d4fa7 | chore: bump v0.4.6.post4 (#6245) | 2025-05-13 01:57:51 -07:00 |
| fzyzcjy | ef8ec07b2c | Support tuning moe for llama 4 model (#6042) | 2025-05-12 15:47:01 -07:00 |
| Lianmin Zheng | e8e18dcdcc | Revert "fix some typos" (#6244) | 2025-05-12 12:53:26 -07:00 |
| applesaucethebun | d738ab52f8 | fix some typos (#6209) (Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>) | 2025-05-13 01:42:38 +08:00 |
| Lifu Huang | 6e2da51561 | Replace time.time() to time.perf_counter() for benchmarking. (#6178) (Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>) | 2025-05-11 14:32:49 -07:00 |
| applesaucethebun | 2ce8793519 | Add typo checker in pre-commit (#6179) (Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>) | 2025-05-11 12:55:00 +08:00 |
| XinyuanTong | 9d8ec2e67e | Fix and Clean up chat-template requirement for VLM (#6114) (Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>) | 2025-05-11 00:14:09 +08:00 |
| Yineng Zhang | 678d8cc987 | chore: bump v0.4.6.post3 (#6165) | 2025-05-09 15:38:47 -07:00 |
| XinyuanTong | 6ea1e6ac6e | Support MMMU benchmark for InternVL (#5968) | 2025-05-02 00:17:21 -07:00 |
| XinyuanTong | c5645e928f | feat: add concurrency evaluation logic in mmmu benchmark (#5782) | 2025-05-01 18:20:08 -07:00 |
| Yineng Zhang | 9858113c33 | chore: bump v0.4.6.post2 (#5939) | 2025-04-30 22:04:40 -07:00 |
| Yi Zhang | d50e36a79d | support vlm benchmark profile (#5905) | 2025-04-29 23:48:27 -07:00 |
| Qiaolin Yu | 8c0cfca87d | Feat: support cuda graph for LoRA (#4115) (Co-authored-by: Beichen Ma <mabeichen12@gmail.com>) | 2025-04-28 23:30:44 -07:00 |
| Xiaoyu Zhang | 1cc326032d | simplify fused_moe config logging (#5801) | 2025-04-28 17:04:54 -07:00 |
| Yineng Zhang | dcae1fb2cd | chore: bump v0.4.6.post1 (#5845) | 2025-04-28 12:57:08 -07:00 |
| Yi Zhang | a0251a3fd6 | add fused moe config for qwen3moe fp8/bf16 (#5849) | 2025-04-28 11:55:52 -07:00 |
| Xiaoyu Zhang | e132cba2a8 | fused moe triton tuning script support qwen3 (#5842) | 2025-04-28 09:13:04 -07:00 |
| XinyuanTong | 0045f4b2af | feat: Add fused moe triton config for qwen3 moe on h100 (#5833) | 2025-04-28 08:37:13 -07:00 |
| Baizhou Zhang | 84022c0e56 | Release v0.4.6 (#5795) | 2025-04-27 14:07:05 -07:00 |
| Ravi Theja | 7d9679b74d | Add MMMU benchmark results (#4491) (Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local>) | 2025-04-25 15:23:53 +08:00 |
| Mick | c998d04b46 | vlm: enable radix cache for qwen-vl models (#5349) (Co-authored-by: Xinyuan Tong <justinning0323@outlook.com>) | 2025-04-23 20:35:05 -07:00 |
| Yineng Zhang | b9c87e781d | chore: bump v0.4.5.post3 (#5611) | 2025-04-21 18:16:20 -07:00 |
| Sundara Raman Ramachandran | f08154193c | Perform Batch Tokenization. (#5141) | 2025-04-20 18:10:37 -07:00 |
| lukec | 417b44eba8 | [Feat] upgrade pytorch2.6 (#5417) | 2025-04-20 16:06:34 -07:00 |
| Zhaoyi Li | c555d794f7 | Minor update for ROCm variable style (#5562) | 2025-04-19 23:45:27 -07:00 |
| lambert0312 | 61e7c4dd21 | Add A800 shared experts fused MoE kernel tuning configs for DeepSeek V3/R1 (#5368) | 2025-04-14 18:39:44 -07:00 |
| Xiaoyu Zhang | 3e4794aad8 | refine fused_moe tuning docs (#5294) | 2025-04-12 10:01:13 -07:00 |
| Mick | 34ef6c8135 | [VLM] Adopt fast image processor by default (#5065) | 2025-04-11 21:46:58 -07:00 |
| Chunan Zeng | a7c3f74bec | [FA3 Feature] Support multi modal Llama-3.2-11B-Vision-Instruct (#5103) | 2025-04-07 22:58:08 -07:00 |
| Xiaoyu Zhang | 924ca7c92c | Add DeepSeek V3/R1 shared experts fusion (#4918) | 2025-04-04 01:59:29 -07:00 |
| AniZpZ | d95269f9b3 | [2/3] fix dsv3 awq issue (#4625) (Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>; Co-authored-by: laixinn <xielx@shanghaitech.edu.cn>) | 2025-04-03 17:36:39 -07:00 |
| Ravi Theja | 69df9761dd | Add LlavaLlamaForCausaLM in MultiModal Processors (#5039) (Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local>) | 2025-04-03 15:41:12 -07:00 |
| Mick | 5cb552b1d4 | refactor: multimodal data (#4754) | 2025-03-31 09:57:51 -07:00 |
| Brayden Zhong | b149b39353 | [CI] Remove unused imports with Ruff to pre-commit config, only to benchmarks/docs/examples folder (#3969) | 2025-03-27 19:45:02 -07:00 |
| Daniel Holanda | 98a2cfa9b2 | Basic Cleanup (#4833) | 2025-03-27 16:55:48 -07:00 |
| Ravi Theja | e6e4d02245 | Update MMMU Benchmark instructions (#4694) | 2025-03-27 14:44:16 -07:00 |
| Chunan Zeng | 14269198e3 | [Benchmark] tilelang vs deepgemm vs w8a8_block_fp8_matmul (#4735) | 2025-03-24 20:56:31 -07:00 |
| Mick | 1e86457c90 | model: Minicpmo (#3023) | 2025-03-24 20:08:40 -07:00 |
| Tongbao Zhang | 3980ff1be6 | rename benchmark_deepgemm_fp8_group_gemm.py (#4605) | 2025-03-23 23:35:20 -07:00 |
| Mick | 11577cedb7 | refactor: bug fixes and refactor for vlm (#4661) | 2025-03-22 22:48:49 -07:00 |
| Ke Bao | 8f163b1653 | Add EAGLE mtbench benchmark script (#4676) (Co-authored-by: chromecast56 <jamesll@mit.edu>) | 2025-03-22 13:34:01 -07:00 |
| penguin_wwy | 38f25e87fc | Correcting default configuration when benchmarking fused_moe (#4665) | 2025-03-22 00:52:34 -07:00 |