19 Commits

Author SHA1 Message Date
fzyzcjy
ef8ec07b2c Support tuning moe for llama 4 model (#6042) 2025-05-12 15:47:01 -07:00
Lifu Huang
6e2da51561 Replace time.time() to time.perf_counter() for benchmarking. (#6178)
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
2025-05-11 14:32:49 -07:00
Yi Zhang
a0251a3fd6 add fused moe config for qwen3moe fp8/bf16 (#5849) 2025-04-28 11:55:52 -07:00
XinyuanTong
0045f4b2af feat: Add fused moe triton config for qwen3 moe on h100 (#5833) 2025-04-28 08:37:13 -07:00
Zhaoyi Li
c555d794f7 Minor update for ROCm variable style (#5562) 2025-04-19 23:45:27 -07:00
Xiaoyu Zhang
924ca7c92c Add DeepSeek V3/R1 shared experts fusion (#4918) 2025-04-04 01:59:29 -07:00
penguin_wwy
38f25e87fc Correcting default configuration when benchmarking fused_moe (#4665) 2025-03-22 00:52:34 -07:00
yych0745
6a02b32d07 Add A100 tuning configs for DeepSeek R1/V3 channel-wise INT8 (#4287)
Co-authored-by: HandH1998 <1335248067@qq.com>
2025-03-11 00:49:06 -07:00
Lianmin Zheng
ac2387279e Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
2025-03-03 00:12:04 -08:00
laixin
b0df5d240b Tuning Script for Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3922)
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
2025-02-27 10:59:46 +00:00
yigex
ddf39d3fce [ROCm] Optimal MOE Tuning for AMD Radeon Graphics (#3567) 2025-02-17 17:54:10 -08:00
yigex
351a72d40b add dsv3 mi300 triton config for block scale (#3146) 2025-01-27 17:25:53 +08:00
Lianmin Zheng
bdc1acf6cd Misc fix for min_p_sampling, --cuda-graph-bs (#2761) 2025-01-07 02:52:53 -08:00
HandH1998
afa0341e57 Update Triton configs for block fp8 kernels (#2641) 2024-12-29 22:53:47 +08:00
Yineng Zhang
7863e4368a add configs for block fp8 related kernels (#2628)
Co-authored-by: HandH1998 <1335248067@qq.com>
2024-12-28 23:12:04 +08:00
Xiaoyu Zhang
9a23c48456 h100 tuning fused_moe_triton for qwen2 moe (#2560) 2024-12-26 03:13:31 -08:00
Ke Bao
e835a50021 Reorg moe code (#2563) 2024-12-24 01:10:22 +08:00
Lianmin Zheng
33deca81b5 Add more fused moe benchmark utilities (#2314) 2024-12-02 04:26:55 -08:00
Xiaoyu Zhang
262e370f78 [benchmark] Add fused_moe_triton benchmark and tuning tools (#2225)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: HAI <hixiao@gmail.com>
2024-11-29 13:36:45 -08:00