Commit Graph

14 Commits

Author SHA1 Message Date
yigex
351a72d40b add dsv3 mi300 triton config for block scale (#3146) 2025-01-27 17:25:53 +08:00
Lianmin Zheng
27acf63bbd Use torch.compile for scaling penalty (#3133) 2025-01-25 18:27:33 -08:00
yiakwy-xpu-ml-framework-team
10bfce71b3 fix moe align blocks benchmark (#3003) 2025-01-20 19:33:29 +08:00
Xiaoyu Zhang
d08c77c434 Sampling penalties memory interface (#2870) 2025-01-13 23:09:00 +08:00
Lianmin Zheng
bdc1acf6cd Misc fix for min_p_sampling, --cuda-graph-bs (#2761) 2025-01-07 02:52:53 -08:00
Xiaoyu Zhang
380930a959 add benchmark_moe_align_blocks (#2767) 2025-01-07 14:20:50 +08:00
HandH1998
afa0341e57 Update Triton configs for block fp8 kernels (#2641) 2024-12-29 22:53:47 +08:00
Yineng Zhang
7863e4368a add configs for block fp8 related kernels (#2628) 2024-12-28 23:12:04 +08:00
Co-authored-by: HandH1998 <1335248067@qq.com>
Xiaoyu Zhang
9a23c48456 h100 tuning fused_moe_triton for qwen2 moe (#2560) 2024-12-26 03:13:31 -08:00
Ke Bao
e835a50021 Reorg moe code (#2563) 2024-12-24 01:10:22 +08:00
Xiaoyu Zhang
3844feb9bb Add a unittest for fused_moe (#2416) 2024-12-08 22:46:10 -08:00
Lianmin Zheng
07ec07ad1f Improve torch compile for fused moe (#2327) 2024-12-03 01:58:25 -08:00
Lianmin Zheng
33deca81b5 Add more fused moe benchmark utilities (#2314) 2024-12-02 04:26:55 -08:00
Xiaoyu Zhang
262e370f78 [benchmark] Add fused_moe_triton benchmark and tuning tools (#2225) 2024-11-29 13:36:45 -08:00
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: HAI <hixiao@gmail.com>