65 Commits

Author SHA1 Message Date
ehuaa
8f7b1c31e8 Add A100 fused MoE kernel configs for Dpsk (#9677) 2025-08-26 20:49:48 -07:00
Yineng Zhang
f8b757bcac fix: resolve tuning fused moe issue (#9587) 2025-08-25 01:41:15 -07:00
Even Zhou
de2dd73831 Revert "[feature] Rework Ascend NPU graph support" (#9385) 2025-08-20 00:35:10 -07:00
Even Zhou
3680d6f88b [feature] Rework Ascend NPU graph support (#9350)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: yezhifeng (D) <y00897525@china.huawei.com>
Co-authored-by: anon189Ty <Stari_Falcon@outlook.com>
Co-authored-by: Maksim <makcum888e@mail.ru>
Co-authored-by: ssshinigami <44640852+ssshinigami@users.noreply.github.com>
2025-08-19 20:32:27 -07:00
Chang Su
46fe8b8cb2 [CI] Fix lint issues (#9361) 2025-08-19 13:05:36 -07:00
mpashkovskiy
a3b810ebdb fix: enable multi-GPU Triton fused MoE tuning (#6295) 2025-08-19 10:16:58 -07:00
Even Zhou
f4fafacc5d Revert "[feature] Ascend NPU graph support (#8027)" (#9348) 2025-08-19 10:11:23 -07:00
Yuan Luo
968e181826 Fix triton_fused_moe unit test and benchmark (#9276)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2025-08-18 00:54:33 -07:00
VDV1985
94371dbbd6 [feature] Ascend NPU graph support (#8027)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: yezhifeng (D) <y00897525@china.huawei.com>
Co-authored-by: anon189Ty <Stari_Falcon@outlook.com>
Co-authored-by: Maksim <makcum888e@mail.ru>
Co-authored-by: ssshinigami <44640852+ssshinigami@users.noreply.github.com>
2025-08-16 17:25:17 -07:00
Cheng Wan
295895120d [6/N] MoE Refactor: Cleanup MoE-related configs (#8849) 2025-08-14 21:14:53 -07:00
Yineng Zhang
1466c1b896 feat: support glm4 tuning (#8473) 2025-07-28 14:32:58 -07:00
Yuxuan Zhang
6d6a8bc278 GLM-4.5 Model Support (#8224)
Co-authored-by: Lifu Huang <lifu.hlf@gmail.com>
Co-authored-by: Binyao Jiang <byjiang1996@gmail.com>
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
2025-07-27 22:54:07 -07:00
Cheng Wan
abda2542d5 Fix tuning_fused_moe_triton.py (#8175) 2025-07-19 17:33:50 -07:00
Yuan Luo
253454de9b Integrate triton moe kernel (#7689)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2025-07-06 20:05:49 -07:00
Xiaoyu Zhang
0ae1e9a755 refine fused_moe benchmark (#7221) 2025-06-15 21:21:32 -07:00
Quanfeng Li
ef32677444 Fix positional argument (#7093) 2025-06-11 18:31:13 -07:00
Xiaoyu Zhang
3712abfaf9 Fuse routed scaling factor in deepseek (#6970) 2025-06-08 15:24:24 -07:00
Xiaoyu Zhang
fa3592cfeb rebase h20 fused_moe config (#6966) 2025-06-08 05:01:34 -07:00
Yineng Zhang
1fb76ebb93 Revert "Fuse routed scaling factor in topk_reduce kernel (#6220)" (#6968) 2025-06-07 21:02:49 -07:00
Xiaoyu Zhang
515ef4facb Fuse routed scaling factor in topk_reduce kernel (#6220) 2025-06-07 11:06:50 -07:00
Cheng Wan
81964328b7 Set num_fused_shared_experts as num_shared_experts when shared_experts fusion is not disabled (#6736) 2025-06-04 15:53:22 -07:00
Cheng Wan
8a5480528d [Refactor] Rename n_share_experts_fusion as num_fused_shared_experts (#6735) 2025-06-03 17:48:24 -07:00
Xiaoyu Zhang
076103535c fix log_info_on_rank0 error when run benchmark (#6260) 2025-05-28 00:20:01 -07:00
Yuan Luo
c087ddd686 Refine pre_reorder_triton_kernel slightly to improve performance (#6627)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2025-05-28 00:15:23 -07:00
fzyzcjy
ef8ec07b2c Support tuning moe for llama 4 model (#6042) 2025-05-12 15:47:01 -07:00
Lianmin Zheng
e8e18dcdcc Revert "fix some typos" (#6244) 2025-05-12 12:53:26 -07:00
applesaucethebun
d738ab52f8 fix some typos (#6209)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-13 01:42:38 +08:00
Lifu Huang
6e2da51561 Replace time.time() to time.perf_counter() for benchmarking. (#6178)
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
2025-05-11 14:32:49 -07:00
Xiaoyu Zhang
1cc326032d simplify fused_moe config logging (#5801) 2025-04-28 17:04:54 -07:00
Yi Zhang
a0251a3fd6 add fused moe config for qwen3moe fp8/bf16 (#5849) 2025-04-28 11:55:52 -07:00
Xiaoyu Zhang
e132cba2a8 fused moe triton tuning script support qwen3 (#5842) 2025-04-28 09:13:04 -07:00
XinyuanTong
0045f4b2af feat: Add fused moe triton config for qwen3 moe on h100 (#5833) 2025-04-28 08:37:13 -07:00
Zhaoyi Li
c555d794f7 Minor update for ROCm variable style (#5562) 2025-04-19 23:45:27 -07:00
lambert0312
61e7c4dd21 Add A800 shared experts fused MoE kernel tuning configs for DeepSeek V3/R1 (#5368) 2025-04-14 18:39:44 -07:00
Xiaoyu Zhang
3e4794aad8 refine fused_moe tuning docs (#5294) 2025-04-12 10:01:13 -07:00
Xiaoyu Zhang
924ca7c92c Add DeepSeek V3/R1 shared experts fusion (#4918) 2025-04-04 01:59:29 -07:00
penguin_wwy
38f25e87fc Correcting default configuration when benchmarking fused_moe (#4665) 2025-03-22 00:52:34 -07:00
Xiaoyu Zhang
7130a7cea9 refine sgl_moe_align_block_size_benchmark (#4327) 2025-03-11 22:48:38 -07:00
yych0745
6a02b32d07 Add A100 tuning configs for DeepSeek R1/V3 channel-wise INT8 (#4287)
Co-authored-by: HandH1998 <1335248067@qq.com>
2025-03-11 00:49:06 -07:00
Lianmin Zheng
66301e124f Improve code styles (#4021) 2025-03-03 03:20:23 -08:00
Lianmin Zheng
ac2387279e Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
2025-03-03 00:12:04 -08:00
Chayenne
18bb216c28 Revert "[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x)" (#3982) 2025-02-28 23:57:17 -08:00
yiakwy-xpu-ml-framework-team
1c96fa86cf [MOE] enable efficient moe_alignment multi-blocks execution (3x~6x) (#3613) 2025-02-27 19:42:48 -08:00
laixin
b0df5d240b Tuning Script for Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3922)
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
2025-02-27 10:59:46 +00:00
yigex
ddf39d3fce [ROCm] Optimal MOE Tuning for AMD Radeon Graphics (#3567) 2025-02-17 17:54:10 -08:00
Xiaoyu Zhang
2f47d710ae refine some typo (#3473) 2025-02-10 23:35:44 +08:00
Yineng Zhang
fad315cb8e fix EAGLE 2 non greedy case (#3407)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
2025-02-09 07:28:34 +08:00
GaoYuYang
849f58d617 Update fused_moe's benchmark (#3346) 2025-02-08 21:58:21 +08:00
yiakwy-xpu-ml-framework-team
64480df495 [BUG] fix moe benchmark when bs*seq is small (#3382) 2025-02-08 15:39:44 +08:00
Xiaoyu Zhang
cdae77b03d optimize moe_align_kernel cuda (#3347) 2025-02-07 00:53:46 +08:00