Add A800 shared experts fused MoE kernel tuning configs for DeepSeek V3/R1 (#5368)

This commit is contained in:
lambert0312
2025-04-15 09:39:44 +08:00
committed by GitHub
parent dae7944440
commit 61e7c4dd21
5 changed files with 592 additions and 0 deletions


@@ -27,6 +27,14 @@ python benchmark/kernels/fused_moe_triton/tuning_fused_moe_triton.py \
--n-share-experts-fusion 8 \
--dtype fp8_w8a8 \
--tune
# Tune DeepSeek-R1 with channel-wise INT8, TP=16 and n_share_experts_fusion=16
python benchmark/kernels/fused_moe_triton/tuning_fused_moe_triton.py \
--model meituan/DeepSeek-R1-Channel-INT8 \
--tp-size 16 \
--n-share-experts-fusion 16 \
--dtype int8_w8a8 \
--tune
```
After tuning completes, a configuration file (e.g., `E=64,N=640,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json`) will be generated in the current directory. Move this file to `sglang/srt/layers/fused_moe_triton/configs/` so that `sglang` can pick it up at runtime.
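As a sketch of the naming convention, the generated filename encodes the expert count (`E`), intermediate size (`N`), GPU device name, and quantization dtype. The helper below is hypothetical (not part of `sglang`); it only reconstructs the filename pattern inferred from the example above:

```python
# Hypothetical helper illustrating the tuned-config filename pattern;
# the scheme is inferred from the example filename in this README.
def moe_config_filename(num_experts: int, intermediate_size: int,
                        device_name: str, dtype: str) -> str:
    # Device names appear with underscores in place of spaces.
    device = device_name.replace(" ", "_")
    return (f"E={num_experts},N={intermediate_size},"
            f"device_name={device},dtype={dtype}.json")

print(moe_config_filename(64, 640, "NVIDIA GeForce RTX 4090", "fp8_w8a8"))
```

A config tuned on one GPU model will not be matched on a different device, since the device name is part of the lookup key.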