Add A800 shared experts fused MoE kernel tuning configs for DeepSeek V3/R1 (#5368)

This commit is contained in:
lambert0312
2025-04-15 09:39:44 +08:00
committed by GitHub
parent dae7944440
commit 61e7c4dd21
5 changed files with 592 additions and 0 deletions


@@ -27,6 +27,14 @@ python benchmark/kernels/fused_moe_triton/tuning_fused_moe_triton.py \
--n-share-experts-fusion 8 \
--dtype fp8_w8a8 \
--tune
# Tune DeepSeek-R1 with channel-wise INT8, TP=16 and n_share_experts_fusion=16
python benchmark/kernels/fused_moe_triton/tuning_fused_moe_triton.py \
--model meituan/DeepSeek-R1-Channel-INT8 \
--tp-size 16 \
--n-share-experts-fusion 16 \
--dtype int8_w8a8 \
--tune
```
After tuning completes, a configuration file (e.g., `E=64,N=640,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json`) will be generated in the current directory. Move this file to `sglang/srt/layers/fused_moe_triton/configs/` so that `sglang` can pick it up at runtime.
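As a sketch of the naming convention, the generated filename encodes the expert count (`E`), intermediate size (`N`), GPU device name, and quantization dtype. The helper below is hypothetical (not part of `sglang`); it only reconstructs the filename pattern inferred from the example above:

```python
# Hypothetical helper illustrating the tuned-config filename pattern;
# the scheme is inferred from the example filename in this README.
def moe_config_filename(num_experts: int, intermediate_size: int,
                        device_name: str, dtype: str) -> str:
    # Device names appear with underscores in place of spaces.
    device = device_name.replace(" ", "_")
    return (f"E={num_experts},N={intermediate_size},"
            f"device_name={device},dtype={dtype}.json")

print(moe_config_filename(64, 640, "NVIDIA GeForce RTX 4090", "fp8_w8a8"))
```

A config tuned on one GPU model will not be matched on a different device, since the device name is part of the lookup key.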