rebase h20 fused_moe config (#6966)
@@ -42,7 +42,7 @@ python benchmark/kernels/fused_moe_triton/tuning_fused_moe_triton.py \
 --tune
 ```
 
-After tuning, a configuration file (e.g., `E=64,N=640,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json`) will be generated in the current directory. You can move this file to `sglang/srt/layers/fused_moe_triton/configs/` to use it in `sglang`.
+After tuning, a configuration file (e.g., `E=64,N=640,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json`) will be generated in the current directory. You can move this file to `sglang/srt/layers/fused_moe_triton/configs/triton_version` dir to use it in `sglang`.
 
 ### Performance Comparison Tool
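
The updated README line says the tuned JSON should be moved into a per-Triton-version subdirectory of `configs/`. A minimal sketch of that move follows. The helper name `install_tuned_config` and its parameters are hypothetical and not part of `sglang`. The subdirectory name is passed in by the caller, because the diff only gives the placeholder `triton_version`.

```python
import shutil
from pathlib import Path


def install_tuned_config(config_file, configs_root, triton_dir):
    """Move a tuned config JSON into <configs_root>/<triton_dir>/.

    Hypothetical helper for illustration only; not an sglang API.
    `triton_dir` is whatever version subdirectory the checkout uses
    (the README diff only names the placeholder `triton_version`).
    Returns the new path of the config file.
    """
    src = Path(config_file)
    if src.suffix != ".json" or not src.is_file():
        raise FileNotFoundError(f"not a tuned config file: {src}")

    # Create the version subdirectory if it does not exist yet.
    dest_dir = Path(configs_root) / triton_dir
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Keep the generated file name unchanged; it encodes E, N, device and dtype.
    dest = dest_dir / src.name
    shutil.move(str(src), str(dest))
    return dest
```

For example, assuming the tuning script was run from the repository root, `configs_root` would be the `sglang/srt/layers/fused_moe_triton/configs` directory named in the README, and `config_file` the JSON written to the current directory.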