[NVIDIA] [2/N] Optimize silu_and_mul_scaled_fp4_grouped_quant perf (#9556)

Kaixi Hou
2025-08-29 17:17:03 -07:00
committed by GitHub
parent ff9b561817
commit 5c34b4f1c7
7 changed files with 297 additions and 61 deletions

@@ -394,9 +394,8 @@ void silu_and_mul_scaled_fp4_experts_quant(
     torch::Tensor& output_scale,
     torch::Tensor const& input,
     torch::Tensor const& input_global_scale,
-    torch::Tensor const& input_offset_by_experts,
-    torch::Tensor const& output_scale_offset_by_experts,
-    torch::Tensor const& mask);
+    torch::Tensor const& mask,
+    bool use_silu_and_mul);
 /*
  * From csrc/moe/cutlass_moe/w4a8
  */