[1/2][resubmit] sgl-kernel: Fuse routed scaling factor into moe_fused_gate (select_experts) (#8770)

Trevor Morris
2025-08-08 17:55:06 -07:00
committed by GitHub
parent f352b793be
commit 591c232f7c
6 changed files with 62 additions and 12 deletions

@@ -243,7 +243,8 @@ std::vector<at::Tensor> moe_fused_gate(
     int64_t topk_group,
     int64_t topk,
     int64_t num_fused_shared_experts,
-    double routed_scaling_factor);
+    double routed_scaling_factor,
+    bool apply_routed_scaling_factor_on_output);
 void fp8_blockwise_scaled_grouped_mm(
     torch::Tensor& output,