[sgl-kernel] Support moe_sum_reduce cuda kernel (#10321)

Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Yuan Luo
2025-09-19 14:12:09 +08:00
committed by GitHub
parent ac2a723bb3
commit 616a3e20df
7 changed files with 346 additions and 10 deletions
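
The commit message does not describe what `moe_sum_reduce` computes. The following is a minimal pure-Python sketch of one plausible reading, not the kernel's confirmed behavior. It assumes the input has shape `[num_tokens, topk, hidden]`, that the kernel sums over the `topk` axis, and that it multiplies the result by `routed_scaling_factor`. The name `moe_sum_reduce_reference` and the nested-list layout are hypothetical and chosen only for illustration.

```python
def moe_sum_reduce_reference(input_tensor, routed_scaling_factor):
    """Hypothetical reference for the assumed semantics of moe_sum_reduce.

    input_tensor: nested lists shaped [num_tokens][topk][hidden] (assumed layout).
    Returns nested lists shaped [num_tokens][hidden].
    """
    output = []
    for token in input_tensor:
        hidden = len(token[0])
        row = [0.0] * hidden
        # Sum the per-expert outputs for this token (assumed reduction axis).
        for expert_out in token:
            for i, value in enumerate(expert_out):
                row[i] += value
        # Apply the scaling factor after the reduction (assumed ordering).
        output.append([value * routed_scaling_factor for value in row])
    return output


# One token, two selected experts, hidden size 2.
example = [[[1.0, 2.0], [3.0, 4.0]]]
result = moe_sum_reduce_reference(example, 0.5)
```

Under these assumptions, the Python wrapper's default of `routed_scaling_factor=0` would produce an all-zero output, so callers would likely need to pass the factor explicitly.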


@@ -36,6 +36,18 @@ def topk_softmax(
    )


def moe_sum_reduce(
    input_tensor,
    output_tensor,
    routed_scaling_factor=0,
):
    torch.ops.sgl_kernel.moe_sum_reduce.default(
        input_tensor,
        output_tensor,
        routed_scaling_factor,
    )


def moe_fused_gate(
    input_tensor,
    bias,