Support Pangu Pro MoE model (#1204)

### What this PR does / why we need it?
Adds support for the Pangu Pro MoE model (https://arxiv.org/abs/2505.21411).

### Does this PR introduce _any_ user-facing change?
Yes. Pangu Pro MoE is added as a newly supported model.

### How was this patch tested?
Tested locally.

---------

Signed-off-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
Authored by Angazenn on 2025-06-20 23:59:59 +08:00, committed by GitHub
Parent: 00ae250f3c
Commit: 2f1266d451
4 changed files with 647 additions and 1 deletion


```diff
@@ -759,6 +759,7 @@ def select_experts(
 custom_routing_function: Optional[Callable] = None,
 scoring_func: str = "softmax",
 e_score_correction_bias: Optional[torch.Tensor] = None,
+global_num_experts: Optional[torch.Tensor] = None
 ) -> tuple[torch.Tensor, torch.Tensor]:
 """
 Select top-k experts based on router logits.
@@ -826,7 +827,8 @@ def select_experts(
 hidden_states=hidden_states,
 gating_output=router_logits,
 topk=top_k,
-renormalize=renormalize)
+renormalize=renormalize,
+global_num_experts=global_num_experts)
 # Required by npu_moe_init_routing
 topk_ids = topk_ids.to(torch.int32)
 return topk_weights, topk_ids
```
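This diff shows one of the four changed files: `select_experts` gains a `global_num_experts` parameter and forwards it to `custom_routing_function`. One plausible reason, which is an assumption and is not stated in this diff, is that a grouped router needs the total expert count to split the experts into equal groups. The Pangu Pro MoE paper describes such a scheme, Mixture of Grouped Experts (MoGE), in which each token activates an equal number of experts from every group. The sketch below is a hypothetical pure-Python router in that spirit, not the PR's implementation. The names `grouped_topk_routing` and `select_experts_sketch` are illustrative, and the one-expert-per-group rule is an assumption. The sketch types `global_num_experts` as `int` because it holds a count, and it uses plain lists instead of tensors so it runs without torch.

```python
import math
from typing import Callable, List, Optional, Tuple

Matrix = List[List[float]]
Routing = Tuple[Matrix, List[List[int]]]


def _softmax(row: List[float]) -> List[float]:
    # Numerically stable softmax over one token's router logits.
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def grouped_topk_routing(
    hidden_states: Matrix,
    gating_output: Matrix,
    topk: int,
    renormalize: bool,
    global_num_experts: int,
) -> Routing:
    """Hypothetical MoGE-style router (illustration only): the
    `global_num_experts` experts are split into `topk` equal groups and each
    token takes its highest-probability expert from every group."""
    # hidden_states is unused here; it is accepted only to match the
    # keyword call shown in the diff.
    if global_num_experts % topk != 0:
        raise ValueError("global_num_experts must be divisible by topk")
    group_size = global_num_experts // topk
    all_weights: Matrix = []
    all_ids: List[List[int]] = []
    for logits in gating_output:
        probs = _softmax(logits)
        token_ids = []
        for g in range(topk):
            start = g * group_size
            group = probs[start:start + group_size]
            # Best expert inside this group, shifted to a global expert id.
            token_ids.append(start + group.index(max(group)))
        token_weights = [probs[i] for i in token_ids]
        if renormalize:
            # Rescale so each token's selected weights sum to 1.
            total = sum(token_weights)
            token_weights = [w / total for w in token_weights]
        all_weights.append(token_weights)
        all_ids.append(token_ids)
    return all_weights, all_ids


def select_experts_sketch(
    hidden_states: Matrix,
    router_logits: Matrix,
    top_k: int,
    renormalize: bool,
    custom_routing_function: Callable[..., Routing],
    global_num_experts: Optional[int] = None,
) -> Routing:
    # Assumption: fall back to the width of the router logits when no
    # global expert count is supplied.
    if global_num_experts is None:
        global_num_experts = len(router_logits[0])
    # Same keyword call shape as the custom-routing branch in the diff.
    return custom_routing_function(
        hidden_states=hidden_states,
        gating_output=router_logits,
        topk=top_k,
        renormalize=renormalize,
        global_num_experts=global_num_experts)


# Two tokens, four experts, top-2 routing -> groups {0, 1} and {2, 3}.
logits = [[2.0, 0.0, 0.0, 1.0], [0.0, 1.0, 3.0, 0.0]]
weights, ids = select_experts_sketch(
    hidden_states=[[0.0], [0.0]],
    router_logits=logits,
    top_k=2,
    renormalize=True,
    custom_routing_function=grouped_topk_routing,
    global_num_experts=4)
print(ids)  # [[0, 3], [1, 2]]
```

With `topk=2` and four experts, each token draws exactly one expert from each half, so no group can be starved or overloaded by a single token. In the real code path, the comment in the diff notes that the selected ids are then cast to `torch.int32` because `npu_moe_init_routing` requires that type. Plain Python ints need no such cast.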