Support Pangu Pro MoE model (#1204)

### What this PR does / why we need it?
Adds support for the Pangu Pro MoE model (https://arxiv.org/abs/2505.21411).

### Does this PR introduce _any_ user-facing change?
Yes. The Pangu Pro MoE model is now supported.

### How was this patch tested?
Tested locally.

---------

Signed-off-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
Angazenn
2025-06-20 23:59:59 +08:00
committed by GitHub
parent 00ae250f3c
commit 2f1266d451
4 changed files with 647 additions and 1 deletions

@@ -43,6 +43,7 @@ def forward_oot(
    activation: str = "silu",
) -> torch.Tensor:
    topk_weights, topk_ids = select_experts(
        global_num_experts=global_num_experts,
        hidden_states=x,
        router_logits=router_logits,
        top_k=top_k,