[1/N][Draft][Refactor]torchair pangu_moe modeling refactor (#2437)

### What this PR does / why we need it?
1. Similar to #2384, this PR adds torchair-specific modeling for Pangu.
2. Fixes a bug introduced by `routed_scaling_factor` in #2675.
3. Removes the eager test case for Pangu, since a torchair test case for it already exists.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- vLLM version: v0.10.1.1
- vLLM main: 6997a25ac6

---------

Signed-off-by: zengyanjia <z00883269@china.huawei.com>
Signed-off-by: Angazenn <supperccell@163.com>
Co-authored-by: zengyanjia <z00883269@china.huawei.com>
2025-09-04 10:39:21 +08:00
parent a58013440a
commit e7409e95ee
6 changed files with 1185 additions and 55 deletions
--- a/vllm_ascend/torchair/utils.py
+++ b/vllm_ascend/torchair/utils.py
@@ -173,6 +173,11 @@ def register_torchair_model():
        "Qwen3MoeForCausalLM",
        "vllm_ascend.torchair.models.qwen3_moe:CustomQwen3MoeForCausalLM")

+    ModelRegistry.register_model(
+        "PanguProMoEForCausalLM",
+        "vllm_ascend.torchair.models.torchair_pangu_moe:PanguProMoEForCausalLM"
+    )
+

 def torchair_quant_method_register():
    from vllm_ascend.quantization.quantizer import \