[smoke][bugfix] moe_init_routing_v2 active_expert_range use int type (#5521)

### What this PR does / why we need it?
The float kernel of `moe_init_routing_v2` used in the dispatch allgather
operation does not accept a tensor for `active_expert_range`; it only
accepts an int.
In PR #5311, to unify the variables `local_num_experts` and
`self.local_num_experts`, `self.local_num_experts` was used consistently.
Because that attribute is computed with `torch.sum(...)`, the parameter
that used to be a plain integer was silently passed on as a 0-dim tensor.
This PR converts the value back to a Python int.
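
A minimal sketch (assuming standard PyTorch semantics, not code from this repo) of how the tensor type leaked in: `torch.sum` over a boolean mask returns a 0-dim tensor, and `.item()` is what turns it back into a Python int.

```python
import torch

# Hypothetical expert map: -1 marks experts not owned by this rank.
expert_map = torch.tensor([0, 1, -1, -1, 2])

# Counting local experts with torch.sum yields a 0-dim tensor, not an int.
count = torch.sum(expert_map != -1)          # tensor(3)

# .item() extracts the Python scalar, which the int-only kernel expects.
count_int = torch.sum(expert_map != -1).item()   # 3, a plain int
```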

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
| Benchmark | Metric | ground_truth | measured | success |
|---|---|---|---|---|
| gsm8k | exact_match,strict-match | 0.89 | 0.8939 | |
| gsm8k | exact_match,flexible-extract | 0.85 | 0.856 | |
| ceval-valid | acc,none | 0.84 | 0.8373 | |

Model parameters:
{'pretrained': 'Qwen/Qwen3-30B-A3B', 'tensor_parallel_size': 2, 'dtype': 'auto', 'trust_remote_code': False, 'max_model_len': 4096, 'gpu_memory_utilization': 0.6, 'enable_expert_parallel': True}

- vLLM version: v0.13.0
- vLLM main: 45c1ca1ca1

Signed-off-by: shenchuxiaofugui <1311027364@qq.com>
Committed by LI SHENGYONG (via GitHub), 2025-12-31 09:19:04 +08:00
parent 2ee17e50a1, commit bdc721d35a
3 changed files with 4 additions and 3 deletions


@@ -180,7 +180,7 @@ class AscendFusedMoE(FusedMoE):
                 or ascend_config.expert_map_record_path) and (
                     self.log2phy is not None)
         self.local_num_experts = (torch.sum(
-            self._expert_map != -1) if self._expert_map is not None else
+            self._expert_map != -1).item() if self._expert_map is not None else
                                   self.global_num_experts)
         if self._expert_map is not None:
             logger.info_once(


@@ -335,7 +335,9 @@ class TokenDispatcherWithAllGather(MoETokenDispatcher):
         super().__init__(**kwargs)
         self.apply_router_weight_on_input = False
         self.max_num_tokens = kwargs.get("max_num_tokens")
-        self.num_experts_local = kwargs.get("num_local_experts", 0)
+        num_experts_local = kwargs.get("num_local_experts", 0)
+        self.num_experts_local = num_experts_local.item() if torch.is_tensor(
+            num_experts_local) else int(num_experts_local)
         self.original_shape = None
         self.with_quant = False
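
The dispatcher-side guard above can be sketched as a standalone helper; `to_int` is a hypothetical name for illustration, not part of the patch. It normalizes either a 0-dim tensor or a plain number to a Python int, so the downstream int-only kernel always sees the right type.

```python
import torch

def to_int(n):
    # Mirror of the pattern in the patch: unwrap a 0-dim tensor with
    # .item(), otherwise coerce the value to a plain Python int.
    return n.item() if torch.is_tensor(n) else int(n)

# Works whether the expert count arrives as a tensor or a number.
from_tensor = to_int(torch.sum(torch.tensor([0, 1, -1]) != -1))  # 2
from_number = to_int(3)                                          # 3
```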