[MOE][Bugfix] Cancel H2D for expert_map (#7000)
### What this PR does / why we need it?
If `expert_map` is kept on the device, long-output scenarios can occasionally produce repeated answers. Cancelling the H2D copy keeps `expert_map` on the host and avoids this, as sketched below.
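As a minimal sketch (mine, not the PR's code) of why a device-resident map hurts: any host-side read of a device tensor, e.g. via `.item()`, blocks on the device stream, while a host-resident map makes the same query pure CPU work. The tensor shape, the local-expert layout, and the `"npu"` device string are illustrative assumptions.

```python
import torch

# Hypothetical expert map: global expert id -> local id, -1 for non-local experts.
expert_map = torch.full((64,), -1, dtype=torch.int32)
expert_map[:8] = torch.arange(8, dtype=torch.int32)  # 8 experts on this rank

# On-device map (the pattern this PR cancels): every host-side read of the
# result would force an implicit device sync.
# device_map = expert_map.to("npu")             # the H2D copy being removed
# n_local = (device_map != -1).sum().item()     # blocking D2H sync per call

# Host-resident map: the same count is plain CPU work, no sync.
n_local = int((expert_map != -1).sum().item())
print(n_local)  # 8
```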
Verified on dsv3.2-exp-w8a8: no garbled characters appear in the output.
| dataset | version | metric | mode | vllm-api-stream-chat |
| ------- | ------- | ------ | ---- | -------------------- |
| aime2025 | ef2f4f | accuracy | gen | 60.00 |
- vLLM version: v0.16.0
- vLLM main: 15d76f74e2
Signed-off-by: shenchuxiaofugui <1311027364@qq.com>
```diff
@@ -286,9 +286,7 @@ class AscendFusedMoE(FusedMoE):
         )
         self.global_num_experts = num_experts + self.global_redundant_expert_num
         self.dynamic_eplb = eplb_config.dynamic_eplb and (self.log2phy is not None)
-        self.local_num_experts = (
-            torch.sum(self._expert_map != -1).item() if self._expert_map is not None else self.global_num_experts
-        )
+        self.local_num_experts = self.global_num_experts // self.ep_size
         if self._expert_map is not None:
             logger.info_once(
                 "[EP Rank %s/%s] Expert parallelism is enabled. Local/global"
```
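For illustration, a sketch of the arithmetic the new line relies on: the local expert count is derived from the global count and the EP world size instead of scanning `_expert_map`, which assumes experts (including redundant ones) divide evenly across EP ranks. The concrete numbers below are hypothetical.

```python
# Hypothetical values, chosen only to show the computation.
num_experts = 256
global_redundant_expert_num = 0
ep_size = 8

# Matches the replacement lines in the diff above.
global_num_experts = num_experts + global_redundant_expert_num
local_num_experts = global_num_experts // ep_size
assert local_num_experts == 32
```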