[MOE][Bugfix] Cancel H2D for expert_map (#7000)

### What this PR does / why we need it?
If `expert_map` is placed on the device, occasional repeated answers may appear
in long-output scenarios.
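
As a rough sketch of the change (hypothetical values and tensor layout, not the plugin's actual code), the old path counted local experts by reducing the `expert_map` tensor and calling `.item()`, which forces a blocking transfer whenever the map lives on the device; the new path is pure host-side integer arithmetic:

```python
import torch

# Hypothetical example values, not the plugin's real configuration.
global_num_experts = 256   # num_experts + global_redundant_expert_num
ep_size = 8                # expert-parallel world size
local = global_num_experts // ep_size   # 32 experts per rank

# Old-style count: build an expert_map where -1 marks experts this rank
# does not own, then reduce it. Calling .item() copies the scalar back to
# the host and blocks if the tensor lives on the device.
expert_map = torch.full((global_num_experts,), -1, dtype=torch.int64)
expert_map[:local] = torch.arange(local)
local_from_map = torch.sum(expert_map != -1).item()

# New-style count: plain host-side integer division, no tensor involved.
local_from_division = global_num_experts // ep_size

assert local_from_map == local_from_division == 32
```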

Verified with dsv3.2-exp-w8a8: no garbled characters appear in the output.
| dataset | version | metric | mode | vllm-api-stream-chat |
|----- | ----- | ----- | ----- | -----|
| aime2025 | ef2f4f | accuracy | gen | 60.00 |

- vLLM version: v0.16.0
- vLLM main: 15d76f74e2

Signed-off-by: shenchuxiaofugui <1311027364@qq.com>
Author: LI SHENGYONG
Date: 2026-03-09 17:53:54 +08:00 (committed by GitHub)
Parent: 82fdd40d49
Commit: a76a509fae
2 changed files with 11 additions and 7 deletions


```diff
@@ -286,9 +286,7 @@ class AscendFusedMoE(FusedMoE):
         )
         self.global_num_experts = num_experts + self.global_redundant_expert_num
         self.dynamic_eplb = eplb_config.dynamic_eplb and (self.log2phy is not None)
-        self.local_num_experts = (
-            torch.sum(self._expert_map != -1).item() if self._expert_map is not None else self.global_num_experts
-        )
+        self.local_num_experts = self.global_num_experts // self.ep_size
         if self._expert_map is not None:
             logger.info_once(
                 "[EP Rank %s/%s] Expert parallelism is enabled. Local/global"
```