[MOE][Bugfix] Cancel H2D for expert_map (#7000)
### What this PR does / why we need it?
If `expert_map` is kept on the device, long-output scenarios can occasionally produce repeated answers. Cancelling the H2D copy keeps `expert_map` on the host and avoids this, as sketched below.
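As a minimal sketch (mine, not the PR's code) of why a device-resident map hurts: any host-side read of a device tensor, e.g. via `.item()`, blocks on the device stream, while a host-resident map makes the same query pure CPU work. The tensor shape, the local-expert layout, and the `"npu"` device string are illustrative assumptions.

```python
import torch

# Hypothetical expert map: global expert id -> local id, -1 for non-local experts.
expert_map = torch.full((64,), -1, dtype=torch.int32)
expert_map[:8] = torch.arange(8, dtype=torch.int32)  # 8 experts on this rank

# On-device map (the pattern this PR cancels): every host-side read of the
# result would force an implicit device sync.
# device_map = expert_map.to("npu")             # the H2D copy being removed
# n_local = (device_map != -1).sum().item()     # blocking D2H sync per call

# Host-resident map: the same count is plain CPU work, no sync.
n_local = int((expert_map != -1).sum().item())
print(n_local)  # 8
```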
Verified on dsv3.2-exp-w8a8: no garbled characters appear in the output.
| dataset | version | metric | mode | vllm-api-stream-chat |
| ------- | ------- | ------ | ---- | -------------------- |
| aime2025 | ef2f4f | accuracy | gen | 60.00 |
- vLLM version: v0.16.0
- vLLM main: 15d76f74e2
Signed-off-by: shenchuxiaofugui <1311027364@qq.com>
```diff
@@ -286,9 +286,7 @@ class AscendFusedMoE(FusedMoE):
         )
         self.global_num_experts = num_experts + self.global_redundant_expert_num
         self.dynamic_eplb = eplb_config.dynamic_eplb and (self.log2phy is not None)
-        self.local_num_experts = (
-            torch.sum(self._expert_map != -1).item() if self._expert_map is not None else self.global_num_experts
-        )
+        self.local_num_experts = self.global_num_experts // self.ep_size
         if self._expert_map is not None:
             logger.info_once(
                 "[EP Rank %s/%s] Expert parallelism is enabled. Local/global"
```
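For illustration, a sketch of the arithmetic the new line relies on: the local expert count is derived from the global count and the EP world size instead of scanning `_expert_map`, which assumes experts (including redundant ones) divide evenly across EP ranks. The concrete numbers below are hypothetical.

```python
# Hypothetical values, chosen only to show the computation.
num_experts = 256
global_redundant_expert_num = 0
ep_size = 8

# Matches the replacement lines in the diff above.
global_num_experts = num_experts + global_redundant_expert_num
local_num_experts = global_num_experts // ep_size
assert local_num_experts == 32
```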