[Bugfix] Fix accuracy bug when enabling dispatch_gmm_combine_decode and eplb (#5806)

### What this PR does / why we need it?

Fix an accuracy bug that occurs when dispatch_gmm_combine_decode and eplb are enabled together.

After EPLB, the expert table may change, so a logical-to-physical (log2phy) mapping of expert ids is needed; fused_mc2 was missing this mapping.

For more information about this operator, please refer to the RFC:
https://github.com/vllm-project/vllm-ascend/issues/5476
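The remapping itself is a plain table lookup: each routed logical expert id is translated to its current physical id before dispatch. A minimal NumPy sketch of this idea (the names `log2phy` and `topk_ids` follow the diff below; the table contents and shapes here are illustrative assumptions, not the real runtime values):

```python
import numpy as np

# Hypothetical log2phy table: logical expert id -> physical expert id.
# After EPLB rebalancing this table may change, which is why routed ids
# must be remapped before the MC2 dispatch.
log2phy = np.array([3, 0, 2, 1])

# Per-token top-k routed expert ids, in *logical* id space (assumed shape).
topk_ids = np.array([[0, 2],
                     [1, 3]])

# Advanced indexing performs the lookup elementwise and preserves shape,
# mirroring the `topk_ids = log2phy[topk_ids]` line added by this PR.
phys_ids = log2phy[topk_ids]
print(phys_ids)  # [[3 2]
                 #  [0 1]]
```

Skipping this lookup makes the dispatcher send tokens to stale expert slots after a rebalance, which matches the accuracy collapse observed below.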

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Without this PR, qwen3-235b with EPLB and dispatch_gmm_combine_decode scored 3.33% accuracy on aime2024.

With this PR, qwen3-235b with EPLB was tested on a single A3 node (ep16):

Without dispatch_gmm_combine_decode:
| dataset | version | metric | mode | vllm-api-stream-chat |
|----- | ----- | ----- | ----- | -----|
| aime2024 | 604a78 | accuracy | gen | 86.67 |

With dispatch_gmm_combine_decode:
| dataset | version | metric | mode | vllm-api-stream-chat |
|----- | ----- | ----- | ----- | -----|
| aime2024 | 604a78 | accuracy | gen | 86.67 |


- vLLM version: v0.13.0
- vLLM main:
2f4e6548ef

Signed-off-by: wangqiankun <wangqiankun13@huawei.com>
Author: wangqiankun13
Date: 2026-01-15 09:21:18 +08:00
Committed by: GitHub
Parent: 7078dff691
Commit: d840f153f4


```diff
@@ -300,6 +300,11 @@ class FusedMC2CommImpl(MoECommMethod):
         assert isinstance(self.token_dispatcher, TokenDispatcherWithMC2), \
             "token_dispatcher must be an instance of TokenDispatcherWithMC2."
+        # Apply log2phy if needed
+        if log2phy is not None:
+            topk_ids = log2phy[topk_ids]
         group_list_type = None
         expert_tokens = None
         if envs_ascend.VLLM_ASCEND_ENABLE_FUSED_MC2 == 1:
@@ -331,7 +336,7 @@ class FusedMC2CommImpl(MoECommMethod):
                 group_ep=self.token_dispatcher.moe_all_to_all_group_name,
                 ep_rank_size=self.token_dispatcher.ep_world_size,
                 ep_rank_id=self.token_dispatcher.ep_rank_id,
-                moe_expert_num=len(expert_map),
+                moe_expert_num=self.moe_config.num_experts,
                 global_bs=self.token_dispatcher.fused_global_bs)
         else:
             raise ValueError(
```