[BugFix] Support redundant experts in EPLB (#3473)

This PR adds support for redundant experts in EPLB.

Key points:
- Use global_num_experts = num_experts + num_redundant_experts consistently.
- Backward compatible when num_redundant_experts=0.

Tested on a 16-rank setup (W8A8) with static EPLB and expert_map_path, verifying the router logits shape and that requests complete successfully.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: yechao237 <yechao20180411@gmail.com>
yechao237
2025-10-18 00:09:16 +08:00
committed by GitHub
parent 07ca1b9b78
commit 4750d45d86
12 changed files with 23 additions and 35 deletions


@@ -856,8 +856,9 @@ class TorchairAscendUnquantizedFusedMoEMethod(UnquantizedFusedMoEMethod):
shared_experts: Optional[Any] = None,
**kwargs,
) -> torch.Tensor:
is_deepseek_v3_r1 = global_num_experts == 256
global_redundant_expert_num = get_ascend_config(
).init_redundancy_expert
is_deepseek_v3_r1 = global_num_experts - global_redundant_expert_num == 256
# NOTE: now npu_moe_gating_top_k can only support `group_count=256` pattern
if is_deepseek_v3_r1:
topk_weights, topk_ids, _ = torch_npu.npu_moe_gating_top_k(
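The hunk above changes how the DeepSeek-V3/R1 gating path is detected. Before the fix, the check compared global_num_experts directly against 256. Once redundant replicas are added, that comparison no longer holds, and the specialized npu_moe_gating_top_k path is silently skipped. The sketch below illustrates the arithmetic with plain integers and does not call torch_npu or get_ascend_config. The helper names (global_num_experts, is_deepseek_v3_r1_old, is_deepseek_v3_r1) and the 256 + 16 configuration are hypothetical and chosen for illustration only. The sketch assumes that the pattern is identified solely by having 256 logical experts, as in the diff.

```python
# Hypothetical sketch of the gating check touched by this commit.
# Assumption: the DeepSeek-V3/R1 pattern is identified purely by having
# 256 logical (non-redundant) experts, mirroring the `== 256` in the diff.

DEEPSEEK_V3_R1_LOGICAL_EXPERTS = 256


def global_num_experts(num_experts: int, num_redundant_experts: int = 0) -> int:
    """Physical expert slots: logical experts plus redundant replicas."""
    return num_experts + num_redundant_experts


def is_deepseek_v3_r1_old(global_num: int) -> bool:
    # Pre-fix check: compares the physical count, so any redundancy breaks it.
    return global_num == DEEPSEEK_V3_R1_LOGICAL_EXPERTS


def is_deepseek_v3_r1(global_num: int, num_redundant_experts: int) -> bool:
    # Post-fix check: subtract the redundant replicas before comparing.
    return global_num - num_redundant_experts == DEEPSEEK_V3_R1_LOGICAL_EXPERTS


if __name__ == "__main__":
    # Hypothetical config: 256 logical experts plus 16 redundant replicas.
    total = global_num_experts(256, 16)
    print(total, is_deepseek_v3_r1_old(total), is_deepseek_v3_r1(total, 16))
    # No redundancy: both checks agree, which is the backward-compatible case.
    total0 = global_num_experts(256, 0)
    print(total0, is_deepseek_v3_r1_old(total0), is_deepseek_v3_r1(total0, 0))
```

With 16 redundant experts, the old check returns False for 272 global experts, while the new check still returns True. With num_redundant_experts=0, both checks agree, which matches the backward-compatibility point in the description.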