[BugFix]Support redundant experts in EPLB (#3473)

This PR adds support for redundant experts in the EPLB.

Key points:
- Use global_num_experts = num_experts + num_redundant_experts consistently.
- Backward compatible when num_redundant_experts=0.

Tested on a 16-rank setup (W8A8) with static EPLB and expert_map_path, verifying router logits shape and successful requests.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: yechao237 <yechao20180411@gmail.com>
2025-10-18 00:09:16 +08:00
parent 07ca1b9b78
commit 4750d45d86
12 changed files with 23 additions and 35 deletions
--- a/vllm_ascend/ops/moe/token_dispatcher.py
+++ b/vllm_ascend/ops/moe/token_dispatcher.py
@@ -123,10 +123,7 @@ class TokenDispatcherWithMC2(MoETokenDispatcher):
    ):
        if self.with_quant:
            quant_mode = 2
-            if (expert_map is not None):
-                moe_expert_num = len(expert_map) + global_redundant_expert_num
-            else:
-                moe_expert_num = global_redundant_expert_num
+            moe_expert_num = len(expert_map)
        else:
            quant_mode = 0
            moe_expert_num = len(expert_map)
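
Reviewer note: the hunk above appears to rely on expert_map already having one entry per global expert slot (logical plus redundant), so adding global_redundant_expert_num on top of len(expert_map) would count the redundant experts twice. The sketch below illustrates that arithmetic under this assumption. The helper names build_expert_map, moe_expert_num_old and moe_expert_num_new are hypothetical and are not part of vllm_ascend; the even split of slots across ranks is also an assumption made for illustration only.

```python
def build_expert_map(num_experts, num_redundant_experts, ep_size, ep_rank):
    """Hypothetical helper: one entry per global expert slot.

    Entry i holds the local index of slot i on this rank, or -1 if the
    slot lives on another rank. Assumes slots are split evenly by rank.
    """
    global_num_experts = num_experts + num_redundant_experts
    if global_num_experts % ep_size != 0:
        raise ValueError("global_num_experts must be divisible by ep_size")
    local_num_experts = global_num_experts // ep_size
    start = ep_rank * local_num_experts
    expert_map = [-1] * global_num_experts
    for local_idx in range(local_num_experts):
        expert_map[start + local_idx] = local_idx
    return expert_map


def moe_expert_num_old(expert_map, global_redundant_expert_num):
    # Mirrors the removed quantized branch: adds the redundant count again.
    if expert_map is not None:
        return len(expert_map) + global_redundant_expert_num
    return global_redundant_expert_num


def moe_expert_num_new(expert_map):
    # Mirrors the added line: the map length is already the global count.
    return len(expert_map)


# Illustrative numbers only: 256 logical experts, 16 redundant, 16 ranks.
expert_map = build_expert_map(256, 16, ep_size=16, ep_rank=0)
old_count = moe_expert_num_old(expert_map, 16)
new_count = moe_expert_num_new(expert_map)
```

With these illustrative numbers the map has 272 entries, so the new expression yields 272 while the removed one yields 288. With num_redundant_experts=0 both expressions agree whenever expert_map is not None, which is consistent with the backward-compatibility claim in the commit message.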