[v0.11.0] [Bugfix] [MoE]fix error in deepseek when using allgather (#3827)

### What this PR does / why we need it?

After the refactoring of `vllm_ascend/models` and FusedMoE, `gate` can no longer be passed from `deepseekv2.py` to `AscendFusedMoE.forward`, which causes an error when running DeepSeek V3/R1 with allgather. This PR therefore removes the `gate`-related computations from the FusedMoE module in eager/aclgraph mode.

### Does this PR introduce _any_ user-facing change?

`rm_router_logits` is deprecated in eager/aclgraph mode.

### How was this patch tested?

e2e and unit tests.

Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
2025-10-30 14:59:46 +08:00
parent 211d4b9da4
commit c506ba60fb
7 changed files with 98 additions and 115 deletions
--- a/vllm_ascend/torchair/ops/torchair_fused_moe.py
+++ b/vllm_ascend/torchair/ops/torchair_fused_moe.py
@@ -48,12 +48,12 @@ from vllm_ascend.eplb.core.eplb_utils import (determine_default_expert_map,
 from vllm_ascend.ops.expert_load_balancer import ExpertLoadBalancer
 from vllm_ascend.quantization.quant_config import AscendFusedMoEMethod
 from vllm_ascend.torchair.ops.sequence_parallel import MetadataForPadding
-from vllm_ascend.torchair.utils import (npu_stream_switch, npu_wait_tensor,
+from vllm_ascend.torchair.utils import (get_all_reduce_merge_state,
+                                        get_rm_router_logits_state,
+                                        npu_stream_switch, npu_wait_tensor,
                                        super_kernel)
 from vllm_ascend.utils import (AscendSocVersion, dispose_tensor,
-                               get_all_reduce_merge_state,
-                               get_ascend_soc_version,
-                               get_rm_router_logits_state, is_310p,
+                               get_ascend_soc_version, is_310p,
                               is_hierarchical_communication_enabled)