[MoE][Dist] Fix Qwen MoE accuracy bug in DP scenario (#1856)

### What this PR does / why we need it?
Fix Qwen MoE accuracy bug in DP scenario.

The current implementation of `FusedMoE` in vLLM uses `All2AllManager` to
manage the different all2all algorithm branches. The default branch uses
`Multicast` in the `dispatch` phase and `all_reduce` in the `combine` phase,
neither of which is implemented in vLLM-Ascend. Execution therefore falls
back to the default implementation in `base_communicator`, whose `dispatch`
and `combine` operations are empty no-ops, which causes the accuracy issue.
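To make the failure mode concrete, below is a minimal, hypothetical sketch; the class and method names are illustrative and do not reproduce vLLM's actual API. The point it shows: a silent no-op fallback returns its inputs unchanged, so MoE produces wrong numbers instead of raising an error.

```python
import torch


class BaseCommunicator:
    """Illustrative stand-in for the base-communicator fallback described
    above: dispatch/combine do nothing, so a backend that fails to
    override them silently computes wrong MoE results."""

    def dispatch(self, hidden_states: torch.Tensor,
                 router_logits: torch.Tensor):
        # No-op: tokens are never actually exchanged across DP/EP ranks.
        return hidden_states, router_logits

    def combine(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # No-op: partial expert outputs are never reduced across ranks.
        return hidden_states


comm = BaseCommunicator()
x = torch.randn(4, 8)        # 4 tokens, hidden size 8
logits = torch.randn(4, 2)   # router logits for 2 experts
out, _ = comm.dispatch(x, logits)
assert torch.equal(out, x)   # nothing happened: an accuracy bug, not a crash
```

Because the fallback hands back its inputs untouched, each rank applies only its local experts and never exchanges tokens or reduces results, so outputs look plausible but are wrong.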

This PR is a temporary workaround; refactoring all2all in vLLM-Ascend
would be a better long-term fix.


- vLLM version: v0.10.0
- vLLM main:
ad57f23f6a

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
Author: Mengqing Cao
Date: 2025-08-04 10:24:18 +08:00
Committed by: GitHub
Parent: f939381c6f
Commit: af04ee9e7a
3 changed files with 46 additions and 58 deletions

@@ -27,7 +27,7 @@ from unittest.mock import patch
 import pytest
-MODELS = ["Qwen/Qwen2.5-0.5B-Instruct"]
+MODELS = ["Qwen/Qwen2.5-0.5B-Instruct", "Qwen/Qwen3-30B-A3B"]
 @pytest.mark.parametrize("model", MODELS)
@@ -54,6 +54,8 @@ def test_data_parallel_inference(model, max_tokens):
         "--trust-remote-code",
         "--enforce-eager",
     ]
+    if model == "Qwen/Qwen3-30B-A3B":
+        cmd.append("--enable-expert-parallel")
     print(f"Running subprocess: {' '.join(cmd)}")
     proc = subprocess.run(cmd,
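Note on the test change: `--enable-expert-parallel` is the vLLM flag that shards MoE expert weights across ranks (expert parallelism) rather than tensor-parallel slicing, which is the configuration that exercises the dispatch/combine path this PR fixes; the dense Qwen2.5 model continues to run without it.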