[MoE][Dist] Fix Qwen MoE accuracy bug in DP scenario (#1856)
### What this PR does / why we need it?
Fix Qwen MoE accuracy bug in DP scenario.
The `FusedMoE` implementation in vLLM now uses `All2AllManager` to
manage the different all2all algorithm branches. The default branch uses
`Multicast` in the `dispatch` phase and `all_reduce` in the `combine` phase,
neither of which is implemented in vLLM-Ascend. As a result, execution falls
back to the default implementation in `base_communicator`, whose `dispatch`
and `combine` operations are empty, which causes the accuracy issue.
This PR is a temporary workaround; refactoring all2all in vLLM-Ascend
would be a better long-term fix.
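
For illustration, here is a minimal sketch of the two behaviors described above: the empty fallback that currently runs, and the `Multicast`-dispatch / `all_reduce`-combine semantics the default branch expects. All class names, signatures, and collective choices are assumptions for illustration, not the actual vLLM or vLLM-Ascend code; the sketch also assumes equal-sized per-rank batches over the DP process group.

```python
import torch
import torch.distributed as dist


class NoOpCommunicator:
    """Shape of the base-communicator fallback: dispatch/combine do
    nothing, so in the DP scenario tokens are never exchanged across
    ranks and the MoE outputs are wrong."""

    def dispatch(self, hidden_states, router_logits):
        return hidden_states, router_logits

    def combine(self, hidden_states):
        return hidden_states


class MulticastAll2All:
    """What the default branch expects (hypothetical sketch): multicast
    tokens on dispatch, all-reduce partial expert outputs on combine."""

    def __init__(self, dp_group):
        self.dp_group = dp_group
        self.dp_size = dist.get_world_size(dp_group)

    def dispatch(self, hidden_states, router_logits):
        # Multicast: every DP rank receives the full batch so it can run
        # the experts it owns over all tokens.
        states = [torch.empty_like(hidden_states) for _ in range(self.dp_size)]
        logits = [torch.empty_like(router_logits) for _ in range(self.dp_size)]
        dist.all_gather(states, hidden_states, group=self.dp_group)
        dist.all_gather(logits, router_logits, group=self.dp_group)
        return torch.cat(states), torch.cat(logits)

    def combine(self, hidden_states):
        # Sum the partial expert outputs across ranks, then slice back
        # this rank's share of the batch.
        dist.all_reduce(hidden_states, group=self.dp_group)
        rank = dist.get_rank(self.dp_group)
        chunk = hidden_states.shape[0] // self.dp_size
        return hidden_states[rank * chunk:(rank + 1) * chunk]
```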
- vLLM version: v0.10.0
- vLLM main: ad57f23f6a
---------
Signed-off-by: MengqingCao <cmq0113@163.com>
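The test change below extends the existing data-parallel inference test to cover an MoE model: `Qwen/Qwen3-30B-A3B` is added to `MODELS`, and `--enable-expert-parallel` is appended to the command for that model so the MoE dispatch/combine path is exercised.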
```diff
@@ -27,7 +27,7 @@ from unittest.mock import patch
 
 import pytest
 
-MODELS = ["Qwen/Qwen2.5-0.5B-Instruct"]
+MODELS = ["Qwen/Qwen2.5-0.5B-Instruct", "Qwen/Qwen3-30B-A3B"]
 
 
 @pytest.mark.parametrize("model", MODELS)
@@ -54,6 +54,8 @@ def test_data_parallel_inference(model, max_tokens):
         "--trust-remote-code",
         "--enforce-eager",
     ]
+    if model == "Qwen/Qwen3-30B-A3B":
+        cmd.append("--enable-expert-parallel")
 
     print(f"Running subprocess: {' '.join(cmd)}")
     proc = subprocess.run(cmd,
```