[Cherry-pick] Port MoE multi-stream fix to v0.11.0-dev (#3753)

This PR moves the communication operation of the shared experts out of the extra
stream, because I found that keeping it there can cause rtMemcpy-related errors
when running shared experts in multistream mode with aclgraph.

Furthermore, I use a global variable to hold the extra stream object, so that
a separate stream is not allocated for each layer in full-graph mode.
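The global-stream idea amounts to a lazily initialized module-level singleton. The sketch below rests on assumptions: `get_extra_stream`, `MoELayer` and the `create_stream` factory are hypothetical names. In real code the factory would construct an actual device stream, and that constructor is deliberately not shown.

```python
_EXTRA_STREAM = None  # module-level cache shared by every layer


def get_extra_stream(create_stream):
    """Return the process-wide extra stream, creating it on first use.

    `create_stream` is a hypothetical factory standing in for the real
    device-stream constructor.
    """
    global _EXTRA_STREAM
    if _EXTRA_STREAM is None:
        _EXTRA_STREAM = create_stream()
    return _EXTRA_STREAM


class MoELayer:
    """Hypothetical MoE layer that reuses the shared stream instead of owning one."""

    def __init__(self, create_stream):
        self.extra_stream = get_extra_stream(create_stream)
```

Building many layers therefore triggers the factory only once, and every layer holds the same stream object. The check-then-set is not thread-safe. That is acceptable only under the assumption that layers are constructed from a single thread.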

Signed-off-by: whx-sjtu <2952154980@qq.com>
Author: whx, 2025-10-25 15:51:43 +08:00 (committed by GitHub)
Parent: 1bc61031e5
Commit: a58ff9e92f
3 changed files with 25 additions and 13 deletions


@@ -28,7 +28,7 @@ from tests.e2e.conftest import VllmRunner
from tests.e2e.model_utils import check_outputs_equal
MODELS = [
    "Qwen/Qwen3-0.6B",
    "vllm-ascend/DeepSeek-V2-Lite-W8A8",
]