xc-llm-ascend/tests/e2e/multicard/test_data_parallel_tp2.py

"""
Run `pytest tests/e2e/multicard/test_data_parallel_tp2.py`.
"""

import os
import subprocess
import sys
from unittest.mock import patch

import pytest

MODELS = ["Qwen/Qwen3-0.6B"]


@pytest.mark.parametrize("model", MODELS)
@pytest.mark.parametrize("max_tokens", [32])
@patch.dict(os.environ, {"ASCEND_RT_VISIBLE_DEVICES": "0,1,2,3"})
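def test_qwen_inference_dp2_tp2(model, max_tokens):
    # DP=2 x TP=2 on a single node uses the four devices exposed above.
    # NOTE: max_tokens is parametrized but not forwarded to the script.
    script = "examples/offline_data_parallel.py"

    # Copy while the patch is active so the child inherits the device list.
    env = os.environ.copy()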

    cmd = [
        sys.executable,
        script,
        "--model",
        model,
        "--dp-size",
        "2",
        "--tp-size",
        "2",
        "--node-size",
        "1",
        "--node-rank",
        "0",
        "--trust-remote-code",
    ]

    print(f"Running subprocess: {' '.join(cmd)}")
    proc = subprocess.run(cmd,
                          env=env,
                          stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT,
                          timeout=600)
    output = proc.stdout.decode(errors='ignore')

    print(output)

    # Check the exit code first so a crash is reported as a crash,
    # not as a missing log line.
    assert proc.returncode == 0
    assert "DP rank 0 needs to process" in output
    assert "DP rank 1 needs to process" in output
    assert "Generated text:" in output