[CI] Enable linux-aarch64-a2 (64GB) and tp2 * 2 max-parallel to speed up CI (#2065)

### What this PR does / why we need it?
The full CI workflow currently takes about 3 hours in total, which seriously hurts the developer experience, so optimization is urgent. After this PR, the full CI run is expected to take about 1h40min.

- Enable linux-aarch64-a2 (64GB) to replace linux-arm64-npu (32GB)
- Change TP4 to TP2 * 2 max-parallel
- Move DeepSeek-V2-Lite-W8A8 to the single-card test

### Does this PR introduce _any_ user-facing change?
No

- vLLM version: v0.10.0
- vLLM main: a2480251ec

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-07-29 18:59:05 +08:00
parent ca8007f584
commit f60bb474f9
14 changed files with 75 additions and 75 deletions
--- a/tests/e2e/multicard/test_fused_moe_allgather_ep.py
+++ b/tests/e2e/multicard/test_fused_moe_allgather_ep.py
@@ -46,7 +46,7 @@ def test_generate_with_allgather():
    sampling_params = SamplingParams(max_tokens=100, temperature=0.0)

    with VllmRunner(snapshot_download("vllm-ascend/DeepSeek-V3-Pruning"),
-                    tensor_parallel_size=4,
+                    tensor_parallel_size=2,
                    enforce_eager=True,
                    max_model_len=1024,
                    dtype="auto",
@@ -74,7 +74,7 @@ def test_generate_with_alltoall():
    sampling_params = SamplingParams(max_tokens=100, temperature=0.0)

    with VllmRunner(snapshot_download("vllm-ascend/DeepSeek-V3-Pruning"),
-                    tensor_parallel_size=4,
+                    tensor_parallel_size=2,
                    enforce_eager=True,
                    max_model_len=1024,
                    dtype="auto",