[CI] Enable linux-aarch64-a2 (64GB) and tp2 * 2 max-parallel to speed up CI (#2065)
### What this PR does / why we need it?
The full CI workflow currently takes about 3 hours in total, which
seriously hurts the developer experience, so an optimization is urgently
needed. After this PR, the full CI run is expected to take about
1h40min.
- Enable linux-aarch64-a2 (64GB) to replace linux-arm64-npu (32GB)
- Change TP4 to TP2 with max-parallel 2 (two TP2 jobs run concurrently)
- Move DeepSeek-V2-Lite-W8A8 to single card test
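
As a rough illustration of the TP4 to TP2 change, the sketch below counts how many test jobs fit on one runner at the same time. It assumes a hypothetical 4-card runner (inferred from the previous TP4 setting, not stated in this PR), and `max_parallel_jobs` is an illustrative helper, not part of the CI scripts.

```python
def max_parallel_jobs(total_cards: int, tp_size: int) -> int:
    """Return how many jobs fit on a runner at once if each job needs tp_size cards."""
    if total_cards <= 0 or tp_size <= 0:
        raise ValueError("total_cards and tp_size must be positive")
    return total_cards // tp_size


# Assumption: a 4-card runner, inferred from the old tensor_parallel_size=4 setting.
TOTAL_CARDS = 4

tp4_jobs = max_parallel_jobs(TOTAL_CARDS, tp_size=4)  # one job occupies all cards
tp2_jobs = max_parallel_jobs(TOTAL_CARDS, tp_size=2)  # two jobs can share the runner

print(f"TP4: {tp4_jobs} job(s) at a time, TP2: {tp2_jobs} job(s) at a time")
```

Under that assumption, halving the tensor-parallel size lets two multi-card jobs run side by side, which is what the max-parallel setting of 2 relies on.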
### Does this PR introduce _any_ user-facing change?
No
- vLLM version: v0.10.0
- vLLM main: a2480251ec
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
@@ -46,7 +46,7 @@ def test_generate_with_allgather():
     sampling_params = SamplingParams(max_tokens=100, temperature=0.0)
 
     with VllmRunner(snapshot_download("vllm-ascend/DeepSeek-V3-Pruning"),
-                    tensor_parallel_size=4,
+                    tensor_parallel_size=2,
                     enforce_eager=True,
                     max_model_len=1024,
                     dtype="auto",
@@ -74,7 +74,7 @@ def test_generate_with_alltoall():
     sampling_params = SamplingParams(max_tokens=100, temperature=0.0)
 
     with VllmRunner(snapshot_download("vllm-ascend/DeepSeek-V3-Pruning"),
-                    tensor_parallel_size=4,
+                    tensor_parallel_size=2,
                     enforce_eager=True,
                     max_model_len=1024,
                     dtype="auto",