[CI] Add qwen-235b-a22b a2 multi-node test (#5393)

### What this PR does / why we need it?
Qwen3-235B-A22B is one of the TopN models, but there is currently no test
coverage for it on Atlas A2, even though most machines owned by community
users are A2. When users hit problems, we have no way of knowing whether
the model runs normally on the corresponding version of the code, so this
PR adds a multi-node test for it. TopN models such as qwen-dense,
qwen3-30b-a3b, Qwen3-Next, and Qwen2.5-Omni are already covered, but
Qwen3-235B-A22B was missing.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tested with a multi-node run; results are as follows:
1. Accuracy test (execution time: 25 minutes)
The test ran successfully, with the following accuracy:
```
dataset    version    metric    mode      vllm-api-general-chat
---------  ---------  --------  ------  -----------------------
gsm8k      7cd45e     accuracy  gen                       95.68
```
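The accuracy run is gated against the `baseline` and `threshold` values in the benchmark YAML later in this PR (96 and 5 for GSM8K). A minimal sketch of that pass/fail rule, assuming the gate accepts any score no more than `threshold` points below `baseline` (`accuracy_gate` is a hypothetical helper, not a function from this repo):

```python
def accuracy_gate(measured: float, baseline: float, threshold: float) -> bool:
    """Pass when the measured score is within `threshold` points of `baseline`."""
    return measured >= baseline - threshold

# GSM8K run above: 95.68 against baseline 96 with threshold 5 -> passes.
print(accuracy_gate(95.68, baseline=96, threshold=5))  # True
```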
2. Perf test (execution time: 1 hour 15 minutes)
The test ran successfully, with the following throughput (measured on
Atlas A3; for A2, expect roughly the A3 result divided by 1.3):
```
╒══════════════════════════╤═════════╤════════════════╤════════════════╤════════════════╤════════════════╤════════════════╤════════════════╤════════════════╤══════╕
│ Performance Parameters   │ Stage   │ Average        │ Min            │ Max            │ Median         │ P75            │ P90            │ P99            │  N   │
╞══════════════════════════╪═════════╪════════════════╪════════════════╪════════════════╪════════════════╪════════════════╪════════════════╪════════════════╪══════╡
│ E2EL                     │ total   │ 384086.3958 ms │ 214767.0486 ms │ 528014.771 ms  │ 387621.5746 ms │ 388776.7492 ms │ 390164.3559 ms │ 488105.8512 ms │ 2800 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼──────┤
│ TTFT                     │ total   │ 159409.9868 ms │ 1849.4588 ms   │ 302439.6965 ms │ 162183.7007 ms │ 162965.477 ms  │ 164274.1936 ms │ 262578.6041 ms │ 2800 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼──────┤
│ TPOT                     │ total   │ 149.8842 ms    │ 130.2175 ms    │ 151.2625 ms    │ 150.473 ms     │ 150.6978 ms    │ 150.9102 ms    │ 151.2131 ms    │ 2800 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼──────┤
│ ITL                      │ total   │ 149.6789 ms    │ 0.0099 ms      │ 283.0242 ms    │ 150.3276 ms    │ 156.8649 ms    │ 168.1372 ms    │ 199.378 ms     │ 2800 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼──────┤
│ InputTokens              │ total   │ 3654.3079      │ 3108.0         │ 4280.0         │ 3629.0         │ 3728.0         │ 3842.1         │ 4079.0         │ 2800 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼──────┤
│ OutputTokens             │ total   │ 1500.0         │ 1500.0         │ 1500.0         │ 1500.0         │ 1500.0         │ 1500.0         │ 1500.0         │ 2800 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼──────┤
│ OutputTokenThroughput    │ total   │ 3.935 token/s  │ 2.8408 token/s │ 6.9843 token/s │ 3.8698 token/s │ 3.8799 token/s │ 3.9916 token/s │ 6.2137 token/s │ 2800 │
╘══════════════════════════╧═════════╧════════════════╧════════════════╧════════════════╧════════════════╧════════════════╧════════════════╧════════════════╧══════╛
╒══════════════════════════╤═════════╤═══════════════════╕
│ Common Metric            │ Stage   │ Value             │
╞══════════════════════════╪═════════╪═══════════════════╡
│ Benchmark Duration       │ total   │ 4391524.3389 ms   │
├──────────────────────────┼─────────┼───────────────────┤
│ Total Requests           │ total   │ 2800              │
├──────────────────────────┼─────────┼───────────────────┤
│ Failed Requests          │ total   │ 0                 │
├──────────────────────────┼─────────┼───────────────────┤
│ Success Requests         │ total   │ 2800              │
├──────────────────────────┼─────────┼───────────────────┤
│ Concurrency              │ total   │ 244.8903          │
├──────────────────────────┼─────────┼───────────────────┤
│ Max Concurrency          │ total   │ 256               │
├──────────────────────────┼─────────┼───────────────────┤
│ Request Throughput       │ total   │ 0.6376 req/s      │
├──────────────────────────┼─────────┼───────────────────┤
│ Total Input Tokens       │ total   │ 10232062          │
├──────────────────────────┼─────────┼───────────────────┤
│ Prefill Token Throughput │ total   │ 22.924 token/s    │
├──────────────────────────┼─────────┼───────────────────┤
│ Total generated tokens   │ total   │ 4200000           │
├──────────────────────────┼─────────┼───────────────────┤
│ Input Token Throughput   │ total   │ 2329.9568 token/s │
├──────────────────────────┼─────────┼───────────────────┤
│ Output Token Throughput  │ total   │ 956.3877 token/s  │
├──────────────────────────┼─────────┼───────────────────┤
│ Total Token Throughput   │ total   │ 3286.3445 token/s │
╘══════════════════════════╧═════════╧═══════════════════╛
```
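The numbers above can be projected to A2 using the ~1.3x factor stated earlier, and the perf run is gated by the `baseline`/`threshold` pair in the benchmark YAML (1 and 0.97). A sketch of both, purely illustrative arithmetic with a hypothetical `perf_gate` helper:

```python
A3_OUTPUT_TPS = 956.3877  # Output Token Throughput measured on Atlas A3
A2_SCALE = 1.3            # assumed A3/A2 throughput ratio from this description

# Rough expected A2 output throughput.
a2_estimate = A3_OUTPUT_TPS / A2_SCALE
print(f"expected A2 output throughput ~= {a2_estimate:.1f} token/s")  # ~735.7

def perf_gate(measured: float, baseline: float = 1.0, threshold: float = 0.97) -> bool:
    """Pass when measured/baseline stays at or above the threshold ratio."""
    return measured / baseline >= threshold
```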
- vLLM version: release/v0.13.0
- vLLM main: 254f6b9867

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
Commit: f5af6bbd1e (parent 1d8aa892bf), authored by Nengjun Ma,
2025-12-26 23:46:09 +08:00, committed by GitHub.
2 changed files with 75 additions and 0 deletions.


```
@@ -78,6 +78,9 @@ jobs:
         - name: multi-node-deepseek-dp
           config_file_path: DeepSeek-R1-W8A8-A2.yaml
           size: 2
+        - name: multi-node-qwen3-235b-dp
+          config_file_path: Qwen3-235B-A22B-A2.yaml
+          size: 2
     uses: ./.github/workflows/_e2e_nightly_multi_node.yaml
     with:
       soc_version: a2
```

@@ -0,0 +1,72 @@
```
test_name: "test Qwen3-235B-A22B multi-dp on A2"
model: "Qwen/Qwen3-235B-A22B"
num_nodes: 2
npu_per_node: 8
env_common:
  VLLM_USE_MODELSCOPE: true
  OMP_PROC_BIND: false
  OMP_NUM_THREADS: 1
  HCCL_BUFFSIZE: 1024
  SERVER_PORT: 8080
  NUMEXPR_MAX_THREADS: 128
  TASK_QUEUE_ENABLE: 1
  PYTORCH_NPU_ALLOC_CONF: expandable_segments:True
deployment:
  -
    server_cmd: >
      vllm serve "Qwen/Qwen3-235B-A22B"
      --host 0.0.0.0
      --port $SERVER_PORT
      --data-parallel-size 2
      --data-parallel-size-local 1
      --data-parallel-address $LOCAL_IP
      --data-parallel-rpc-port 13389
      --tensor-parallel-size 8
      --seed 1024
      --enable-expert-parallel
      --max-num-seqs 128
      --max-model-len 40960
      --max-num-batched-tokens 256
      --trust-remote-code
      --gpu-memory-utilization 0.9
      --async-scheduling
  -
    server_cmd: >
      vllm serve "Qwen/Qwen3-235B-A22B"
      --headless
      --data-parallel-size 2
      --data-parallel-size-local 1
      --data-parallel-start-rank 1
      --data-parallel-address $MASTER_IP
      --data-parallel-rpc-port 13389
      --tensor-parallel-size 8
      --seed 1024
      --max-num-seqs 128
      --max-model-len 40960
      --max-num-batched-tokens 256
      --enable-expert-parallel
      --trust-remote-code
      --gpu-memory-utilization 0.9
      --async-scheduling
benchmarks:
  perf:
    case_type: performance
    dataset_path: vllm-ascend/GSM8K-in3500-bs2800
    request_conf: vllm_api_stream_chat
    dataset_conf: gsm8k/gsm8k_gen_0_shot_cot_str_perf
    num_prompts: 2800
    max_out_len: 1500
    batch_size: 256
    request_rate: 4.8
    baseline: 1
    threshold: 0.97
  acc:
    case_type: accuracy
    dataset_path: vllm-ascend/gsm8k-lite
    request_conf: vllm_api_general_chat
    dataset_conf: gsm8k/gsm8k_gen_0_shot_cot_chat_prompt
    max_out_len: 7680
    batch_size: 256
    baseline: 96
    threshold: 5
```
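The head-node `server_cmd` above exposes vLLM's OpenAI-compatible API on `SERVER_PORT` (8080), so a deployment can be smoke-checked with a single chat request. A sketch under those assumptions; `host` stands in for `$LOCAL_IP` and `build_chat_request`/`chat_once` are hypothetical helper names, not part of this PR:

```python
import json
import urllib.request

def build_chat_request(host: str, port: int = 8080):
    """Build the URL and JSON body for a one-shot chat completion."""
    url = f"http://{host}:{port}/v1/chat/completions"
    payload = {
        "model": "Qwen/Qwen3-235B-A22B",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 8,
    }
    return url, payload

def chat_once(host: str, port: int = 8080) -> str:
    """Send one chat completion and return the generated text."""
    url, payload = build_chat_request(host, port)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Only the head node needs to be queried; the `--headless` rank joins the data-parallel group over `--data-parallel-rpc-port` and does not serve HTTP itself.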