Add models test and add several new models yaml (#3394)

### What this PR does / why we need it?
This PR adds accuracy CI for several new models:
- `ascend test / accuracy` is a PR-triggered accuracy check for popular models
- `ascend test / models` is for the accuracy report: full models test and
nightly model test
- Add Qwen2-Audio-7B-Instruct, Qwen2-VL-7B-Instruct, Qwen3-8B,
Qwen3-VL-30B-A3B-Instruct

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Closes: https://github.com/vllm-project/vllm-ascend/pull/2330
Closes: https://github.com/vllm-project/vllm-ascend/pull/3362


- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
Commit cd69385dab by Yikun Jiang, 2025-10-12 17:27:50 +08:00, committed by GitHub (parent d05d29ff0e)
9 changed files with 434 additions and 285 deletions


@@ -0,0 +1,10 @@
model_name: "Qwen/Qwen2-Audio-7B-Instruct"
hardware: "Atlas A2 Series"
tasks:
- name: "gsm8k"
metrics:
- name: "exact_match,strict-match"
value: 0.44
- name: "exact_match,flexible-extract"
value: 0.45
num_fewshot: 5
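A minimal sketch of how an expectation file like the one above might be checked by the accuracy CI. The `within_rtol` helper and the dict layout are assumptions mirroring the YAML structure and the `RTOL` tolerance used in the test suite, not the project's actual implementation.

```python
# Hypothetical checker: the dict mirrors the YAML fragment above, and
# RTOL matches the relative tolerance set in the accuracy test file.
RTOL = 0.05

expected_config = {
    "model_name": "Qwen/Qwen2-Audio-7B-Instruct",
    "tasks": [
        {
            "name": "gsm8k",
            "metrics": [
                {"name": "exact_match,strict-match", "value": 0.44},
                {"name": "exact_match,flexible-extract", "value": 0.45},
            ],
        }
    ],
}

def within_rtol(measured: float, expected: float, rtol: float = RTOL) -> bool:
    # Pass when the measured score is within a relative tolerance
    # of the expected baseline recorded in the YAML.
    return abs(measured - expected) <= rtol * expected

baseline = expected_config["tasks"][0]["metrics"][0]["value"]
print(within_rtol(0.45, baseline))  # prints True: 0.45 is within 5% of 0.44
```

With `RTOL = 0.05`, a run scoring 0.45 against the 0.44 baseline passes, while a drop to 0.40 would fail the check.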


@@ -0,0 +1,10 @@
model_name: "Qwen/Qwen2-VL-7B-Instruct"
hardware: "Atlas A2 Series"
model: "vllm-vlm"
tasks:
- name: "mmmu_val"
metrics:
- name: "acc,none"
value: 0.50
max_model_len: 8192
gpu_memory_utilization: 0.7


@@ -0,0 +1,11 @@
model_name: "Qwen/Qwen3-8B"
hardware: "Atlas A2 Series"
tasks:
- name: "gsm8k"
metrics:
- name: "exact_match,strict-match"
value: 0.765
- name: "exact_match,flexible-extract"
value: 0.81
num_fewshot: 5
enable_thinking: False


@@ -0,0 +1,12 @@
model_name: "Qwen/Qwen3-VL-30B-A3B-Instruct"
hardware: "Atlas A2 Series"
model: "vllm-vlm"
tasks:
- name: "mmmu_val"
metrics:
- name: "acc,none"
value: 0.58
max_model_len: 8192
tensor_parallel_size: 2
gpu_memory_utilization: 0.7
enable_expert_parallel: True


@@ -1,4 +1,8 @@
DeepSeek-V2-Lite.yaml
Qwen3-8B-Base.yaml
Qwen2.5-VL-7B-Instruct.yaml
Qwen3-30B-A3B.yaml
Qwen3-8B.yaml
Qwen2-7B.yaml
Qwen2-VL-7B-Instruct.yaml
Qwen2-Audio-7B-Instruct.yaml
Qwen3-VL-30B-A3B-Instruct.yaml


@@ -7,7 +7,7 @@ import pytest
import yaml
from jinja2 import Environment, FileSystemLoader
-RTOL = 0.03
+RTOL = 0.05
TEST_DIR = os.path.dirname(__file__)
@@ -48,7 +48,7 @@ def build_model_args(eval_config, tp_size):
}
for s in [
"max_images", "gpu_memory_utilization", "enable_expert_parallel",
-    "tensor_parallel_size", "enforce_eager"
+    "tensor_parallel_size", "enforce_eager", "enable_thinking"
]:
val = eval_config.get(s, None)
if val is not None:
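The hunk above adds `enable_thinking` to the list of optional keys forwarded from the eval config. A simplified sketch of that pattern (the function name and required-args shape are assumptions, not the exact `build_model_args` from the test file):

```python
# Sketch of the optional-argument pattern in the hunk above: only keys
# actually present in the eval config are forwarded as model args.
OPTIONAL_KEYS = [
    "max_images", "gpu_memory_utilization", "enable_expert_parallel",
    "tensor_parallel_size", "enforce_eager", "enable_thinking",
]

def build_model_args_sketch(eval_config: dict) -> dict:
    # Hypothetical simplification: start from the required args, then
    # copy over any optional key that is set (False is a valid value,
    # so the check is `is not None`, not truthiness).
    model_args = {"pretrained": eval_config["model_name"]}
    for key in OPTIONAL_KEYS:
        val = eval_config.get(key)
        if val is not None:
            model_args[key] = val
    return model_args

args = build_model_args_sketch({
    "model_name": "Qwen/Qwen3-8B",
    "enable_thinking": False,
})
print(args)  # enable_thinking=False is kept; unset keys are omitted
```

Using `is not None` rather than a truthiness check matters here: `enable_thinking: False` from the Qwen3-8B YAML must still be passed through.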