### What this PR does / why we need it?
[Nightly] Avoid max_model_len being smaller than the decoder prompt length to
prevent the single-node accuracy tests from failing
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- vLLM version: v0.12.0
- vLLM main: ad32e3e19c
---------
Signed-off-by: ZT-AIA <1028681969@qq.com>
Signed-off-by: ZT-AIA <63220130+ZT-AIA@users.noreply.github.com>
```yaml
model_name: "Qwen/Qwen3-VL-30B-A3B-Instruct"
hardware: "Atlas A2 Series"
model: "vllm-vlm"
tasks:
  - name: "mmmu_val"
    metrics:
      - name: "acc,none"
        value: 0.58
max_model_len: 8192
tensor_parallel_size: 2
gpu_memory_utilization: 0.7
enable_expert_parallel: True
```
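The core idea behind the fix is a guard ensuring `max_model_len` is never smaller than the decoder prompt. A minimal sketch of that kind of check, with hypothetical names (the actual vLLM/vllm-ascend code paths differ):

```python
# Hypothetical sketch: ensure max_model_len can hold the decoder prompt.
# Function and parameter names here are illustrative, not the real vLLM API.
def resolve_max_model_len(configured_max_model_len: int, prompt_len: int) -> int:
    """Return a max_model_len that is at least the decoder prompt length."""
    if configured_max_model_len < prompt_len:
        # Raise the limit so the decoder prompt still fits and the
        # accuracy run does not fail on an over-long prompt.
        return prompt_len
    return configured_max_model_len


# Example: an 8192-token limit with a 10000-token prompt gets raised.
print(resolve_max_model_len(8192, 10000))  # → 10000
print(resolve_max_model_len(8192, 100))    # → 8192
```

In the test config above, `max_model_len: 8192` would be adjusted upward only when the benchmark's decoder prompt exceeds it.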