[Nightly] Avoid max_model_len being smaller than the decoder prompt to prevent single-node-accuracy-tests from failing (#5174)

### What this PR does / why we need it?
[Nightly] Avoid `max_model_len` being smaller than the decoder prompt, which causes single-node-accuracy-tests to fail, by setting `max_model_len: 8192` in the Qwen3-VL-30B-A3B-Instruct and Qwen3-VL-8B-Instruct test configs.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

---------

Signed-off-by: ZT-AIA <1028681969@qq.com>
Signed-off-by: ZT-AIA <63220130+ZT-AIA@users.noreply.github.com>
2025-12-18 22:25:45 +08:00
parent 632eab28b7
commit 6cb76ecd02
2 changed files with 2 additions and 0 deletions
--- a/tests/e2e/models/configs/Qwen3-VL-30B-A3B-Instruct.yaml
+++ b/tests/e2e/models/configs/Qwen3-VL-30B-A3B-Instruct.yaml
@@ -6,6 +6,7 @@ tasks:
  metrics:
  - name: "acc,none"
    value: 0.58
+max_model_len: 8192
 tensor_parallel_size: 2
 gpu_memory_utilization: 0.7
 enable_expert_parallel: True
--- a/tests/e2e/models/configs/Qwen3-VL-8B-Instruct.yaml
+++ b/tests/e2e/models/configs/Qwen3-VL-8B-Instruct.yaml
@@ -6,5 +6,6 @@ tasks:
  metrics:
  - name: "acc,none"
    value: 0.55
+max_model_len: 8192
 batch_size: 32
 gpu_memory_utilization: 0.7
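
The failure mode this change guards against can be illustrated with a minimal sketch. It is not the check vLLM or the test harness actually performs; `EvalConfig`, `prompt_fits`, and `find_oversized_prompts` are hypothetical names, and the rule "prompt tokens plus at least one generated token must fit in `max_model_len`" is an assumption made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EvalConfig:
    """Hypothetical stand-in for the fields shown in the YAML configs above."""
    model_name: str
    max_model_len: int


def prompt_fits(config: EvalConfig, prompt_token_count: int,
                max_new_tokens: int = 1) -> bool:
    """Return True if the prompt plus the generation budget fits the context window.

    Assumption: the limit applies to prompt tokens plus generated tokens.
    """
    return prompt_token_count + max_new_tokens <= config.max_model_len


def find_oversized_prompts(config: EvalConfig, prompt_token_counts: List[int],
                           max_new_tokens: int = 1) -> List[int]:
    """Return the indices of prompts that would not fit under max_model_len."""
    return [
        index
        for index, count in enumerate(prompt_token_counts)
        if not prompt_fits(config, count, max_new_tokens)
    ]


if __name__ == "__main__":
    # Value taken from the diff above; the token counts are made up.
    config = EvalConfig(model_name="Qwen3-VL-8B-Instruct", max_model_len=8192)
    print(find_oversized_prompts(config, [100, 9000, 8191, 8192]))
```

Running a check like this over the tokenized evaluation prompts before launching a nightly job would surface any prompt that exceeds the configured limit, rather than letting the accuracy test fail partway through.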