From 129ba9fe1bc44536917318b007c1778635ee1a34 Mon Sep 17 00:00:00 2001
From: dsxsteven <36877507+dsxsteven@users.noreply.github.com>
Date: Mon, 5 Jan 2026 22:40:28 +0800
Subject: [PATCH] [BugFix] Fix Smoke Testing Bug for DSR1 longseq (#5613)

### What this PR does / why we need it?
Fix the smoke-testing bug for the DSR1 longseq case.

The daily smoke test fails with the error: "max_tokens or max_completion_tokens is too large: 32768. This model's maximum context length is 32768 tokens and your request has 128 input tokens". The error occurs because max-out-len is equal to max-model-len, so the 128 input tokens push the total request length past the context limit (128 + 32768 > 32768). We fix this by increasing the max-model-len argument in the config from 32768 to 36864.

### Does this PR introduce _any_ user-facing change?
No. Only a nightly test config is changed.

### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: https://github.com/vllm-project/vllm/commit/7157596103666ee7ccb7008acee8bff8a8ff1731

Signed-off-by: daishixun
---
 .../nightly/multi_node/config/DeepSeek-R1-W8A8-longseq.yaml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/e2e/nightly/multi_node/config/DeepSeek-R1-W8A8-longseq.yaml b/tests/e2e/nightly/multi_node/config/DeepSeek-R1-W8A8-longseq.yaml
index bc88aaaa..e6bbd7ae 100644
--- a/tests/e2e/nightly/multi_node/config/DeepSeek-R1-W8A8-longseq.yaml
+++ b/tests/e2e/nightly/multi_node/config/DeepSeek-R1-W8A8-longseq.yaml
@@ -34,7 +34,7 @@ deployment:
       --seed 1024
       --quantization ascend
       --max-num-seqs 4
-      --max-model-len 32768
+      --max-model-len 36864
       --max-num-batched-tokens 16384
       --trust-remote-code
       --gpu-memory-utilization 0.9
@@ -72,7 +72,7 @@ deployment:
       --seed 1024
       --quantization ascend
       --max-num-seqs 4
-      --max-model-len 32768
+      --max-model-len 36864
       --max-num-batched-tokens 256
       --trust-remote-code
       --gpu-memory-utilization 0.9
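
For context, the failing check boils down to simple arithmetic over the request budget: prompt tokens plus requested completion tokens must fit inside the model's context window. Below is a minimal Python sketch of that check; the function name `validate_request_length` is illustrative only, not vLLM's actual API, though the error message mirrors the one quoted above.

```python
# Minimal sketch of the context-length validation that triggered the
# smoke-test failure. Hypothetical helper, not vLLM's real implementation.

def validate_request_length(num_input_tokens: int,
                            max_tokens: int,
                            max_model_len: int) -> None:
    """Reject requests whose prompt plus completion exceeds the context window."""
    if num_input_tokens + max_tokens > max_model_len:
        raise ValueError(
            f"max_tokens or max_completion_tokens is too large: {max_tokens}. "
            f"This model's maximum context length is {max_model_len} tokens "
            f"and your request has {num_input_tokens} input tokens"
        )

# Before the fix: 128 + 32768 > 32768, so the request is rejected.
# validate_request_length(128, 32768, 32768)  # raises ValueError

# After the fix: 128 + 32768 = 32896 <= 36864, so the request passes.
validate_request_length(128, 32768, 36864)
```

This also shows why 36864 (32768 + 4096) is a comfortable choice: it leaves headroom for prompts well beyond the 128 input tokens of the current smoke test while keeping the full 32768-token completion budget.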