[TEST]Update prefixcache perf threshold for qwen3-32b-int8 (#4220)

### What this PR does / why we need it?
This PR updates the prefix cache threshold for qwen3-32b-int8 from 0.4 to 0.8, as the baseline has been improved.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the test

- vLLM version: v0.11.0
- vLLM main: 2918c1b49c

Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
2025-11-17 19:06:54 +08:00
parent e38ef2c434
commit 9a1cfb48d4
1 changed files with 2 additions and 2 deletions
--- a/tests/e2e/nightly/features/test_prefix_cache_qwen3_32b_int8.py
+++ b/tests/e2e/nightly/features/test_prefix_cache_qwen3_32b_int8.py
@@ -98,7 +98,7 @@ async def test_models(model: str) -> None:
        run_aisbench_cases(model, port, aisbench_warm_up)
        result = run_aisbench_cases(model, port, aisbench_cases75)
        TTFT75 = get_TTFT(result)
-    assert TTFT75 < 0.4 * TTFT0, f"The TTFT for prefix75 {TTFT75} is not less than 0.4*TTFT for prefix0 {TTFT0}."
+    assert TTFT75 < 0.8 * TTFT0, f"The TTFT for prefix75 {TTFT75} is not less than 0.8*TTFT for prefix0 {TTFT0}."
    print(
-        f"The TTFT for prefix75 {TTFT75} is less than 0.4*TTFT for prefix0 {TTFT0}."
+        f"The TTFT for prefix75 {TTFT75} is less than 0.8*TTFT for prefix0 {TTFT0}."
    )
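
The check changed in this hunk compares the time to first token (TTFT) at a 75% prefix hit rate against the TTFT with no prefix reuse, scaled by a ratio. A minimal standalone sketch of that comparison follows. The function name `ttft_within_ratio` and its parameters are hypothetical and are not part of the test file; the real test obtains its values from `run_aisbench_cases` and `get_TTFT`.

```python
def ttft_within_ratio(ttft_prefix: float, ttft_baseline: float,
                      ratio: float = 0.8) -> bool:
    """Return True if the prefix-cached TTFT is strictly below ratio * baseline.

    Hypothetical helper mirroring the assertion in the diff:
        assert TTFT75 < 0.8 * TTFT0
    """
    if ttft_baseline <= 0:
        # A non-positive baseline would make the comparison meaningless.
        raise ValueError("baseline TTFT must be positive")
    return ttft_prefix < ratio * ttft_baseline


if __name__ == "__main__":
    # Illustrative numbers only, not measured results.
    print(ttft_within_ratio(0.5, 1.0))             # passes under the new 0.8 ratio
    print(ttft_within_ratio(0.5, 1.0, ratio=0.4))  # would fail under the old 0.4 ratio
```

With the ratio raised from 0.4 to 0.8, a run whose cached TTFT is half the baseline passes, whereas it would not have passed the earlier check.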