[TEST]Update prefixcache perf threshold for qwen3-32b-int8 (#4220)

### What this PR does / why we need it?
This PR updates the prefixcache TTFT threshold for qwen3-32b-int8 from 0.4 to
0.8 (the prefix75 TTFT must now be below 0.8 × the prefix0 TTFT), as the
baseline has improved.
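
As a minimal sketch of what the updated check expresses, the helper below compares a cached-prefix TTFT against a no-prefix baseline using a configurable ratio. The function name `check_prefix_cache_ttft` and its parameters are hypothetical and are not part of the test file; only the 0.4 and 0.8 ratios come from this PR.

```python
def check_prefix_cache_ttft(ttft_cached: float,
                            ttft_baseline: float,
                            max_ratio: float = 0.8) -> float:
    """Return ttft_cached / ttft_baseline, failing if it is not below max_ratio.

    Hypothetical helper for illustration; the real test inlines the assert.
    """
    if ttft_baseline <= 0:
        raise ValueError("baseline TTFT must be positive")
    ratio = ttft_cached / ttft_baseline
    # Same condition as the test: TTFT75 < max_ratio * TTFT0.
    if not ttft_cached < max_ratio * ttft_baseline:
        raise AssertionError(
            f"The TTFT for prefix75 {ttft_cached} is not less than "
            f"{max_ratio}*TTFT for prefix0 {ttft_baseline}.")
    return ratio
```

With illustrative numbers, a cached TTFT of 0.5 s against a 1.0 s baseline passes at `max_ratio=0.8` but would have failed at the previous 0.4.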
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By running the test
- vLLM version: v0.11.0
- vLLM main: 2918c1b49c

Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
jiangyunfan1
2025-11-17 19:06:54 +08:00
committed by GitHub
parent e38ef2c434
commit 9a1cfb48d4


@@ -98,7 +98,7 @@ async def test_models(model: str) -> None:
     run_aisbench_cases(model, port, aisbench_warm_up)
     result = run_aisbench_cases(model, port, aisbench_cases75)
     TTFT75 = get_TTFT(result)
-    assert TTFT75 < 0.4 * TTFT0, f"The TTFT for prefix75 {TTFT75} is not less than 0.4*TTFT for prefix0 {TTFT0}."
+    assert TTFT75 < 0.8 * TTFT0, f"The TTFT for prefix75 {TTFT75} is not less than 0.8*TTFT for prefix0 {TTFT0}."
     print(
-        f"The TTFT for prefix75 {TTFT75} is less than 0.4*TTFT for prefix0 {TTFT0}."
+        f"The TTFT for prefix75 {TTFT75} is less than 0.8*TTFT for prefix0 {TTFT0}."
     )