[TEST]Update prefixcache perf threshold for qwen3-32b-int8 (#4220)
### What this PR does / why we need it?
This PR updates the prefixcache threshold for qwen3-32b-int8 from 0.4 to
0.8, as the baseline has been improved.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By running the test
- vLLM version: v0.11.0
- vLLM main: 2918c1b49c
Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
@@ -98,7 +98,7 @@ async def test_models(model: str) -> None:
     run_aisbench_cases(model, port, aisbench_warm_up)
     result = run_aisbench_cases(model, port, aisbench_cases75)
     TTFT75 = get_TTFT(result)
-    assert TTFT75 < 0.4 * TTFT0, f"The TTFT for prefix75 {TTFT75} is not less than 0.4*TTFT for prefix0 {TTFT0}."
+    assert TTFT75 < 0.8 * TTFT0, f"The TTFT for prefix75 {TTFT75} is not less than 0.8*TTFT for prefix0 {TTFT0}."
     print(
-        f"The TTFT for prefix75 {TTFT75} is less than 0.4*TTFT for prefix0 {TTFT0}."
+        f"The TTFT for prefix75 {TTFT75} is less than 0.8*TTFT for prefix0 {TTFT0}."
     )
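For reference, the relaxed check in the hunk above can be sketched as a standalone helper. This is a minimal illustration, not code from the repository: `assert_prefix_cache_speedup` and its `max_ratio` parameter are hypothetical names, and the TTFT values in the usage lines are made-up numbers, not measured results.

```python
def assert_prefix_cache_speedup(ttft_cached: float,
                                ttft_baseline: float,
                                max_ratio: float = 0.8) -> float:
    """Check that the cached-prefix TTFT is below max_ratio * baseline TTFT.

    Hypothetical helper for illustration only; returns the observed ratio.
    """
    if ttft_baseline <= 0:
        raise ValueError("baseline TTFT must be positive")
    ratio = ttft_cached / ttft_baseline
    # Same comparison as the test: TTFT75 < threshold * TTFT0.
    assert ttft_cached < max_ratio * ttft_baseline, (
        f"The TTFT for prefix75 {ttft_cached} is not less than "
        f"{max_ratio}*TTFT for prefix0 {ttft_baseline}.")
    return ratio


# Illustrative numbers only (not benchmark results): this passes under the
# new 0.8 threshold but would fail under the old 0.4 threshold.
ratio = assert_prefix_cache_speedup(ttft_cached=0.6, ttft_baseline=1.0)
```

Taking the threshold as a parameter would let a later baseline change touch a single value instead of both the assertion and its messages, which the hunk above has to edit in three places.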