[Misc] Refactor aclgraph accuracy test to use logprob-based comparison (#7455)
### What this PR does / why we need it?
Replace text-match assertions with a two-tier logprob accuracy check:
- Prefill (token 0): assert token ID is identical between eager baseline
and compiled mode, then verify logprob matches within `atol`.
- Decode (tokens 1-2): if chosen tokens match, compare logprobs
directly; if they differ, cross-lookup the baseline token in the
compiled model's top-20 distribution and assert the assigned logprob is
within `decode_atol` (defaults to 2x atol). This tolerates minor argmax
drift caused by floating-point differences while still catching
distribution divergence.
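The two tiers described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `TokenStep` and `check_logprob_accuracy` are hypothetical names, the per-token data layout is assumed, and it is also an assumption that matching decode tokens are compared with `decode_atol` and that a baseline token missing from the compiled top-k counts as a failure.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TokenStep:
    """One generated position (hypothetical container, not a vLLM type)."""
    token_id: int                   # token chosen at this position
    logprob: float                  # logprob of the chosen token
    top_logprobs: dict[int, float]  # token_id -> logprob for the top-k candidates


def check_logprob_accuracy(baseline, compiled, atol, decode_atol=None):
    """Compare eager-baseline and compiled-mode outputs position by position."""
    if decode_atol is None:
        decode_atol = 2 * atol  # default described in the commit message

    # Tier 1, prefill (token 0): the token must be identical, the logprob close.
    b0, c0 = baseline[0], compiled[0]
    assert b0.token_id == c0.token_id, "prefill token mismatch"
    assert abs(b0.logprob - c0.logprob) <= atol, "prefill logprob drift"

    # Tier 2, decode (tokens 1..n): tolerate argmax drift, not distribution drift.
    for pos, (b, c) in enumerate(zip(baseline[1:], compiled[1:]), start=1):
        if b.token_id == c.token_id:
            # Same token chosen: compare the logprobs directly.
            # Assumption: the looser decode tolerance applies here as well.
            diff = abs(b.logprob - c.logprob)
        else:
            # Different token chosen: look up the baseline token in the
            # compiled model's top-k distribution.
            cross = c.top_logprobs.get(b.token_id)
            assert cross is not None, f"token {pos}: baseline token not in top-k"
            diff = abs(b.logprob - cross)
        assert diff <= decode_atol, f"token {pos}: logprob drift {diff:.4f}"
```

With this shape, a near-tie that flips the argmax between the two modes still passes as long as the compiled model assigns the baseline token almost the same logprob, while a real divergence in the distribution raises an `AssertionError`.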
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.17.0
- vLLM main: 8a680463fa
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
.github/workflows/scripts/config.yaml (vendored): 2 lines changed
@@ -5,6 +5,8 @@ e2e-singlecard:
    estimated_time: 69
  - name: tests/e2e/singlecard/test_auto_fit_max_mode_len.py
    estimated_time: 70
  - name: tests/e2e/singlecard/test_eager_mode_acc.py
    estimated_time: 255
  - name: tests/e2e/singlecard/test_aclgraph_accuracy.py
    estimated_time: 839
  - name: tests/e2e/singlecard/test_aclgraph_batch_invariant.py