[Misc] Refactor aclgraph accuracy test to use logprob-based comparison (#7455)

### What this PR does / why we need it?

Replace text-match assertions with a two-tier logprob accuracy check:

- Prefill (token 0): assert token ID is identical between eager baseline
and compiled mode, then verify logprob matches within `atol`.
- Decode (tokens 1-2): if the chosen tokens match, compare logprobs
directly; if they differ, look up the baseline token in the
compiled model's top-20 distribution and assert the assigned logprob is
within `decode_atol` (defaults to 2x `atol`). This tolerates minor argmax
drift caused by floating-point differences while still catching
distribution divergence.
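
The two-tier check described above can be sketched roughly as follows. This is a minimal illustration, not the test code from this PR: the `Step` container and the `check_logprobs` helper are hypothetical names, and two details are assumptions because the description leaves them open, namely that matching decode tokens are compared with `atol`, and that the cross-lookup compares the compiled model's logprob for the baseline token against the baseline's own logprob.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One generated position (hypothetical container, not a vLLM type)."""
    token_id: int
    logprob: float
    # Top-k alternatives at this position: token_id -> logprob.
    top_logprobs: dict = field(default_factory=dict)


def check_logprobs(baseline, compiled, atol, decode_atol=None):
    """Compare eager-baseline and compiled-mode outputs position by position."""
    if decode_atol is None:
        decode_atol = 2 * atol  # default stated in the PR description
    assert len(baseline) == len(compiled), "sequence lengths differ"

    for pos, (ref, out) in enumerate(zip(baseline, compiled)):
        if pos == 0:
            # Prefill: the token ID must be identical, then logprobs must agree.
            assert ref.token_id == out.token_id, f"prefill token mismatch at {pos}"
            assert abs(ref.logprob - out.logprob) <= atol
        elif ref.token_id == out.token_id:
            # Decode, same token: compare logprobs directly (atol is an assumption).
            assert abs(ref.logprob - out.logprob) <= atol
        else:
            # Decode, argmax drift: the baseline token must still appear in the
            # compiled model's top-k with a logprob close to the baseline's.
            assert ref.token_id in out.top_logprobs, f"token missing at {pos}"
            drift = abs(ref.logprob - out.top_logprobs[ref.token_id])
            assert drift <= decode_atol


# Made-up values for illustration only.
baseline = [
    Step(11, -0.10, {11: -0.10}),
    Step(22, -0.70, {22: -0.70, 23: -0.72}),
    Step(33, -0.30, {33: -0.30}),
]
compiled = [
    Step(11, -0.12, {11: -0.12}),
    Step(23, -0.69, {23: -0.69, 22: -0.74}),  # argmax drifted from 22 to 23
    Step(33, -0.31, {33: -0.31}),
]
check_logprobs(baseline, compiled, atol=0.05)
```

In this sketch the drifted token at position 1 passes because the compiled run still assigns token 22 a logprob close to the baseline's, whereas a text-match assertion would have failed on the same outputs.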

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.17.0
- vLLM main: 8a680463fa

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
Li Wang
2026-03-23 09:08:21 +08:00
committed by GitHub
parent 9bf9b4b267
commit 75fae619d5
5 changed files with 228 additions and 145 deletions


@@ -5,6 +5,8 @@ e2e-singlecard:
estimated_time: 69
- name: tests/e2e/singlecard/test_auto_fit_max_mode_len.py
estimated_time: 70
- name: tests/e2e/singlecard/test_eager_mode_acc.py
estimated_time: 255
- name: tests/e2e/singlecard/test_aclgraph_accuracy.py
estimated_time: 839
- name: tests/e2e/singlecard/test_aclgraph_batch_invariant.py