[Misc] Refactor aclgraph accuracy test to use logprob-based comparison (#7455)

### What this PR does / why we need it?

Replace text-match assertions with a two-tier logprob accuracy check:

- Prefill (token 0): assert token ID is identical between eager baseline and compiled mode, then verify logprob matches within `atol`.
- Decode (tokens 1-2): if chosen tokens match, compare logprobs directly; if they differ, cross-lookup the baseline token in the compiled model's top-20 distribution and assert the assigned logprob is within `decode_atol` (defaults to 2x atol).

This tolerates minor argmax drift caused by floating-point differences while still catching distribution divergence.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.17.0
- vLLM main: 8a680463fa

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
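
For illustration, the two-tier check described above could be sketched roughly as follows. This is not the code from the PR: `TokenStep` and `check_logprob_accuracy` are hypothetical names, the per-step data layout is assumed, and using `decode_atol` for the matching-token decode case is an assumption, since the description only says the logprobs are compared directly.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TokenStep:
    """Hypothetical container for one generated position."""
    token_id: int                   # token the model actually chose
    logprob: float                  # logprob of the chosen token
    top_logprobs: Dict[int, float]  # token id -> logprob for the top-k candidates


def check_logprob_accuracy(
    baseline: List[TokenStep],
    compiled: List[TokenStep],
    atol: float,
    decode_atol: Optional[float] = None,
) -> None:
    """Raise AssertionError if compiled output diverges from the eager baseline."""
    if decode_atol is None:
        decode_atol = 2 * atol  # default stated in the PR description

    # Tier 1, prefill (token 0): token IDs must be identical, logprobs within atol.
    b0, c0 = baseline[0], compiled[0]
    assert b0.token_id == c0.token_id, "prefill token mismatch"
    assert abs(b0.logprob - c0.logprob) <= atol, "prefill logprob mismatch"

    # Tier 2, decode (tokens 1..): tolerate argmax drift, but not distribution drift.
    for i in range(1, min(len(baseline), len(compiled))):
        b, c = baseline[i], compiled[i]
        if b.token_id == c.token_id:
            # Same token chosen: compare logprobs directly (tolerance is an assumption).
            assert abs(b.logprob - c.logprob) <= decode_atol, f"decode logprob mismatch at {i}"
        else:
            # Different token chosen: look up the baseline token in the compiled
            # model's top-k distribution and compare the logprob assigned to it.
            assert b.token_id in c.top_logprobs, f"baseline token missing from top-k at {i}"
            cross = c.top_logprobs[b.token_id]
            assert abs(b.logprob - cross) <= decode_atol, f"cross-lookup mismatch at {i}"
```

With `atol=0.01`, a decode step where the baseline picks token 2 at logprob -0.50 and the compiled model picks token 3 but still assigns token 2 a logprob of -0.51 would pass, because the cross-lookup difference (0.01) is within `decode_atol` (0.02).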
2026-03-23 09:08:21 +08:00
parent 9bf9b4b267
commit 75fae619d5
5 changed files with 228 additions and 145 deletions
--- a/.github/workflows/scripts/config.yaml
+++ b/.github/workflows/scripts/config.yaml
@@ -5,6 +5,8 @@ e2e-singlecard:
  estimated_time: 69
 - name: tests/e2e/singlecard/test_auto_fit_max_mode_len.py
  estimated_time: 70
+- name: tests/e2e/singlecard/test_eager_mode_acc.py
+  estimated_time: 255
 - name: tests/e2e/singlecard/test_aclgraph_accuracy.py
  estimated_time: 839
 - name: tests/e2e/singlecard/test_aclgraph_batch_invariant.py