[BugFix][MTP] Fix prefill misclassified as decode when prompt tokens == num_spec_tokens + 1 (#6835)

## Problem

When MTP is enabled, prefill requests with `prompt_tokens == num_spec_tokens + 1` are incorrectly classified as decode requests, which causes accuracy issues.

## Root Cause

The `uniform_decode` condition only checked:

- `max_num_scheduled_tokens == uniform_decode_query_len`
- `num_tokens == max_num_scheduled_tokens * num_reqs`

This is insufficient because a prefill request whose prompt length is exactly `num_spec_tokens + 1` satisfies both conditions as well.

## Fix

Add an `is_all_decode` check that requires every request to have `num_computed_tokens > 0` before the batch is classified as uniform decode, since a decode request must already have computed at least one token.

- vLLM version: v0.15.0
- vLLM main: 83b47f67b1

---------

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
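The sketch below illustrates why the two original conditions cannot tell a short prefill apart from an MTP decode step, and how an `is_all_decode` guard separates them. It is a simplified model, not the actual runner code: the `Req` dataclass and the `is_uniform_decode_old` / `is_uniform_decode_fixed` helpers are hypothetical, and it assumes (as the Problem section implies) that `uniform_decode_query_len == num_spec_tokens + 1`.

```python
from dataclasses import dataclass


@dataclass
class Req:
    # Hypothetical per-request view of the scheduler output.
    num_computed_tokens: int   # tokens already processed before this step
    num_scheduled_tokens: int  # tokens scheduled for this step


def is_uniform_decode_old(reqs: list[Req], uniform_decode_query_len: int) -> bool:
    # Original check: only looks at token counts (assumes a non-empty batch).
    num_tokens = sum(r.num_scheduled_tokens for r in reqs)
    max_num_scheduled_tokens = max(r.num_scheduled_tokens for r in reqs)
    return (max_num_scheduled_tokens == uniform_decode_query_len
            and num_tokens == max_num_scheduled_tokens * len(reqs))


def is_uniform_decode_fixed(reqs: list[Req], uniform_decode_query_len: int) -> bool:
    # Fixed check: a decode request must already have computed at least one token.
    is_all_decode = all(r.num_computed_tokens > 0 for r in reqs)
    return is_all_decode and is_uniform_decode_old(reqs, uniform_decode_query_len)


num_spec_tokens = 3  # assumed value for illustration
uniform_decode_query_len = num_spec_tokens + 1

# A fresh prompt of exactly num_spec_tokens + 1 tokens vs. a real MTP decode step.
prefill = Req(num_computed_tokens=0, num_scheduled_tokens=num_spec_tokens + 1)
decode = Req(num_computed_tokens=10, num_scheduled_tokens=num_spec_tokens + 1)

print(is_uniform_decode_old([prefill], uniform_decode_query_len))            # True (misclassified)
print(is_uniform_decode_fixed([prefill], uniform_decode_query_len))          # False
print(is_uniform_decode_fixed([decode], uniform_decode_query_len))           # True
print(is_uniform_decode_fixed([prefill, decode], uniform_decode_query_len))  # False
```

Both requests schedule the same number of tokens, so only their computed-token history distinguishes them. That is why the guard checks `num_computed_tokens` rather than tightening the token-count arithmetic.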
2026-03-05 17:33:10 +08:00
parent 91c39ebae6
commit 50441e4650
2 changed files with 4 additions and 1 deletions
--- a/tests/e2e/multicard/4-cards/spec_decode/test_mtp_qwen3_next.py
+++ b/tests/e2e/multicard/4-cards/spec_decode/test_mtp_qwen3_next.py
@@ -100,6 +100,7 @@ def test_qwen3_next_mtp_correctness_tp4(model_name: str,
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
+        "Who are you?",
    ]

    max_tokens = 20