unify logic between aclgraph and torchair (#3560)

### What this PR does / why we need it?
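Unify the logic between aclgraph and torchair for MTP speculative decoding: the prefill branch that pads the draft-token list no longer depends on `torchair_graph_enabled`, so both graph modes take the same path.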

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: zouyida2052 <zouyida2002@gmail.com>
zouyida2052
2025-10-22 21:52:57 +08:00
committed by GitHub
parent 1ad7ffd647
commit 55a4b5ac40

@@ -502,7 +502,7 @@ class MtpProposer(Proposer):
 # prepare next mtp inputs
 # mtp>1: prefill skip or decode skip last loop
-if with_prefill and self.torchair_graph_enabled:
+if with_prefill:
     for _ in range(self.num_speculative_tokens - 1):
         draft_token_ids_list.append(draft_token_ids)
 if step == self.num_speculative_tokens - 1 or with_prefill:
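
The hunk above replaces the condition `with_prefill and self.torchair_graph_enabled` with plain `with_prefill`. The following is a minimal sketch of what that change could mean for the draft-token list, not the real `MtpProposer` code. The function `propose_draft_tokens`, its `unified` flag, the string placeholders for model output, and the loop body outside the lines shown in the hunk (including the `break`) are all assumptions made for illustration.

```python
def propose_draft_tokens(num_speculative_tokens: int,
                         with_prefill: bool,
                         torchair_graph_enabled: bool,
                         unified: bool = True) -> list[str]:
    """Toy model of the loop tail touched by the hunk (hypothetical helper).

    unified=False mimics the old condition, unified=True the new one.
    """
    draft_token_ids_list: list[str] = []
    for step in range(num_speculative_tokens):
        # Assumption: each step yields one draft result; a string stands in
        # for the tensor the real proposer would produce.
        draft_token_ids = f"draft@step{step}"
        draft_token_ids_list.append(draft_token_ids)

        # prepare next mtp inputs
        # mtp>1: prefill skip or decode skip last loop
        if unified:
            pad = with_prefill
        else:
            pad = with_prefill and torchair_graph_enabled
        if pad:
            # Repeat the current draft so the list reaches full length.
            for _ in range(num_speculative_tokens - 1):
                draft_token_ids_list.append(draft_token_ids)
        if step == num_speculative_tokens - 1 or with_prefill:
            break  # assumed body of the final `if` shown in the hunk
    return draft_token_ids_list


if __name__ == "__main__":
    for torchair in (False, True):
        old = propose_draft_tokens(3, True, torchair, unified=False)
        new = propose_draft_tokens(3, True, torchair, unified=True)
        print(f"torchair={torchair}: old len={len(old)}, new len={len(new)}")
```

Under these assumptions, a prefill batch on the non-torchair path previously left the loop with a single entry, whereas after the change both paths return `num_speculative_tokens` entries. Decode-only batches are unaffected because they never enter the padding branch.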