[Fix] Synchronize the host query_start_loc with device values to prevent shape mismatches (#5134)

### What this PR does / why we need it?

Synchronize the host query_start_loc with the device values to prevent shape mismatches when async scheduling is not enabled.

### Does this PR introduce _any_ user-facing change?

None.

### How was this patch tested?

None.

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

---------

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
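
The mtp_proposer.py hunk in this commit moves the per-layer check inside the `for layer_name in self.attn_layer_name` loop, so `layer_name` is bound by the loop before it indexes `attn_metadata`. The following is a minimal standalone sketch of that loop shape. It assumes stand-in objects built with `SimpleNamespace` in place of the real vllm-ascend metadata classes, and the function name `layers_with_decode` is hypothetical.

```python
from types import SimpleNamespace


def layers_with_decode(attn_metadata, layer_names, use_async_scheduling):
    """Collect (layer_name, actual_size) for each layer that passes the check.

    Hypothetical helper: it only mirrors the loop structure in the diff and
    is not part of vllm-ascend.
    """
    sizes = []
    for layer_name in layer_names:
        # layer_name is bound by the loop before it indexes attn_metadata,
        # so the condition is evaluated once per layer.
        decode = attn_metadata[layer_name].decode
        if use_async_scheduling and decode is not None:
            sizes.append((layer_name, len(decode.actual_seq_lengths_q)))
    return sizes


# Stand-in metadata: one layer with decode metadata, one without.
attn_metadata = {
    "layer.0": SimpleNamespace(
        decode=SimpleNamespace(actual_seq_lengths_q=[1, 2, 3])),
    "layer.1": SimpleNamespace(decode=None),
}

result = layers_with_decode(attn_metadata, ["layer.0", "layer.1"], True)
print(result)
```

With the check placed before the loop, as in the removed lines, the condition would read `layer_name` before the loop assigns it and would be evaluated only once instead of once per layer.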
2025-12-17 23:50:12 +08:00
parent 950570f8d1
commit 43d974c6f7
2 changed files with 9 additions and 12 deletions
--- a/vllm_ascend/spec_decode/mtp_proposer.py
+++ b/vllm_ascend/spec_decode/mtp_proposer.py
@@ -779,9 +779,9 @@ class MtpProposer(Proposer):
                        hidden_states = torch.ops.vllm.maybe_pad_and_reduce(
                            hidden_states)

-                    if self.use_async_scheduling and attn_metadata[
-                            layer_name].decode is not None:
-                        for layer_name in self.attn_layer_name:
+                    for layer_name in self.attn_layer_name:
+                        if self.use_async_scheduling and attn_metadata[
+                                layer_name].decode is not None:
                            actual_size = len(attn_metadata[layer_name].decode.
                                              actual_seq_lengths_q)