[Fix] Synchronize the host query_start_loc with device values to prevent shape mismatches (#5134)
### What this PR does / why we need it?
Synchronize the host query_start_loc with the device values to prevent shape
mismatches when async scheduling is not enabled.
### Does this PR introduce _any_ user-facing change?
None.
### How was this patch tested?
None.
- vLLM version: v0.12.0
- vLLM main: ad32e3e19c
---------
Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
@@ -779,9 +779,9 @@ class MtpProposer(Proposer):
             hidden_states = torch.ops.vllm.maybe_pad_and_reduce(
                 hidden_states)
 
-            if self.use_async_scheduling and attn_metadata[
-                    layer_name].decode is not None:
-                for layer_name in self.attn_layer_name:
+            for layer_name in self.attn_layer_name:
+                if self.use_async_scheduling and attn_metadata[
+                        layer_name].decode is not None:
                     actual_size = len(attn_metadata[layer_name].decode.
                                       actual_seq_lengths_q)
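The hunk above moves the `decode is not None` check inside the loop over `self.attn_layer_name`. In the old ordering the condition read `layer_name` before the `for` statement bound it, so the check depended on whatever `layer_name` happened to hold at that point, if anything. A minimal sketch of the corrected ordering follows; `DecodeMeta`, `LayerMeta`, and `decode_sizes` are hypothetical stand-ins for illustration, not the actual classes or methods in the repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecodeMeta:
    # Hypothetical stand-in for the decode metadata referenced in the hunk.
    actual_seq_lengths_q: list


@dataclass
class LayerMeta:
    # Hypothetical stand-in for one layer's attention metadata.
    decode: Optional[DecodeMeta]


def decode_sizes(attn_metadata, attn_layer_name, use_async_scheduling):
    """Return len(actual_seq_lengths_q) per layer, checking each layer in turn."""
    sizes = {}
    for layer_name in attn_layer_name:
        layer_meta = attn_metadata[layer_name]
        # The check runs per layer, after layer_name has been bound by the loop.
        if use_async_scheduling and layer_meta.decode is not None:
            sizes[layer_name] = len(layer_meta.decode.actual_seq_lengths_q)
    return sizes


attn_metadata = {
    "layer.0": LayerMeta(decode=DecodeMeta([1, 2, 3])),
    "layer.1": LayerMeta(decode=None),
}
sizes = decode_sizes(attn_metadata, ["layer.0", "layer.1"], True)
print(sizes)  # {'layer.0': 3}
```

Layers without decode metadata are skipped individually, rather than the whole loop being gated on a single layer's state.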