Bugfix for MTP (#3300)

### What this PR does / why we need it?
When MTP > 1, `cos` and `sin` need to be refreshed at every step.
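The following is a minimal sketch of why a per-step refresh is needed. It assumes, as is typical, that `cos` and `sin` are the rotary position embedding (RoPE) tables, and that each MTP draft step advances every request's position by one. The helpers `rope_cos_sin`, `draft_positions_per_step`, and `run_mtp_steps` are hypothetical illustrations, not vllm-ascend's code. Because the RoPE angle is `position * inv_freq`, tables computed for step 0 no longer match the positions used at step 1 and later.

```python
import math
from typing import List, Tuple

Table = List[List[float]]


def rope_cos_sin(positions: List[int], head_dim: int,
                 base: float = 10000.0) -> Tuple[Table, Table]:
    """Compute RoPE cos/sin tables for the given token positions."""
    # Standard RoPE inverse frequencies: base^(-2i/d) for i in [0, d/2).
    inv_freq = [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]
    cos = [[math.cos(p * f) for f in inv_freq] for p in positions]
    sin = [[math.sin(p * f) for f in inv_freq] for p in positions]
    return cos, sin


def draft_positions_per_step(start_positions: List[int],
                             num_steps: int) -> List[List[int]]:
    """Hypothetical MTP loop: each draft step moves every request forward by one position."""
    return [[p + step for p in start_positions] for step in range(num_steps)]


def run_mtp_steps(start_positions: List[int], num_steps: int,
                  head_dim: int = 8) -> List[Tuple[Table, Table]]:
    """Hypothetical driver that rebuilds cos/sin for every draft step."""
    tables = []
    for positions in draft_positions_per_step(start_positions, num_steps):
        # Refresh cos/sin for this step's positions; reusing step 0's tables
        # would rotate later draft tokens as if they sat at the old positions.
        tables.append(rope_cos_sin(positions, head_dim))
    return tables
```

For example, `run_mtp_steps([5, 12], num_steps=2)[1]` equals `rope_cos_sin([6, 13], 8)`, while the step-0 tables (computed for positions 5 and 12) do not.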

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?

- vLLM version: v0.11.0

Signed-off-by: zouyida2052 <zouyida2002@gmail.com>
zouyida2052
2025-10-09 19:22:46 +08:00
committed by GitHub
parent 30c5d947c3
commit 81aff9c555
2 changed files with 13 additions and 1 deletion


```diff
@@ -97,7 +97,7 @@ def split_decodes_and_prefills(
         return num_reqs, 0, num_tokens, 0
     first_prefill = is_prefill.int().argmax(dim=-1).item()
-    assert torch.all(query_lens[first_prefill:] >= decode_threshold)
+    assert torch.all(query_lens[first_prefill:] > decode_threshold)
     assert torch.all(query_lens[:first_prefill] <= decode_threshold)
     num_decodes = first_prefill
     num_prefills = num_reqs - num_decodes
```
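The following pure-Python sketch shows what the two assertions in `split_decodes_and_prefills` guard. It assumes the batch is ordered with all decode requests before all prefill requests, and that a request counts as a prefill when its query length exceeds `decode_threshold`. With MTP, a decode request's query length is assumed to be 1 plus the number of speculative tokens, so the threshold is raised to match. The name `split_decodes_and_prefills_sketch` is hypothetical, and the sketch uses lists instead of tensors.

```python
from typing import List, Tuple


def split_decodes_and_prefills_sketch(
        query_lens: List[int],
        decode_threshold: int = 1) -> Tuple[int, int, int, int]:
    """Return (num_decodes, num_prefills, num_decode_tokens, num_prefill_tokens).

    Assumes decode requests are ordered before prefill requests.
    """
    num_reqs = len(query_lens)
    num_tokens = sum(query_lens)
    # A request is treated as a prefill if it has more tokens than a decode step.
    is_prefill = [q > decode_threshold for q in query_lens]
    if not any(is_prefill):
        # Pure-decode batch.
        return num_reqs, 0, num_tokens, 0
    first_prefill = is_prefill.index(True)
    # Every request from first_prefill onward must be a prefill...
    assert all(q > decode_threshold for q in query_lens[first_prefill:])
    # ...and every request before it must be a decode.
    assert all(q <= decode_threshold for q in query_lens[:first_prefill])
    num_decodes = first_prefill
    num_prefills = num_reqs - num_decodes
    num_decode_tokens = sum(query_lens[:first_prefill])
    return num_decodes, num_prefills, num_decode_tokens, num_tokens - num_decode_tokens
```

With two speculative tokens (threshold 3), `split_decodes_and_prefills_sketch([3, 3, 7, 10], 3)` returns `(2, 2, 6, 17)`. In this sketch, the strict `>` on the prefill side matches the `is_prefill` test, so a request whose length equals the threshold is always counted as a decode.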