[KVPool]Fix PP get bug (#5007)
### What this PR does / why we need it?
When KV caches are evicted from the KV pool, different pipeline-parallel
(PP) stages can end up out of sync: the KV cache for pp0 may still be
present while the KV cache for pp1 has already been evicted. The get
operation therefore needs a unified check that covers every PP stage,
not only the TP ranks.
- vLLM version: v0.12.0
- vLLM main: ad32e3e19c
Signed-off-by: baxingpiaochong <771405853@qq.com>
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com>
@@ -572,7 +572,8 @@ class KVPoolWorker:
         num_block = len(keys) // self.num_layers
         multi_tp_values = [
             res[i * num_block:(i + 1) * num_block]  # type: ignore[index]
-            for i in range(min(self.tp_size, self.num_kv_head))
+            for i in range(
+                min(self.tp_size, self.num_kv_head) * self.pp_size)
         ]
         index = self.find_min_first_non_one_index(multi_tp_values)
         if index != -1:
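To illustrate why the range has to include `pp_size`, here is a minimal sketch. It assumes `res` is a flat list of existence flags (1 for present) laid out as one contiguous run of `num_block` entries per TP/PP group, and that `find_min_first_non_one_index` returns the smallest position at which any group first reports a non-1 value, or -1 if every flag is 1. Neither the layout nor the helper body appears in this diff, so both are assumptions; `split_exists_results` is a hypothetical stand-in for the list comprehension above.

```python
from typing import List


def find_min_first_non_one_index(groups: List[List[int]]) -> int:
    """Assumed behavior: smallest index where any group first holds a non-1 value.

    Returns -1 when every flag in every group is 1.
    """
    min_index = -1
    for group in groups:
        for idx, value in enumerate(group):
            if value != 1:
                if min_index == -1 or idx < min_index:
                    min_index = idx
                break  # only the first non-1 entry of each group matters
    return min_index


def split_exists_results(res: List[int], num_block: int, tp_size: int,
                         num_kv_head: int, pp_size: int) -> List[List[int]]:
    """Hypothetical helper: slice the flat result into one list per TP/PP group."""
    num_groups = min(tp_size, num_kv_head) * pp_size
    return [res[i * num_block:(i + 1) * num_block] for i in range(num_groups)]


# Three blocks, one TP group, two PP stages (assumed layout: pp0 first, then pp1).
# pp0 still holds all three blocks; pp1 has lost the last two.
res = [1, 1, 1, 1, 0, 0]

# Slicing without pp_size only inspects pp0, so the eviction on pp1 is missed.
tp_only = split_exists_results(res, num_block=3, tp_size=1, num_kv_head=8, pp_size=1)
index_tp_only = find_min_first_non_one_index(tp_only)

# Slicing with pp_size inspects both stages and finds the first missing block.
tp_and_pp = split_exists_results(res, num_block=3, tp_size=1, num_kv_head=8, pp_size=2)
index_tp_and_pp = find_min_first_non_one_index(tp_and_pp)

print(index_tp_only, index_tp_and_pp)
```

Under these assumptions, the TP-only slicing reports no missing block even though pp1 no longer holds blocks 1 and 2, while the slicing that multiplies by `pp_size` identifies block 1 as the first one missing on any stage.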