[KV Pool]get_num_new_matched_tokens return 0 if token length < block_size (#7146)
### What this PR does / why we need it?
Currently, we call `lookup_client` to look up token hits in the KV Pool.
However, when the token length is less than the block size, the key is
empty, so querying the KV Pool backend is pointless: there can never be
a hit.
Hence, add an early return in `get_num_new_matched_tokens` when
`token_len` < `block_size`.
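To see why a short prompt can never hit, here is a minimal sketch. It assumes, as the description implies, that KV Pool keys are derived one per *full* block of tokens, so a trailing partial block produces no key. The `block_keys` helper and its SHA-256 prefix chaining are hypothetical illustrations, not the actual vllm-ascend key format.

```python
import hashlib


def block_keys(token_ids: list[int], block_size: int) -> list[str]:
    """Hypothetical: derive one key per *full* block of tokens.

    A trailing partial block gets no key, which is why a prompt shorter
    than block_size yields an empty key list and can never hit the pool.
    """
    keys: list[str] = []
    prev = b""
    num_full_blocks = len(token_ids) // block_size
    for i in range(num_full_blocks):
        block = token_ids[i * block_size:(i + 1) * block_size]
        # Chain the previous key so each key identifies the whole prefix.
        digest = hashlib.sha256(prev + repr(block).encode()).hexdigest()
        keys.append(digest)
        prev = digest.encode()
    return keys


# With block_size=128, a 100-token prompt has no full block, hence no keys.
print(len(block_keys(list(range(100)), 128)))  # 0
print(len(block_keys(list(range(300)), 128)))  # 2
```

Under this assumption, any lookup issued for a prompt with `token_len < block_size` has nothing to match against, so skipping it loses no hits.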
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.16.0
- vLLM main: 4034c3d32e
---------
Signed-off-by: Pz1116 <zpbzpb123123@gmail.com>
Co-authored-by: fems14 <1804143737@qq.com>
```diff
@@ -78,6 +78,9 @@ class KVPoolScheduler:
         else:
             token_len = len(request.prompt_token_ids)
 
+        if token_len < self._block_size:
+            return 0, False
+
         num_external_hit_tokens = self.client.lookup(token_len, request.block_hashes)
 
         if num_external_hit_tokens == request.num_tokens:
```
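The effect of the early return can be checked in isolation with a small harness. This is a sketch under stated assumptions: `FakeRequest`, `CountingClient` and `SketchScheduler` are hypothetical stand-ins (not vLLM or vllm-ascend classes), and the method body after the lookup is simplified to just return the hit count, unlike the real `get_num_new_matched_tokens`.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRequest:
    # Hypothetical stand-in for vLLM's Request; only the fields used here.
    prompt_token_ids: list[int]
    block_hashes: list[str] = field(default_factory=list)


class CountingClient:
    """Hypothetical lookup client that records how often it is queried."""

    def __init__(self) -> None:
        self.calls = 0

    def lookup(self, token_len: int, block_hashes: list[str]) -> int:
        self.calls += 1
        return 0  # pretend nothing is cached


class SketchScheduler:
    def __init__(self, client: CountingClient, block_size: int) -> None:
        self.client = client
        self._block_size = block_size

    def get_num_new_matched_tokens(self, request: FakeRequest) -> tuple[int, bool]:
        token_len = len(request.prompt_token_ids)
        # Guard mirroring this PR: no full block means no key, so skip the backend.
        if token_len < self._block_size:
            return 0, False
        num_external_hit_tokens = self.client.lookup(token_len, request.block_hashes)
        # Simplified tail: the real method does more work after the lookup.
        return num_external_hit_tokens, False


client = CountingClient()
sched = SketchScheduler(client, block_size=128)
print(sched.get_num_new_matched_tokens(FakeRequest(list(range(100)))))  # (0, False)
print(client.calls)  # 0
sched.get_num_new_matched_tokens(FakeRequest(list(range(200)), ["k0"]))
print(client.calls)  # 1
```

Note the boundary: a prompt of exactly `block_size` tokens still reaches the backend, because it contains one full block and therefore one key.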