[BugFix][HybridKV] Update the check logic of reinitializing inputbatch (#3540)
### What this PR does / why we need it?

Update the check logic for re-initializing `InputBatch`. This is a follow-up PR to #3477. `kernel_block_sizes` is a `list[list[int]]`, so comparing it against `[self.cache_config.block_size]` (a `list[int]`) always evaluates as unequal. As a result, the original logic always re-initialized `InputBatch` when using hybrid blocks. This PR fixes that by comparing against `[[self.cache_config.block_size]]` instead.

### How was this patch tested?

Tested locally with Qwen3-Next.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: MengqingCao <cmq0113@163.com>
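The type mismatch described above can be reproduced with plain Python lists. This is a minimal sketch: the block size of 16 is an assumed illustrative value, and `kernel_block_sizes` is a hand-built stand-in for a single KV cache group, not the value vLLM actually computes.

```python
# Assumed illustrative block size; the real value comes from cache_config.
block_size = 16

# Hypothetical kernel_block_sizes for one group, shaped as list[list[int]].
kernel_block_sizes = [[block_size]]

# Old check: nested list vs. flat list. The element [16] never equals the
# int 16, so this is always True and re-initialization always fires.
old_triggers_reinit = kernel_block_sizes != [block_size]

# Fixed check: nested list vs. nested list of the same shape.
new_triggers_reinit = kernel_block_sizes != [[block_size]]

print(old_triggers_reinit)  # True
print(new_triggers_reinit)  # False
```

Python compares lists element by element, so a list that contains a list can never equal a list that contains an int, whatever the numeric values are.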
```diff
@@ -3147,7 +3147,7 @@ class NPUModelRunner(LoRAModelRunnerMixin):
 
 if block_sizes != [
     self.cache_config.block_size
-] or kernel_block_sizes != [self.cache_config.block_size]:
+] or kernel_block_sizes != [[self.cache_config.block_size]]:
     assert self.cache_config.cpu_offload_gb == 0, (
         "Cannot re-initialize the input batch when CPU weight "
         "offloading is enabled. See https://github.com/vllm-project/vllm/pull/18298 " # noqa: E501
```
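The fixed condition can be sketched as a standalone predicate to check its behavior in isolation. `should_reinit_input_batch` is a hypothetical helper, not a function in vLLM or vllm-ascend. It assumes, as in the changed line, that `block_sizes` is a `list[int]` and `kernel_block_sizes` is a `list[list[int]]`. The multi-group inputs in the tests are illustrative values only.

```python
from typing import List


def should_reinit_input_batch(
    block_sizes: List[int],
    kernel_block_sizes: List[List[int]],
    default_block_size: int,
) -> bool:
    """Hypothetical mirror of the fixed check.

    Returns True when either the block sizes or the kernel block sizes
    differ from the single-group default, meaning the input batch would
    need to be re-initialized.
    """
    # block_sizes is flat, so compare against a flat single-element list.
    # kernel_block_sizes is nested, so compare against a nested list.
    return (
        block_sizes != [default_block_size]
        or kernel_block_sizes != [[default_block_size]]
    )


if __name__ == "__main__":
    # Default layout: no re-initialization needed.
    print(should_reinit_input_batch([16], [[16]], 16))
    # A non-default block size forces re-initialization.
    print(should_reinit_input_batch([128], [[128]], 16))
```

Keeping the literal's shape identical to the variable's shape (`[x]` for `list[int]`, `[[x]]` for `list[list[int]]`) is what makes the equality test meaningful. A shape mismatch silently turns the check into a constant `True`.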