[BugFix][HybridKV] Update the check logic of reinitializing inputbatch (#3540)

### What this PR does / why we need it?
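Update the check that decides whether to reinitialize `InputBatch`. This
is a follow-up to #3477. `kernel_block_sizes` is a `list[list[int]]`, so
the original comparison against a flat `list[int]` was always unequal and
`InputBatch` was always reinitialized when using hybrid blocks. This PR
compares against the nested list `[[self.cache_config.block_size]]` instead.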
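
The comparison issue can be reproduced in isolation. The following is a minimal sketch, not the runner code itself: the helper names `needs_reinit_old` and `needs_reinit_new` and the block size of 128 are hypothetical, chosen only to mirror the two versions of the condition in the diff.

```python
def needs_reinit_old(block_sizes, kernel_block_sizes, block_size):
    # Pre-fix condition: the nested kernel_block_sizes is compared
    # against a flat list, so the second clause is always True.
    return block_sizes != [block_size] or kernel_block_sizes != [block_size]


def needs_reinit_new(block_sizes, kernel_block_sizes, block_size):
    # Post-fix condition: compare against a nested list of the same shape.
    return block_sizes != [block_size] or kernel_block_sizes != [[block_size]]


block_size = 128                     # hypothetical cache block size
block_sizes = [block_size]           # list[int]
kernel_block_sizes = [[block_size]]  # list[list[int]]

old_result = needs_reinit_old(block_sizes, kernel_block_sizes, block_size)
new_result = needs_reinit_new(block_sizes, kernel_block_sizes, block_size)

print(old_result)  # True: reinitializes even though nothing changed
print(new_result)  # False: sizes match the defaults, no reinitialization
```

With the nested comparison, reinitialization is triggered only when the block sizes actually differ from the configured default.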

### How was this patch tested?
Tested locally with qwen3-next.
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: MengqingCao <cmq0113@163.com>
Mengqing Cao
2025-10-20 15:29:48 +08:00
committed by GitHub
parent daa4dd0a57
commit 918ded9155

@@ -3147,7 +3147,7 @@ class NPUModelRunner(LoRAModelRunnerMixin):
if block_sizes != [
self.cache_config.block_size
-] or kernel_block_sizes != [self.cache_config.block_size]:
+] or kernel_block_sizes != [[self.cache_config.block_size]]:
assert self.cache_config.cpu_offload_gb == 0, (
"Cannot re-initialize the input batch when CPU weight "
"offloading is enabled. See https://github.com/vllm-project/vllm/pull/18298 " # noqa: E501