[HybridKV] Fix prefill disaggregation kvcache addr alignment & use hybrid kv cache only when running qwen3_next (#3007)
### What this PR does / why we need it?
This PR fixes several issues in prefill disaggregation:
1. Fix the KV cache address alignment issue in prefill disaggregation:
llmdatadist requires tensor addresses to be aligned to a 2M boundary.
2. Fix the KV cache shape error in prefill disaggregation: llmdatadist
requires separate K and V tensors of shape [num_blocks, ...], but the
implementation before this PR used [2, num_blocks, ...], which breaks
prefill disaggregation.
3. Use the hybrid KV cache only when running qwen3_next, which fixes an
accuracy issue in prefill disaggregation.
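Items 1 and 2 together describe a memory layout: two independent `[num_blocks, ...]` cache tensors, each starting on a 2M boundary, instead of one `[2, num_blocks, ...]` tensor. The sketch below shows only the address arithmetic in plain Python; it is not the PR's code. It assumes "2M" means 2 MiB, assumes a per-layer cache shape of `[num_blocks, block_size, num_kv_heads, head_size]`, and all function names are hypothetical.

```python
# Assumption: "2M" alignment means 2 MiB (2 * 1024 * 1024 bytes).
ALIGNMENT = 2 * 1024 * 1024


def align_up(value: int, alignment: int = ALIGNMENT) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def kv_region_bytes(num_blocks: int, block_size: int, num_kv_heads: int,
                    head_size: int, dtype_bytes: int) -> int:
    """Bytes for ONE cache tensor (K or V) of the assumed shape
    [num_blocks, block_size, num_kv_heads, head_size]."""
    return num_blocks * block_size * num_kv_heads * head_size * dtype_bytes


def raw_alloc_bytes(region: int, alignment: int = ALIGNMENT) -> int:
    """Size of a raw buffer that can hold aligned K and V regions no matter
    where the allocator places its start (worst case: start is 1 byte past
    a boundary, so up to `alignment - 1` bytes are skipped)."""
    return 2 * align_up(region, alignment) + alignment


def plan_kv_layout(base_addr: int, region: int, alignment: int = ALIGNMENT):
    """Return byte offsets, relative to base_addr, of separate K and V
    tensors, each beginning on an alignment boundary."""
    k_addr = align_up(base_addr, alignment)           # first aligned address
    v_addr = align_up(k_addr + region, alignment)     # next boundary after K
    return k_addr - base_addr, v_addr - base_addr


if __name__ == "__main__":
    region = kv_region_bytes(num_blocks=16, block_size=128, num_kv_heads=8,
                             head_size=128, dtype_bytes=2)
    k_off, v_off = plan_kv_layout(base_addr=4096, region=region)
    print(region, k_off, v_off, raw_alloc_bytes(region))
```

Over-allocating by one alignment unit and then slicing at an aligned offset is a common way to satisfy an alignment requirement when the allocator itself gives no such guarantee; whether llmdatadist users do exactly this is an assumption.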
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
Tested locally by @liziyu179
- vLLM version: v0.10.2
- vLLM main: 4f02b77de4
---------
Signed-off-by: MengqingCao <cmq0113@163.com>
```diff
@@ -167,8 +167,7 @@ class BlockTable:
                 mask, slot_mapping, -1)
         else:
             assert self.kernel_sizes is not None
-            if self.block_size == self.kernel_sizes[0] or self.kernel_sizes[
-                    0] == 0:
+            if self.block_size == self.kernel_sizes[0]:
                 # IMPORTANT: In hybrid mode, positions are in logical block space,
                 # but we need to map them to the correct logical block table indices
                 logical_block_idx = positions // self.block_size
```
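The `BlockTable` hunk narrows the branch condition: the `kernel_sizes[0] == 0` case no longer takes the `block_size == kernel_sizes[0]` path. The sketch below contrasts the two conditions and, assuming this branch follows the usual vLLM slot-mapping formula (`block_number * block_size + position % block_size`), shows what `logical_block_idx = positions // self.block_size` feeds into. The function names are hypothetical, a single request's block-table row is modeled as a plain list, and what the other branch does with `kernel_sizes[0] == 0` is not visible in this hunk.

```python
def uses_direct_mapping_before(block_size: int, kernel_sizes: list) -> bool:
    """Branch condition before this PR (hypothetical helper)."""
    return block_size == kernel_sizes[0] or kernel_sizes[0] == 0


def uses_direct_mapping_after(block_size: int, kernel_sizes: list) -> bool:
    """Branch condition after this PR: only an exact block-size match."""
    return block_size == kernel_sizes[0]


def compute_slot_mapping(block_table_row: list, positions: list,
                         block_size: int) -> list:
    """Map token positions of one request to physical KV cache slots,
    assuming the standard vLLM formula."""
    slots = []
    for pos in positions:
        logical_block_idx = pos // block_size          # which logical block
        block_number = block_table_row[logical_block_idx]  # physical block id
        slots.append(block_number * block_size + pos % block_size)
    return slots


if __name__ == "__main__":
    print(uses_direct_mapping_before(128, [0]))   # True
    print(uses_direct_mapping_after(128, [0]))    # False
    print(compute_slot_mapping([7, 3], [0, 1, 4, 5], block_size=4))
```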