[HybridKV] Fix prefill disaggregation kvcache addr alignment & use hybrid kv cache only when running qwen3_next (#3007)

### What this PR does / why we need it?
This PR fixes a few issues in prefill disaggregation:
1. Fix the prefill disaggregation kvcache address alignment issue: llmdatadist
requires tensor addresses to be aligned to 2 MB.
2. Fix the prefill disaggregation kvcache shape error: llmdatadist requires
k/v tensors of shape [num_blocks, ...], but the implementation before
this PR used [2, num_blocks, ...], which breaks prefill disaggregation.
3. Use the hybrid kv cache only when running qwen3_next, fixing an accuracy
issue in prefill disaggregation.
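The first two fixes can be sketched in plain Python. This is a hypothetical illustration of the two constraints described above, not the PR's actual code; `align_up`, `fused`, and the toy sizes are made-up names for demonstration.

```python
# llmdatadist requires tensor addresses to be 2 MB-aligned (assumption:
# alignment is done by rounding a raw address up to the next boundary).
ALIGN = 2 * 1024 * 1024

def align_up(addr: int, alignment: int = ALIGN) -> int:
    """Round an address up to the next multiple of alignment."""
    return (addr + alignment - 1) // alignment * alignment

assert align_up(ALIGN + 1) == 2 * ALIGN   # just past a boundary -> next boundary
assert align_up(3 * ALIGN) == 3 * ALIGN   # already aligned stays unchanged

# Shape constraint: llmdatadist wants k and v as separate [num_blocks, ...]
# tensors, not one fused [2, num_blocks, ...] tensor. Shown with nested lists:
num_blocks, block_size = 4, 16
fused = [[[0] * block_size for _ in range(num_blocks)] for _ in range(2)]
k_cache, v_cache = fused[0], fused[1]  # split along the leading dim of size 2
assert len(k_cache) == num_blocks and len(v_cache) == num_blocks
```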

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
Tested locally by @liziyu179 

- vLLM version: v0.10.2
- vLLM main:
4f02b77de4

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
Commit: 367edff5af (parent acb46f303f)
Author: Mengqing Cao
Date: 2025-09-18 21:43:22 +08:00 (committed by GitHub)
3 changed files with 95 additions and 46 deletions


@@ -167,8 +167,7 @@ class BlockTable:
                     mask, slot_mapping, -1)
         else:
             assert self.kernel_sizes is not None
-            if self.block_size == self.kernel_sizes[0] or self.kernel_sizes[
-                    0] == 0:
+            if self.block_size == self.kernel_sizes[0]:
                 # IMPORTANT: In hybrid mode, positions are in logical block space,
                 # but we need to map them to the correct logical block table indices
                 logical_block_idx = positions // self.block_size
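The position-to-block mapping in the hunk above reduces to an integer division. A minimal standalone sketch, using illustrative values (the real code operates on tensors, not lists):

```python
# Each position belongs to the logical block that contains it:
# block index = position // block_size.
block_size = 16
positions = [0, 15, 16, 33]
logical_block_idx = [p // block_size for p in positions]

# positions 0 and 15 fall in block 0, 16 starts block 1, 33 lands in block 2
assert logical_block_idx == [0, 0, 1, 2]
```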