[BugFix] Fix Qwen3.5 reshape_kvcache bug (#7209)

### What this PR does / why we need it?

This PR fixes a bug in `reshape_kvcache_tensors` when reshaping the
Mamba cache for models such as Qwen3.5. The previous implementation
accumulated slice indices in element counts, which only line up when
every cache tensor in the shared buffer has the same data type. This
change accumulates byte offsets instead, slicing the raw byte buffer
first and only then reinterpreting and reshaping each tensor, which
correctly handles heterogeneous dtypes.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

By CI.

- vLLM version: v0.16.0
- vLLM main: 4034c3d32e

Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>
Author: zxr2333
Date: 2026-03-12 23:51:40 +08:00 (committed by GitHub)
Parent: 5fe7942bbd
Commit: fe4cad24e9


```diff
@@ -2852,8 +2852,8 @@ class NPUModelRunner(GPUModelRunner):
                 # a conv state in some special models.
                 target_shape = (num_blocks, *shape)
-                target_idx += torch.prod(torch.tensor(target_shape)).item()
-                tensor = raw_tensor.view(dtype)[start_idx:target_idx].view(target_shape)
+                target_idx += math.prod(target_shape) * get_dtype_size(dtype)
+                tensor = raw_tensor[start_idx:target_idx].view(dtype).view(target_shape)
                 start_idx = target_idx
                 state_tensors.append(tensor)
             kv_caches[layer_name] = state_tensors
```
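To see why the old element-count bookkeeping breaks with mixed dtypes, consider hypothetical numbers: a float16 tensor of shape (2, 4) followed by a float32 tensor of shape (3, 4) in one shared buffer. Slicing a float32 view of the whole buffer at an index accumulated in elements lands on the wrong bytes:

```python
import math

# Hypothetical layout: (shape, dtype size in bytes) for two packed tensors.
f16_shape, f16_size = (2, 4), 2
f32_shape, f32_size = (3, 4), 4

# Old scheme: after the float16 tensor, the running index is 8 ELEMENTS.
# Viewing the whole buffer as float32 and slicing from element 8 starts
# reading at byte 8 * 4 = 32.
old_start_byte = math.prod(f16_shape) * f32_size

# New scheme: the float32 data actually begins right after the float16
# tensor's 2 * 4 * 2 = 16 bytes.
new_start_byte = math.prod(f16_shape) * f16_size

assert old_start_byte != new_start_byte  # old scheme reads the wrong bytes
```

The two schemes agree only when every tensor in the buffer shares one dtype, which is why the bug surfaced with hybrid-cache models like Qwen3.5.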