[BugFix] Fix Qwen3.5 reshape_kvcache bug (#7209)

### What this PR does / why we need it?

This PR fixes a bug in `reshape_kvcache_tensors` when reshaping the
Mamba cache for models such as Qwen3.5. The previous implementation
accumulated slice indices in element counts, which only line up when
every cache tensor in the shared buffer has the same data type. This
change accumulates byte offsets instead, slicing the raw byte buffer
first and only then reinterpreting and reshaping each tensor, which
correctly handles heterogeneous dtypes.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

By CI.

- vLLM version: v0.16.0
- vLLM main: 4034c3d32e

Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>
Author: zxr2333
Date: 2026-03-12 23:51:40 +08:00 (committed by GitHub)
Parent: 5fe7942bbd
Commit: fe4cad24e9


```diff
@@ -2852,8 +2852,8 @@ class NPUModelRunner(GPUModelRunner):
                 # a conv state in some special models.
                 target_shape = (num_blocks, *shape)
-                target_idx += torch.prod(torch.tensor(target_shape)).item()
-                tensor = raw_tensor.view(dtype)[start_idx:target_idx].view(target_shape)
+                target_idx += math.prod(target_shape) * get_dtype_size(dtype)
+                tensor = raw_tensor[start_idx:target_idx].view(dtype).view(target_shape)
                 start_idx = target_idx
                 state_tensors.append(tensor)
             kv_caches[layer_name] = state_tensors
```
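To see why the old element-count bookkeeping breaks with mixed dtypes, consider hypothetical numbers: a float16 tensor of shape (2, 4) followed by a float32 tensor of shape (3, 4) in one shared buffer. Slicing a float32 view of the whole buffer at an index accumulated in elements lands on the wrong bytes:

```python
import math

# Hypothetical layout: (shape, dtype size in bytes) for two packed tensors.
f16_shape, f16_size = (2, 4), 2
f32_shape, f32_size = (3, 4), 4

# Old scheme: after the float16 tensor, the running index is 8 ELEMENTS.
# Viewing the whole buffer as float32 and slicing from element 8 starts
# reading at byte 8 * 4 = 32.
old_start_byte = math.prod(f16_shape) * f32_size

# New scheme: the float32 data actually begins right after the float16
# tensor's 2 * 4 * 2 = 16 bytes.
new_start_byte = math.prod(f16_shape) * f16_size

assert old_start_byte != new_start_byte  # old scheme reads the wrong bytes
```

The two schemes agree only when every tensor in the buffer shares one dtype, which is why the bug surfaced with hybrid-cache models like Qwen3.5.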