xc-llm-ascend

zxr2333 fe4cad24e9 [BugFix]fix qwen3.5 reshape_kvcache bug (#7209)

### What this PR does / why we need it?

This PR fixes a bug in `reshape_kvcache_tensors` when reshaping the
Mamba cache for models like Qwen3.5. The previous implementation did not
correctly handle cases where the KV cache tensors have different data
types. This change ensures that slicing is performed based on byte
offsets before reshaping the tensors, which correctly handles
heterogeneous dtypes.
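
The idea of slicing by byte offset before reshaping can be illustrated with a minimal, standard-library sketch. This is not the code from `reshape_kvcache_tensors`: the helper name `carve_regions`, the `(fmt, shape)` spec layout, and the use of `memoryview` in place of device tensors are all assumptions made for illustration.

```python
import struct


def carve_regions(raw, specs):
    """Split one raw byte buffer into typed, shaped views (hypothetical helper).

    raw:   bytearray backing all regions.
    specs: list of (fmt, shape), where fmt is a struct format character
           such as "f" (4-byte float) or "h" (2-byte int).
    """
    byte_view = memoryview(raw)  # itemsize 1, so slice indices are byte offsets
    regions = []
    offset = 0
    for fmt, shape in specs:
        n_elems = 1
        for dim in shape:
            n_elems *= dim
        nbytes = n_elems * struct.calcsize(fmt)
        # Slice in bytes first, then reinterpret and reshape the slice.
        # Slicing after a single whole-buffer cast would count offsets in
        # elements of one dtype, which is wrong once dtypes differ.
        chunk = byte_view[offset:offset + nbytes]
        regions.append(chunk.cast(fmt, shape))
        offset += nbytes
    if offset != len(raw):
        raise ValueError("specs do not cover the buffer exactly")
    return regions


# Two regions with different element sizes sharing one 32-byte buffer.
raw = bytearray(32)
conv_state, ssm_state = carve_regions(raw, [("f", (2, 3)), ("h", (4,))])
print(conv_state.shape, conv_state.nbytes)  # first region: 24 bytes
print(ssm_state.shape, ssm_state.nbytes)    # second region: 8 bytes
```

Both views alias the same underlying buffer, so a write into `raw` at byte offset 24 is visible through `ssm_state`.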

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

By CI.

- vLLM version: v0.16.0
- vLLM main: 4034c3d32e

Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>

2026-03-12 23:51:40 +08:00

[MODELRUNNERV2]fix penality ops (#7013)

2026-03-11 17:13:34 +08:00

__init__.py

[Misc][V0 Deprecation] Remove Cache Engine Used for V0 Worker (#1878)

2025-07-19 09:42:32 +08:00

block_table.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #7) (#6023)

2026-02-06 14:56:53 +08:00

model_runner_v1.py

[BugFix]fix qwen3.5 reshape_kvcache bug (#7209)

2026-03-12 23:51:40 +08:00