[KV-Sharing] Support KV-Sharing feature in CLA models (#4138)

### What this PR does / why we need it?
Support the KV-sharing feature in CLA (cross-layer attention) models, in which
some layers share a KV cache instead of each layer allocating its own.
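
As a rough illustration of the idea (not the code in this PR), the sketch below assigns KV cache slots to layers so that a sharing layer reuses the slot of an earlier layer. The function name `assign_kv_cache_slots`, the `share_targets` mapping, and the 4-layer layout are all hypothetical and chosen only for this example.

```python
from typing import Dict, Optional


def assign_kv_cache_slots(share_targets: Dict[int, Optional[int]]) -> Dict[int, int]:
    """Map each layer index to the KV cache slot it reads and writes.

    share_targets[layer] is None when the layer owns its own cache, or the
    index of an earlier layer whose cache it reuses (hypothetical layout).
    """
    slot_of_layer: Dict[int, int] = {}
    next_slot = 0
    for layer in sorted(share_targets):
        target = share_targets[layer]
        if target is None:
            # This layer owns a cache: allocate a fresh slot.
            slot_of_layer[layer] = next_slot
            next_slot += 1
        else:
            # A sharing layer may only point at an earlier, already-placed layer.
            if target >= layer or target not in slot_of_layer:
                raise ValueError(f"layer {layer} cannot share with layer {target}")
            slot_of_layer[layer] = slot_of_layer[target]
    return slot_of_layer


# Hypothetical 4-layer model: layers 1 and 3 reuse the caches of layers 0 and 2.
layout = assign_kv_cache_slots({0: None, 1: 0, 2: None, 3: 2})
num_allocated = len(set(layout.values()))
```

In this toy layout only two cache slots are allocated for four layers, which is the memory saving that KV sharing is meant to provide.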

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c
---------
Signed-off-by: MengqingCao <cmq0113@163.com>
Mengqing Cao
2025-12-23 10:48:31 +08:00
committed by GitHub
parent 9a79cbaecb
commit 449f8f65a7
5 changed files with 105 additions and 19 deletions

@@ -22,4 +22,5 @@ msserviceprofiler>=1.2.2
mindstudio-probe>=8.3.0
arctic-inference==0.1.1
xlite
-uc-manager
+uc-manager
+timm