[KV-Sharing] Support KV-Sharing feature in CLA models (#4138)
### What this PR does / why we need it?
Support the KV-Sharing feature in CLA (cross-layer attention) models, which
share the KV cache across some layers.
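As a rough illustration of what sharing a KV cache across layers means, the sketch below maps each layer to an "owner" layer whose cache it reads. All names here (`build_cache_owner`, `SharedKVCache`, `share_factor`) are hypothetical, and the fixed grouping of consecutive layers is an assumption for illustration; this is not the vLLM or vllm-ascend API, nor the implementation in this PR.

```python
def build_cache_owner(num_layers: int, share_factor: int) -> list[int]:
    """Map each layer index to the layer whose KV cache it reads.

    Assumption: every group of `share_factor` consecutive layers shares
    the cache written by the first layer in the group.
    """
    return [layer - (layer % share_factor) for layer in range(num_layers)]


class SharedKVCache:
    """Toy cache: only owner layers store entries; other layers read them."""

    def __init__(self, owners: list[int]) -> None:
        self.owners = owners
        self.store: dict[int, list[tuple]] = {}  # owner layer -> [(key, value), ...]

    def write(self, layer: int, key, value) -> None:
        # Only an owner layer may write; sharing layers reuse its entries.
        if self.owners[layer] != layer:
            raise ValueError(f"layer {layer} shares the cache of layer {self.owners[layer]}")
        self.store.setdefault(layer, []).append((key, value))

    def read(self, layer: int) -> list[tuple]:
        # Any layer reads from its owner's cache (possibly itself).
        return self.store.get(self.owners[layer], [])
```

With `build_cache_owner(6, 2)`, layers 1, 3, and 5 read the entries written by layers 0, 2, and 4, so only three of the six layers need cache storage in this toy setup.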
- vLLM version: v0.12.0
- vLLM main: ad32e3e19c
---------
Signed-off-by: MengqingCao <cmq0113@163.com>
@@ -22,4 +22,5 @@ msserviceprofiler>=1.2.2
 mindstudio-probe>=8.3.0
 arctic-inference==0.1.1
 xlite
-uc-manager
+uc-manager
+timm