[KV-Sharing] Support KV-Sharing feature in CLA models (#4138)

### What this PR does / why we need it?
Support the KV-Sharing feature in CLA (cross-layer attention) models, in which some layers share their KV cache with earlier layers instead of allocating their own.
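The idea behind CLA-style KV sharing can be sketched as follows. This is an illustrative toy, not the actual vllm-ascend implementation: the function name `allocate_kv_caches`, the `shared_kv_map` parameter, and the placeholder buffer structure are all hypothetical, chosen only to show that sharing layers alias the cache of a target layer so fewer buffers are allocated.

```python
# Hypothetical sketch of cross-layer KV-cache sharing (not vllm-ascend code).
# In a CLA model, some decoder layers reuse the KV cache of an earlier
# "target" layer, so only the target layers own a cache allocation.

def allocate_kv_caches(num_layers, shared_kv_map):
    """shared_kv_map maps a layer index to the layer whose KV cache it reuses.

    Layers absent from the map own their cache; layers present in the map
    alias the target layer's buffers instead of allocating new ones.
    """
    caches = {}
    for layer in range(num_layers):
        target = shared_kv_map.get(layer, layer)
        if target not in caches:
            # Placeholder buffers standing in for real key/value block tensors.
            caches[target] = {"k": [], "v": []}
        # Sharing layers point at the same object as their target layer.
        caches[layer] = caches[target]
    return caches

# Layers 1 and 3 reuse the caches of layers 0 and 2 (CLA-style pairing):
caches = allocate_kv_caches(4, {1: 0, 3: 2})
assert caches[1] is caches[0] and caches[3] is caches[2]
# Only 2 distinct allocations exist for the 4 layers.
print(len({id(c) for c in caches.values()}))  # → 2
```

In vLLM itself, this pairing is expressed per attention layer rather than through a global map as above, but the memory effect is the same: layers that share a cache contribute no additional KV blocks of their own.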

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c
---------
Signed-off-by: MengqingCao <cmq0113@163.com>
Author: Mengqing Cao
Date: 2025-12-23 10:48:31 +08:00
Committed by: GitHub
Parent: 9a79cbaecb
Commit: 449f8f65a7
5 changed files with 105 additions and 19 deletions

@@ -105,6 +105,7 @@ jobs:
pytest -sv --durations=0 tests/e2e/singlecard/test_xlite.py
pytest -sv --durations=0 tests/e2e/singlecard/pooling/
pytest -sv --durations=0 tests/e2e/singlecard/compile/test_norm_quant_fusion.py
pytest -sv --durations=0 tests/e2e/singlecard/test_cross_layer_attn_model.py
# ------------------------------------ v1 spec decode test ------------------------------------ #
pytest -sv --durations=0 tests/e2e/singlecard/spec_decode_v1/test_v1_mtp_correctness.py