[KV-Sharing] Support KV-Sharing feature in CLA models (#4138)
### What this PR does / why we need it?
Support the KV-Sharing feature in CLA (cross-layer attention) models, in which
some layers share the KV cache of an earlier layer instead of allocating their own.
- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c
---------
Signed-off-by: MengqingCao <cmq0113@163.com>
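For context, cross-layer attention lets a "sharer" layer reuse the KV cache written by an earlier "producer" layer, so only the producer allocates cache memory. The sketch below is purely illustrative and not vLLM's actual implementation: `KVCacheRegistry` and its methods are hypothetical, loosely modeled on the idea behind the `kv_sharing_target_layer_name` attention argument mentioned in the feature.

```python
# Illustrative sketch of cross-layer KV sharing (NOT vLLM's real code):
# producer layers own a KV buffer; sharer layers alias a producer's buffer.
from typing import Optional

class KVCacheRegistry:
    """Hypothetical registry: one KV buffer per producer, aliased by sharers."""
    def __init__(self):
        self._caches = {}    # producer layer_name -> list of (k, v) entries
        self._targets = {}   # sharer layer_name -> producer layer_name

    def register(self, layer_name: str,
                 kv_sharing_target_layer_name: Optional[str] = None):
        if kv_sharing_target_layer_name is None:
            # Producer layer: allocate its own KV cache.
            self._caches[layer_name] = []
        else:
            # Sharer layer: must target an already-registered producer.
            assert kv_sharing_target_layer_name in self._caches
            self._targets[layer_name] = kv_sharing_target_layer_name

    def cache_for(self, layer_name: str):
        # Resolve a sharer to its producer's cache; producers map to themselves.
        producer = self._targets.get(layer_name, layer_name)
        return self._caches[producer]

# Layers 0/1 form one CLA group, 2/3 another: odd layers reuse the
# KV cache written by the preceding even layer.
reg = KVCacheRegistry()
reg.register("layers.0.attn")
reg.register("layers.1.attn", kv_sharing_target_layer_name="layers.0.attn")
reg.register("layers.2.attn")
reg.register("layers.3.attn", kv_sharing_target_layer_name="layers.2.attn")

reg.cache_for("layers.0.attn").append(("k0", "v0"))
# Layer 1 reads layer 0's entries without writing any KV of its own.
shared = reg.cache_for("layers.1.attn")
```

The memory saving is the point: with groups of size 2, only half the layers hold KV buffers, roughly halving KV-cache memory for the attention layers in each group.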
Changed files:

.github/workflows/_e2e_test.yaml (vendored): 1 addition
@@ -105,6 +105,7 @@ jobs:
       pytest -sv --durations=0 tests/e2e/singlecard/test_xlite.py
       pytest -sv --durations=0 tests/e2e/singlecard/pooling/
       pytest -sv --durations=0 tests/e2e/singlecard/compile/test_norm_quant_fusion.py
+      pytest -sv --durations=0 tests/e2e/singlecard/test_cross_layer_attn_model.py

       # ------------------------------------ v1 spec decode test ------------------------------------ #
       pytest -sv --durations=0 tests/e2e/singlecard/spec_decode_v1/test_v1_mtp_correctness.py