xc-llm-ascend


Mengqing Cao 449f8f65a7 [KV-Sharing] Support KV-Sharing feature in CLA models (#4138)

### What this PR does / why we need it?
Support the KV-Sharing feature in CLA (cross-layer attention) models, in which
some layers share a KV cache.

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c
---------
Signed-off-by: MengqingCao <cmq0113@163.com>

2025-12-23 10:48:31 +08:00

__init__.py

[Core] Make V1 work and enable V1 engine test (#389)

2025-03-28 19:34:23 +08:00

attention_cp.py

[Refactor] move the metadata from attention_v1 to util (ready for extract common_cp) & realize Ascendmetadata inherit from the parent class. (#5203)

2025-12-23 00:10:52 +08:00

attention_mask.py

[Model] Support pooling models (#3122)

2025-12-10 11:37:57 +08:00

attention_v1.py

[KV-Sharing] Support KV-Sharing feature in CLA models (#4138)

2025-12-23 10:48:31 +08:00

mla_cp.py

[feature] support pcp + mtp in full graph (#4572)

2025-12-22 16:13:39 +08:00

mla_v1.py

[Refactor] remove some metadata variables in attention_v1. (#5160)

2025-12-19 14:57:09 +08:00

sfa_v1.py

[misc][FlashComm1][ACLGraph] Incompatibility between Flashcomm1 and FULL_DECODE_ONLY. (#5200)

2025-12-22 14:33:32 +08:00

utils.py

[Refactor] move the metadata from attention_v1 to util (ready for extract common_cp) & realize Ascendmetadata inherit from the parent class. (#5203)

2025-12-23 00:10:52 +08:00