xc-llm-ascend

Files

Bai Yongbin 7f91ac2649 [CP&SP] Integrate FIA operator in mla_cp._forward_decode (#5641)

### What this PR does / why we need it?
Replace npu_multi_head_latent_attention with the FIA operator in
_forward_decode in mla_cp.py.
Adjust mla_attn_dpc_pcp in acl_graph.py.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main: 2f4e6548ef

---------

Signed-off-by: 白永斌 <baiyongbin3@h-partners.com>
Signed-off-by: Bai Yongbin <845473182@qq.com>
Signed-off-by: tongyuzhou <t00886357@china.huawei.com>
Co-authored-by: 白永斌 <baiyongbin3@h-partners.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: tongyuzhou <t00886357@china.huawei.com>

2026-01-22 20:02:30 +08:00

test_attention_cp.py

[Main2Main] Upgrade vllm commit to 0109 (#5752)

2026-01-13 19:14:43 +08:00

test_attention_mask.py

[Refactor] 2/N Unify all mask generation methods and cache mask (#4779)

2025-12-09 18:51:00 +08:00

test_attention_v1.py

[Main2Main] Upgrade vllm commit to 0113 (#5839)

2026-01-15 09:48:53 +08:00

test_mla_cp.py

[CP&SP] Integrate FIA operator in mla_cp._forward_decode (#5641)