xc-llm-ascend

Files

weijinqian0 98e6e57622 [Refactor] 4/N Distinguish the branches based on the applicable scenarios of PA and FIA Ops. (#5081)

RFC: https://github.com/vllm-project/vllm-ascend/issues/4629

Reason:

This change separates the code branches according to the scenarios in
which pagedAttention (PA) and fusedInferAttention (FIA) apply, which
makes the code clearer.

It also simplifies later iterations: adding sliding_window and sinks
support, and removing the PA ops once FIA is ready.

Todo:

- remove PA ops after FIA is ready
- add sliding window and ops for gpt_oss
- replace FIA with FIA_v2

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

---------

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Co-authored-by: weijinqian_v1 <weijinqian@huawei.com>

2025-12-17 23:14:02 +08:00

test_attention_cp.py

[UT] add the UT of pcp and dcp in the attention_cp file (#5054)

2025-12-17 09:11:33 +08:00

test_attention_mask.py

[Refactor] 2/N Unify all mask generation methods and cache mask (#4779)

2025-12-09 18:51:00 +08:00

test_attention_v1.py

[Refactor] 4/N Distinguish the branches based on the applicable scenarios of PA and FIA Ops. (#5081)

2025-12-17 23:14:02 +08:00

test_mla_cp.py

[UT] add pcp&dcp UT for mla_cp (#4953)