EngineX/xc-llm-ascend
xc-llm-ascend/tests/ut/attention
Commit: dbe4c338f2fac797bba8d03352f13f4af7da2aa6
Latest commit: weijinqian0 dbe4c338f2 [Refactor] cache cos/sin in mla & remove parameter model in builder. (#5277)
RFC: https://github.com/vllm-project/vllm-ascend/issues/4629

1. Cache cos/sin in MLA (a sketch of the idea follows this commit message).
2. AttentionBuilder now inherits from the original vLLM class.

version: release/v0.13.0
- vLLM main: ad32e3e19c

---------

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Co-authored-by: weijinqian_v1 <weijinqian@huawei.com>
2025-12-28 10:35:07 +08:00
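
The cos/sin caching in item 1 is the usual rotary-embedding optimization: compute the cos/sin tables once at init time and gather rows by position afterwards, instead of recomputing the trig functions on every forward pass. Below is a minimal PyTorch sketch of that idea; the class and method names are hypothetical, not the actual vllm-ascend API.

```python
import torch


class CachedRotaryEmbedding:
    """Illustrative sketch only: precompute rotary cos/sin tables once."""

    def __init__(self, head_dim: int, max_position: int, base: float = 10000.0):
        # Inverse frequencies, one per pair of rotary dimensions.
        inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
        # One row of angles per position: [max_position, head_dim // 2].
        freqs = torch.outer(torch.arange(max_position).float(), inv_freq)
        # Cache both tables once; forward() is then a cheap gather.
        self.cos_cached = freqs.cos()
        self.sin_cached = freqs.sin()

    def forward(self, positions: torch.Tensor):
        # Index the cached tables instead of recomputing cos/sin per step.
        return self.cos_cached[positions], self.sin_cached[positions]
```

On each decode step the caller would then do `cos, sin = rope.forward(position_ids)` and apply the rotation, with no per-step trigonometry.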
Files:
test_attention_cp.py
    [Perf] vectorize PCP/DCP loops in attention_cp.py (#4944)
    2025-12-22 11:06:19 +08:00
    (see the vectorization sketch after this listing)
test_attention_mask.py
    [Refactor] 2/N Unify all mask generation methods and cache mask (#4779)
    2025-12-09 18:51:00 +08:00
    (see the mask-cache sketch after this listing)
test_attention_v1.py
    [Refactor] move the metadata from attention_v1 to util (ready for extracting common_cp) & make AscendMetadata inherit from the parent class (#5203)
    2025-12-23 00:10:52 +08:00
    (see the metadata-inheritance sketch after this listing)
test_mla_cp.py
    Revert "MLA prefill performance optimization (#5275)" (#5410)
    2025-12-27 09:48:56 +08:00
test_mla_v1.py
    [Refactor] cache cos/sin in mla & remove parameter model in builder. (#5277)
    2025-12-28 10:35:07 +08:00
test_sfa_v1.py
    [Refactor] cache cos/sin in mla & remove parameter model in builder. (#5277)
    2025-12-28 10:35:07 +08:00
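
For the "[Perf] vectorize PCP/DCP loops" commit tested by test_attention_cp.py, the general pattern is replacing a per-request Python loop with batched tensor operations. A hedged, illustrative example of that pattern (not the actual attention_cp.py code): computing per-token positions from sequence lengths.

```python
import torch

seq_lens = torch.tensor([3, 1, 4])

# Loop form: one Python iteration per request.
pos_loop = torch.cat([torch.arange(n) for n in seq_lens.tolist()])

# Vectorized form: a single batched expression, no Python-level loop.
starts = torch.repeat_interleave(
    torch.cat([torch.zeros(1, dtype=torch.long), seq_lens.cumsum(0)[:-1]]),
    seq_lens,
)
pos_vec = torch.arange(int(seq_lens.sum())) - starts

assert torch.equal(pos_loop, pos_vec)  # both yield [0,1,2, 0, 0,1,2,3]
```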
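
The "Unify all mask generation methods and cache mask" commit behind test_attention_mask.py points at another common caching pattern: build one large causal mask up front and hand out slices, rather than regenerating a mask per batch. A minimal sketch under that assumption; the names are illustrative, not the vllm-ascend implementation.

```python
import torch


class AttentionMaskCache:
    """Illustrative sketch: one cached causal mask, sliced per request."""

    def __init__(self, max_seq_len: int, dtype=torch.float16):
        # Upper-triangular -inf mask, built exactly once.
        mask = torch.full((max_seq_len, max_seq_len), float("-inf"), dtype=dtype)
        self._mask = mask.triu_(diagonal=1)

    def get(self, seq_len: int) -> torch.Tensor:
        # Every caller just views a corner of the cached tensor.
        return self._mask[:seq_len, :seq_len]
```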
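
The test_attention_v1.py commit describes moving shared metadata into a util module and making AscendMetadata inherit from a common parent class. A hypothetical dataclass illustration of that direction; the field names are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class CommonAttentionMetadata:
    # Fields shared across backends live on the parent (in a util module).
    seq_lens: torch.Tensor
    slot_mapping: torch.Tensor


@dataclass
class AscendMetadata(CommonAttentionMetadata):
    # Ascend-specific extras are declared only on the subclass.
    attn_mask: Optional[torch.Tensor] = None
```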