xc-llm-ascend

Files

pichangping 711f1861e4 MLA prefill performance optimization (#5275)

### What this PR does / why we need it?
Because the performance of the _npu_ring_mla operator degrades in
long-sequence scenarios, long sequences are split into shorter sequences
before being passed to the operator, which improves prefill performance.
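
The splitting step can be illustrated with a minimal sketch. This is not the code from this PR: the helper name `split_sequence` and the parameter `max_chunk_len` are hypothetical, and the sketch only computes chunk boundaries. It does not call `_npu_ring_mla` or merge the per-chunk attention outputs.

```python
from typing import List, Tuple


def split_sequence(seq_len: int, max_chunk_len: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges that together cover [0, seq_len).

    Hypothetical helper: every range is at most max_chunk_len tokens long,
    so each one could be handed to an attention kernel separately.
    """
    if seq_len < 0:
        raise ValueError("seq_len must be non-negative")
    if max_chunk_len <= 0:
        raise ValueError("max_chunk_len must be positive")
    return [
        (start, min(start + max_chunk_len, seq_len))
        for start in range(0, seq_len, max_chunk_len)
    ]


# A 10-token sequence with chunks of at most 4 tokens.
print(split_sequence(10, 4))  # [(0, 4), (4, 8), (8, 10)]
```

The last chunk is allowed to be shorter than `max_chunk_len`, so no padding is needed in this sketch.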
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?

- vLLM version: release/v0.13.0
- vLLM main: ad32e3e19c

---------

Signed-off-by: pichangping <1337510399@qq.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>

2025-12-27 09:19:45 +08:00

test_attention_cp.py

[Perf] vectorize PCP/DCP loops in attention_cp.py (#4944)

2025-12-22 11:06:19 +08:00

test_attention_mask.py

[Refactor] 2/N Unify all mask generation methods and cache mask (#4779)

2025-12-09 18:51:00 +08:00

test_attention_v1.py

[Refactor] move the metadata from attention_v1 to util (ready for extract common_cp) & realize Ascendmetadata inherit from the parent class. (#5203)

2025-12-23 00:10:52 +08:00

test_mla_cp.py

MLA prefill performance optimization (#5275)