Files
xc-llm-ascend/tests/ut/attention
pichangping 711f1861e4 MLA prefill preformance optimization (#5275)
### What this PR does / why we need it?
Since the _npu_ring_mla operator deteriorates in long-sequencescenarios,
the long sequence is split into shorter sequences for input to improve
performance.
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?

- vLLM version: release/v0.13.0
- vLLM main:
ad32e3e19c

---------

Signed-off-by: pichangping <1337510399@qq.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-12-27 09:19:45 +08:00
..