Files
xc-llm-ascend/vllm_ascend
pichangping 50e7934415 MLA prefill preformance optimization (#5456)
### What this PR does / why we need it?
Since the _npu_ring_mla operator deteriorates in long-sequencescenarios,
the long sequence is split into shorter sequences for input to improve
performance.

- vLLM version: v0.13.0
- vLLM main:
5326c89803

---------

Signed-off-by: pichangping <1337510399@qq.com>
2026-01-05 11:41:59 +08:00
..
2025-12-20 17:03:25 +08:00
2025-12-11 18:45:43 +08:00
2025-12-31 09:49:55 +08:00
2025-12-25 09:17:06 +08:00
2025-12-02 17:35:47 +08:00