xc-llm-ascend


pichangping 50e7934415 MLA prefill performance optimization (#5456)

### What this PR does / why we need it?
The performance of the _npu_ring_mla operator degrades in long-sequence
scenarios, so a long sequence is split into shorter sequences before being
passed to the operator.
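
As a rough illustration of the splitting idea described above, here is a minimal sketch in plain Python. It is not the code from this PR: the function name `split_into_chunks` and the parameter `max_chunk_len` are hypothetical, and the sketch only computes chunk boundaries; it does not call `_npu_ring_mla` or show how per-chunk results are combined.

```python
def split_into_chunks(seq_len: int, max_chunk_len: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) index ranges covering [0, seq_len).

    Hypothetical helper for illustration only: every range has at most
    max_chunk_len tokens, and the last range may be shorter.
    """
    if seq_len < 0:
        raise ValueError("seq_len must be non-negative")
    if max_chunk_len <= 0:
        raise ValueError("max_chunk_len must be positive")
    return [
        (start, min(start + max_chunk_len, seq_len))
        for start in range(0, seq_len, max_chunk_len)
    ]


# Example: a 10-token sequence with chunks of at most 4 tokens.
chunks = split_into_chunks(10, 4)
print(chunks)
```

Each range could then be fed to the operator separately; the actual chunk size and the way partial outputs are merged are decided by the PR's implementation and are not shown here.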

- vLLM version: v0.13.0
- vLLM main: 5326c89803

---------

Signed-off-by: pichangping <1337510399@qq.com>

2026-01-05 11:41:59 +08:00

test_pcp_manager.py

MLA prefill performance optimization (#5456)

2026-01-05 11:41:59 +08:00

test_worker_v1.py

Revert "moe_gating_top_k" (#5512)

2025-12-30 15:05:47 +08:00