xc-llm-ascend

Files

weijinqian0 f668ff9ef0 [v0.18.0][BugFix]Revert the code: Replace npu_ring_mla wit FIA with MLA prefill. (#7961 )

This pull request reverts previous changes to switch to FIA and instead
implements npu_ring_mla for MLA prefill operations(#5704 ). The change
streamlines the attention mechanism by removing unnecessary metadata
tracking and updating the underlying NPU operations to use the
ring-based MLA kernel. This adjustment ensures better compatibility and
performance for MLA prefill tasks within the vLLM Ascend backend.

Highlights

- Migration to npu_ring_mla: Replaced the usage of
npu_fused_infer_attention_score (FIA) with npu_ring_mla for MLA prefill
operations across the codebase to improve performance and alignment with
the intended architecture.
- Cleanup of redundant metadata: Removed
chunk_actual_seq_lengths_kv_list and actual_seq_lengths_q from various
metadata structures as they are no longer required for the updated
attention implementation.
- Test suite updates: Updated unit tests in test_mla_cp.py and
test_mla_v1.py to mock npu_ring_mla instead of the deprecated FIA
functions and adjusted test assertions to reflect the new implementation
details.

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Co-authored-by: weijinqian_v1 <weijinqian@huawei.com>

2026-04-09 17:00:25 +08:00

e2e

[UT][v0.18.0] Fix APC nightly UT and TTFT ratio (cherry-pick #7468 ) (#8053 )

2026-04-08 21:08:26 +08:00

[v0.18.0][BugFix]Revert the code: Replace npu_ring_mla wit FIA with MLA prefill. (#7961 )

2026-04-09 17:00:25 +08:00

__init__.py

[SpecDecode] Add spec decode support (#500 )

2025-04-17 20:16:32 +08:00