[Refactor]5/N Extract common code of mla_v1.py & extract mla_cp (#5097)

RFC: https://github.com/vllm-project/vllm-ascend/issues/4629
Reason:
The CP (context parallel) functions differ significantly from those of
normal MLA attention, yet the two are tightly coupled.

Steps:
1) Split the common code in AscendMLAMetadataBuilder.build into 4 functions:
build_prefill_metadata, build_decode_metadata, build_cp_metadata,
build_chunked_metadata
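
A minimal sketch of the split described in the step above, assuming simplified inputs. Only the four helper names come from this commit; the class names `SketchMetadata` and `SketchMetadataBuilder`, every parameter, and the dict payloads are hypothetical stand-ins and do not reflect the real AscendMLAMetadataBuilder signatures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchMetadata:
    """Hypothetical stand-in for the real MLA metadata object."""
    prefill: Optional[dict] = None
    decode: Optional[dict] = None
    cp: Optional[dict] = None
    chunked: Optional[dict] = None


class SketchMetadataBuilder:
    """Illustrative only; not the real AscendMLAMetadataBuilder."""

    def __init__(self, cp_size: int = 1):
        # cp_size is an assumed knob: values above 1 mean context parallel is on.
        self.cp_size = cp_size

    def build_prefill_metadata(self, num_prefills: int) -> Optional[dict]:
        if num_prefills == 0:
            return None
        return {"num_prefills": num_prefills}

    def build_decode_metadata(self, num_decodes: int) -> Optional[dict]:
        if num_decodes == 0:
            return None
        return {"num_decodes": num_decodes}

    def build_cp_metadata(self) -> Optional[dict]:
        # CP-specific state is built in one place instead of being
        # interleaved with the regular prefill/decode logic.
        if self.cp_size <= 1:
            return None
        return {"cp_size": self.cp_size}

    def build_chunked_metadata(self, num_computed_tokens: int) -> Optional[dict]:
        if num_computed_tokens == 0:
            return None
        return {"num_computed_tokens": num_computed_tokens}

    def build(self, num_prefills: int, num_decodes: int,
              num_computed_tokens: int = 0) -> SketchMetadata:
        # build() only orchestrates; each concern lives in its own helper.
        return SketchMetadata(
            prefill=self.build_prefill_metadata(num_prefills),
            decode=self.build_decode_metadata(num_decodes),
            cp=self.build_cp_metadata(),
            chunked=self.build_chunked_metadata(num_computed_tokens),
        )
```

With this shape, a CP-specific subclass could override only build_cp_metadata without touching the shared prefill and decode paths, which is the kind of decoupling the Reason section asks for.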

todo:
1) Refactor function _compute_prefill_context.
2) Refactor functions _mla_preprocess and _mla_decode_preprocess.
3) Extract shared data and processing functions from attention_cp.py
and mla_cp.py into the common_cp file.

vLLM version: 0.13.0rc3
vLLM main:
ad32e3e19c

---------

Signed-off-by: wujinyuan1 <wjy9595@qq.com>
Signed-off-by: wujinyuan1 <wujinyuan1@huawei.com>
Co-authored-by: wujinyuan1 <wjy9595@qq.com>
Co-authored-by: weijinqian0 <1184188277@qq.com>
wujinyuan1
2025-12-24 10:25:19 +08:00
committed by GitHub
parent 2a2d527e96
commit 7ff1db4b84
6 changed files with 545 additions and 718 deletions

@@ -87,7 +87,7 @@ class AscendPrefillContextParallelMetadata:
     cp_kv_recover_idx_for_chunk: torch.Tensor = None
-    num_actual_tokens_pcp_padded: Optional[int] = None
+    num_actual_tokens_pcp_padded: int = 0
     num_computed_tokens_of_pcp_dcp: Optional[list[list[list[int]]]] = None