[bugfix](CP,MLA) fix wrong slot_mapping of decode for mixed p/d batch (#6344)
### What this PR does / why we need it?
PR #5672 attempted to remove the -1 padding for duplicate tokens in the
decode slot_mapping when adapting PCP for MLAPO, adopting a simpler
slicing approach instead. However, the single-operator path and mixed
prefill/decode (P/D) batches also went through the same slicing without
first eliminating the -1 padding, which produced an incorrect
slot_mapping. This PR fixes that issue; the logic will be further
consolidated in follow-up refactoring PRs.
- vLLM version: v0.14.1
- vLLM main: dc917cceb8
---------
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
```diff
@@ -79,7 +79,7 @@ class AscendMlaCPMetadataBuilder(AscendMLAMetadataBuilder):
         fast_build: bool = False,
     ) -> AscendMLAMetadata:
         metadata_cls = super().build(common_prefix_len, common_attn_metadata)
-        if self.num_prefills == 0 and self.pcp_size > 1:
+        if self.pcp_size > 1:
             self.slot_mapping[: self.num_decode_tokens] = self.slot_mapping[
                 : self.num_decode_tokens * self.pcp_size : self.pcp_size
             ]
```
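The strided slice above keeps every `pcp_size`-th entry of the duplicated decode region, so each decode token's slot survives exactly once while any trailing prefill slots are left alone. A minimal sketch with plain Python lists, assuming for illustration that each decode slot is repeated `pcp_size` times consecutively (the function name and example values are hypothetical, not from the patch):

```python
# Hypothetical illustration of the strided-slice deduplication used in
# the patch; plain lists stand in for the slot_mapping tensor.
def dedup_decode_slots(slot_mapping, num_decode_tokens, pcp_size):
    # Mirror of the patched assignment:
    #   slot_mapping[: num_decode_tokens] = \
    #       slot_mapping[: num_decode_tokens * pcp_size : pcp_size]
    # Every pcp_size-th entry of the duplicated decode region is kept.
    deduped = slot_mapping[: num_decode_tokens * pcp_size : pcp_size]
    # Prefill slots (if any) follow the decode region unchanged.
    return deduped + slot_mapping[num_decode_tokens * pcp_size:]

# Mixed P/D batch: 2 decode tokens duplicated with pcp_size=2,
# followed by two prefill slots 30 and 31.
mixed = [10, 10, 20, 20, 30, 31]
print(dedup_decode_slots(mixed, num_decode_tokens=2, pcp_size=2))
# -> [10, 20, 30, 31]
```

This also shows why the batch layout matters: the slice is only valid when the duplicated decode entries sit at the front of the mapping, which is the assumption the fixed condition has to respect.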