[feature] dcp & pcp support mlapo (#5672)
### What this PR does / why we need it?
MLAPO in DeepSeek is a large performance improvement for decode; this PR adds PCP & DCP support with MLAPO.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main:
2f4e6548ef
---------
Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
@@ -912,12 +912,15 @@ class NPUModelRunner(GPUModelRunner):
             self.input_batch)
         blk_table.slot_mapping.gpu[maybe_pcp_full_tokens:].fill_(-1)
         if self.pcp_size > 1:
-            slot_mapping = self.pcp_manager.get_padded_slot_mapping(
+            slot_mapping_pcp = self.pcp_manager.get_padded_slot_mapping(
                 total_num_scheduled_tokens,
                 slot_mapping,
             )
             blk_table.slot_mapping.gpu[:self.pcp_manager.
-                                       num_actual_tokens_pcp_padded] = slot_mapping
+                                       num_actual_tokens_pcp_padded] = slot_mapping_pcp
+            slot_mapping = blk_table.slot_mapping.gpu[:self.
+                                                      pcp_manager.
+                                                      num_actual_tokens_pcp_padded]

        # NOTE: This is a temporary hack, now in GPUModelRunner, this prepare_inputs
        # has been split to multiple parts, and there are 3 parts that is related to this
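The diff's flow is: pad the slot mapping to the PCP-aligned length (padded positions get slot -1, meaning "no KV-cache write"), copy the padded result into the persistent per-batch buffer, then re-slice that buffer as the active `slot_mapping` view so downstream code sees the padded length. The sketch below illustrates that pattern with NumPy stand-ins; `get_padded_slot_mapping` here is a hypothetical mock of `pcp_manager.get_padded_slot_mapping` (its `pad_multiple` parameter and padding rule are assumptions, not the real vllm-ascend implementation):

```python
import numpy as np

def get_padded_slot_mapping(num_scheduled_tokens: int,
                            slot_mapping: np.ndarray,
                            pad_multiple: int = 8) -> np.ndarray:
    # Hypothetical sketch: pad the per-token KV-cache slot indices up to a
    # multiple of pad_multiple so every PCP rank works on the same padded
    # length; padded tail positions hold -1 (no cache write for them).
    padded_len = -(-num_scheduled_tokens // pad_multiple) * pad_multiple
    padded = np.full(padded_len, -1, dtype=slot_mapping.dtype)
    padded[:num_scheduled_tokens] = slot_mapping[:num_scheduled_tokens]
    return padded

# Mirror of the diff's three steps: pad, write into the persistent buffer
# (stand-in for blk_table.slot_mapping.gpu), then re-slice the buffer.
num_tokens = 5
slot_mapping = np.arange(num_tokens, dtype=np.int64)
buf = np.zeros(256, dtype=np.int64)
padded = get_padded_slot_mapping(num_tokens, slot_mapping)
buf[:padded.shape[0]] = padded
slot_mapping = buf[:padded.shape[0]]  # view of the padded region
```

Re-slicing the buffer (rather than keeping the temporary `padded` array) matters in the real runner because later stages read `blk_table.slot_mapping.gpu` directly, so the view keeps both in sync.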