[bugfix (pcp)] fix chunked prefill accuracy issue (#5647)

### What this PR does / why we need it?
Purpose: initialize padded slot mapping buffer to prevent garbage
values.

In PCP mode, the `pcp_padded_slot_mapping` buffer is reused across
invocations. Only the positions selected by the unpad mask are overwritten
on each call, so without explicit initialization the padded positions retain
stale values from previous runs, which can lead to incorrect results.

This change fills the buffer with -1 before the valid slot mapping is
written into it.
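The effect can be reproduced with a minimal sketch. It uses plain Python lists as a stand-in for the tensors, and the helper names `scatter_slots` and `run` are hypothetical, not part of the patched code. It also assumes that -1 marks a padded position that holds no real slot.

```python
# Illustrative stand-in for the tensor logic; plain lists replace torch tensors.
PAD_SLOT = -1  # assumption: -1 marks a padded position that holds no real slot


def scatter_slots(buffer, unpad_mask, slot_mapping, reset):
    """Write slot_mapping into the positions of buffer where unpad_mask is True.

    buffer is reused across calls, like pcp_padded_slot_mapping.
    """
    n = len(unpad_mask)
    if reset:
        # Counterpart of pcp_padded_slot_mapping.fill_(-1) in the patch.
        for i in range(n):
            buffer[i] = PAD_SLOT
    slots = iter(slot_mapping)
    for i, keep in enumerate(unpad_mask):
        if keep:
            buffer[i] = next(slots)
    return buffer[:n]


def run(reset):
    buffer = [0] * 8  # persistent buffer shared by both calls
    # First call: every position holds a real token.
    scatter_slots(buffer, [True] * 6, [10, 11, 12, 13, 14, 15], reset)
    # Second call: positions 2 and 5 are padding.
    mask = [True, True, False, True, True, False]
    return scatter_slots(buffer, mask, [20, 21, 22, 23], reset)


without_reset = run(reset=False)
with_reset = run(reset=True)
print(without_reset)  # [20, 21, 12, 22, 23, 15]
print(with_reset)     # [20, 21, -1, 22, 23, -1]
```

Without the reset, positions 2 and 5 still carry slots 12 and 15 from the first call. With the reset, they hold -1.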

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main: 2f4e6548ef

---------

Signed-off-by: F.Liu <liufeng248@huawei.com>
Co-authored-by: F.Liu <liufeng248@huawei.com>
This commit is contained in:
Feng Liu
2026-01-07 10:01:27 +08:00
committed by GitHub
parent 1112208052
commit cbc987db0b

@@ -319,6 +319,7 @@ class PCPManager:
pcp_world_size]
         cp_unpad_mask = self.pcp_unpad_mask_cpu_tensor[:num_tokens *
                                                        self.pcp_world_size]
+        pcp_padded_slot_mapping.fill_(-1)
         pcp_padded_slot_mapping[cp_unpad_mask] = slot_mapping
         return pcp_padded_slot_mapping