[bugfix (pcp)] fix chunked prefill accuracy issue (#5647)
### What this PR does / why we need it?
Purpose: initialize the padded slot mapping buffer to prevent garbage
values.
In PCP mode, the `pcp_padded_slot_mapping` buffer is reused across
invocations. Only the positions selected by the unpad mask are overwritten
on each call, so without explicit initialization the remaining (padded)
positions retain stale values from previous runs, which can lead to
incorrect results.
This change fills the buffer with -1 before the valid slot mappings are
written.
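The failure mode can be illustrated with a minimal sketch. Plain Python lists stand in for the tensors, and `scatter_slots`, `reset`, and the sample values are hypothetical names chosen for illustration; only the fill value -1 and the masked write come from this patch.

```python
# Illustrative sketch only: lists stand in for tensors, and every name
# here (scatter_slots, reset, PAD) is hypothetical, not taken from the code base.

PAD = -1  # the fill value used by the patch


def scatter_slots(buffer, unpad_mask, slot_mapping, reset):
    """Write slot_mapping into buffer at positions where unpad_mask is True.

    `buffer` is reused across calls, like the persistent padded buffer.
    When `reset` is True, the used prefix is filled with PAD first,
    mirroring the `fill_(-1)` call added by the patch.
    """
    n = len(unpad_mask)
    if reset:
        for i in range(n):
            buffer[i] = PAD
    slots = iter(slot_mapping)
    for i, keep in enumerate(unpad_mask):
        if keep:
            buffer[i] = next(slots)
    return list(buffer[:n])  # copy so later calls do not alter the result


full_mask = [True] * 6
padded_mask = [True, True, False, True, True, False]

# Without the reset: the second call leaves the first call's values
# in the padded positions (indices 2 and 5).
buf = [0] * 6
scatter_slots(buf, full_mask, [10, 11, 12, 13, 14, 15], reset=False)
stale = scatter_slots(buf, padded_mask, [20, 21, 22, 23], reset=False)

# With the reset: padded positions hold PAD instead of stale slots.
buf = [0] * 6
scatter_slots(buf, full_mask, [10, 11, 12, 13, 14, 15], reset=True)
clean = scatter_slots(buf, padded_mask, [20, 21, 22, 23], reset=True)

print(stale)  # [20, 21, 12, 22, 23, 15]
print(clean)  # [20, 21, -1, 22, 23, -1]
```

In the unpatched variant, indices 2 and 5 still carry slots 12 and 15 from the earlier call, which is the kind of stale value the description refers to.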
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: 2f4e6548ef
---------
Signed-off-by: F.Liu <liufeng248@huawei.com>
Co-authored-by: F.Liu <liufeng248@huawei.com>
@@ -319,6 +319,7 @@ class PCPManager:
                                                                pcp_world_size]
         cp_unpad_mask = self.pcp_unpad_mask_cpu_tensor[:num_tokens *
                                                        self.pcp_world_size]
+        pcp_padded_slot_mapping.fill_(-1)
         pcp_padded_slot_mapping[cp_unpad_mask] = slot_mapping
         return pcp_padded_slot_mapping