[bugfix (pcp)] fix chunked prefill accuracy issue (#5647)
### What this PR does / why we need it?
Purpose: initialize the padded slot mapping buffer to prevent garbage
values.
In PCP mode, the `pcp_padded_slot_mapping` buffer is reused across
invocations, and only the positions selected by the unpad mask are
overwritten on each call. Without explicit initialization, the remaining
(padded) positions retain stale values from previous runs, which can lead
to incorrect results.
This change fills the buffer with -1 before the valid slots are written.
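
The failure mode described above can be reproduced in a few lines. The following is a minimal sketch, not the PCPManager code: it uses plain Python lists as a stand-in for the tensor operations, and the helper `fill_padded_slots`, its `reset` flag, and the sample slot values are all hypothetical names chosen for illustration.

```python
PAD_SLOT = -1  # fill value used by this PR for padded positions


def fill_padded_slots(buffer, unpad_mask, slot_mapping, reset):
    """Hypothetical stand-in for the masked write into a reused buffer.

    Writes slot_mapping into the positions where unpad_mask is True.
    When reset is True, the used region is first filled with PAD_SLOT,
    mirroring the fill_(-1) call added by this change.
    """
    n = len(unpad_mask)
    if reset:
        for i in range(n):
            buffer[i] = PAD_SLOT
    slots = iter(slot_mapping)
    for i, keep in enumerate(unpad_mask):
        if keep:
            buffer[i] = next(slots)
    return buffer[:n]


# Without the reset: the second call leaves values from the first call
# in the padded positions (indices 1 and 3).
buf = [0] * 4
fill_padded_slots(buf, [True] * 4, [10, 11, 12, 13], reset=False)
stale = fill_padded_slots(buf, [True, False, True, False], [20, 21], reset=False)
print(stale)  # [20, 11, 21, 13]

# With the reset: padded positions hold PAD_SLOT on every call.
buf = [0] * 4
fill_padded_slots(buf, [True] * 4, [10, 11, 12, 13], reset=True)
clean = fill_padded_slots(buf, [True, False, True, False], [20, 21], reset=True)
print(clean)  # [20, -1, 21, -1]
```

In the stale case, slots 11 and 13 from the earlier call remain in positions that should be padding, which is the kind of leftover value the PR description refers to.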
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: 2f4e6548ef
---------
Signed-off-by: F.Liu <liufeng248@huawei.com>
Co-authored-by: F.Liu <liufeng248@huawei.com>
@@ -319,6 +319,7 @@ class PCPManager:
                                                        pcp_world_size]
         cp_unpad_mask = self.pcp_unpad_mask_cpu_tensor[:num_tokens *
                                                        self.pcp_world_size]
+        pcp_padded_slot_mapping.fill_(-1)
         pcp_padded_slot_mapping[cp_unpad_mask] = slot_mapping
         return pcp_padded_slot_mapping
 