[bugfix](cp) align max_context_chunk to cp_virtual_block_size (#5767)
### What this PR does / why we need it?
In the chunked prefill scenario, CP needs to align the
`max_context_chunk` to the `cp_virtual_block_size`, but the current
implementation only aligns it to the `block_size`. For
PD-disaggregation, `cp_kv_cache_interleave_size` is typically set equal
to `block_size`, in which case `cp_virtual_block_size=block_size *
dcp_size * pcp_size`. Under specific conditions, this can lead to
misalignment of certain chunks, subsequently triggering assertion check
errors.
### Does this PR introduce _any_ user-facing change?
No
- vLLM version: v0.13.0
- vLLM main:
2f4e6548ef
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
This commit is contained in:
@@ -69,6 +69,9 @@ class AscendMlaCPMetadataBuilder(AscendMLAMetadataBuilder):
|
|||||||
self.decode_threshold,
|
self.decode_threshold,
|
||||||
dtype=torch.uint8,
|
dtype=torch.uint8,
|
||||||
device=device)
|
device=device)
|
||||||
|
self.block_size = (self.block_size *
|
||||||
|
self.cp_virtual_block_size) // np.gcd(
|
||||||
|
self.block_size, self.cp_virtual_block_size)
|
||||||
|
|
||||||
def build(
|
def build(
|
||||||
self,
|
self,
|
||||||
|
|||||||
Reference in New Issue
Block a user