[bugfix](cp) align max_context_chunk to cp_virtual_block_size (#5767)

### What this PR does / why we need it?

In the chunked prefill scenario, CP needs `max_context_chunk` to be aligned to `cp_virtual_block_size`, but the current implementation only aligns it to `block_size`. For PD-disaggregation, `cp_kv_cache_interleave_size` is typically set equal to `block_size`, in which case `cp_virtual_block_size = block_size * dcp_size * pcp_size`. Under specific conditions, some chunks then end up misaligned, which triggers assertion failures.

### Does this PR introduce _any_ user-facing change?

No

- vLLM version: v0.13.0
- vLLM main: 2f4e6548ef

Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
2026-01-12 20:11:46 +08:00
parent 4453c60262
commit 5f4b13ab3d
1 changed files with 3 additions and 0 deletions
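
The arithmetic behind the change can be illustrated with a minimal sketch. It assumes that the chunk size is rounded down to a multiple of the builder's `block_size`; the helper names `lcm_alignment` and `round_down` and the sample values (`dcp_size = 2`, `pcp_size = 2`, a raw chunk of 1000 tokens) are hypothetical and chosen only for illustration.

```python
import math


def lcm_alignment(block_size: int, cp_virtual_block_size: int) -> int:
    """Least common multiple, written the same way as in the diff."""
    return (block_size * cp_virtual_block_size) // math.gcd(
        block_size, cp_virtual_block_size)


def round_down(value: int, multiple: int) -> int:
    """Round value down to the nearest multiple (hypothetical helper)."""
    return (value // multiple) * multiple


# Illustrative values, not taken from the PR.
block_size = 128
dcp_size, pcp_size = 2, 2
# With cp_kv_cache_interleave_size == block_size, per the PR description:
cp_virtual_block_size = block_size * dcp_size * pcp_size  # 512

raw_chunk = 1000

# Aligning to block_size only can leave a remainder w.r.t. the virtual block.
old_chunk = round_down(raw_chunk, block_size)  # 896
old_remainder = old_chunk % cp_virtual_block_size  # 384, i.e. misaligned

# Aligning to the LCM satisfies both constraints at once.
alignment = lcm_alignment(block_size, cp_virtual_block_size)  # 512
new_chunk = round_down(raw_chunk, alignment)  # 512
```

Using the least common multiple rather than `cp_virtual_block_size` directly keeps the result a multiple of `block_size` even if the two values are not multiples of each other.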
--- a/vllm_ascend/attention/context_parallel/mla_cp.py
+++ b/vllm_ascend/attention/context_parallel/mla_cp.py
@@ -69,6 +69,9 @@ class AscendMlaCPMetadataBuilder(AscendMLAMetadataBuilder):
                                              self.decode_threshold,
                                              dtype=torch.uint8,
                                              device=device)
+        self.block_size = (self.block_size *
+                           self.cp_virtual_block_size) // np.gcd(
+                               self.block_size, self.cp_virtual_block_size)
    def build(
        self,