[P/D][PCP] mooncake layerwise support pcp function (#6627)
### What this PR does / why we need it?
Add PCP support to the Mooncake layerwise connector.

PCP (Prefill Context Parallelism) support: the Mooncake layerwise KV cache
transfer mechanism now explicitly supports Prefill Context Parallelism (PCP)
and Decode Context Parallelism (DCP), so the transfer path is aware of the
parallel configuration when it moves data.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By CI.
- vLLM version: v0.15.0
- vLLM main: d7e17aaacd
---------
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
@@ -743,6 +743,8 @@ class AscendAttentionCPImpl(AscendAttentionBackendImpl):
has_prefill = attn_metadata.num_prefills > 0

if len(kv_cache) > 1:
    if self.is_kv_producer:
        attn_metadata.reshape_cache_event = torch.npu.Event()
    if self.key_cache is None:
        self.key_cache, self.value_cache = kv_cache[0], kv_cache[1]
@@ -778,7 +780,8 @@ class AscendAttentionCPImpl(AscendAttentionBackendImpl):
        value_cache=self.value_cache,
        slot_indices=slot_mapping,
    )

    if self.is_kv_producer:
        attn_metadata.reshape_cache_event.record()
    return key, value

def _gather_global_context_output(self, local_context_attn_output):
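The producer-side pattern in these hunks (create an event before the cache write, record it after) can be sketched on CPU as follows. This is a minimal sketch, not the connector's implementation: `FakeEvent`, `Metadata`, `write_kv_cache`, and `send_layer` are hypothetical names, a `threading.Event` stands in for `torch.npu.Event`, and the idea that the transfer side waits on the event before reading the cache is an assumption that this diff does not show.

```python
import threading


class FakeEvent:
    """CPU stand-in for a device event (hypothetical; not torch.npu.Event)."""

    def __init__(self):
        self._flag = threading.Event()

    def record(self):
        # A real device event completes once prior stream work finishes;
        # this stand-in is marked complete immediately.
        self._flag.set()

    def synchronize(self):
        # Block the caller until the event has been recorded.
        self._flag.wait()


class Metadata:
    """Hypothetical stand-in for attn_metadata."""

    reshape_cache_event = None


def write_kv_cache(metadata, cache, slots, values, is_kv_producer):
    # Only the KV producer creates and records the event.
    if is_kv_producer:
        metadata.reshape_cache_event = FakeEvent()
    for slot, value in zip(slots, values):
        cache[slot] = value
    if is_kv_producer:
        metadata.reshape_cache_event.record()


def send_layer(metadata, cache, slots, sent):
    # Assumed consumer of the event: wait for the cache write, then read.
    metadata.reshape_cache_event.synchronize()
    sent.extend(cache[s] for s in slots)


metadata = Metadata()
cache = [None] * 8
sent = []
write_kv_cache(metadata, cache, [2, 5], ["k0", "k1"], is_kv_producer=True)
sender = threading.Thread(target=send_layer, args=(metadata, cache, [2, 5], sent))
sender.start()
sender.join()
```

When `is_kv_producer` is false, no event is created, which matches the guards in the diff.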
@@ -414,9 +414,13 @@ class AscendMlaCPImpl(AscendMLAImpl):
kv_c_normed, k_pe = prefill_k_c_normed, prefill_k_pe
prefill_k_c_normed = prefill_k_c_normed.squeeze()
slot_mapping = attn_metadata.slot_mapping[self.pcp_size * num_decode_tokens :]
if self.is_kv_producer:
    attn_metadata.reshape_cache_event = torch.npu.Event()
torch_npu._npu_reshape_and_cache(
    key=kv_c_normed, value=k_pe, key_cache=kv_cache[0], value_cache=kv_cache[1], slot_indices=slot_mapping
)
if self.is_kv_producer:
    attn_metadata.reshape_cache_event.record()
prefill_k_nope, prefill_value = (
    self.kv_b_proj(prefill_k_c_normed)[0]
    .view(-1, self.num_heads, self.qk_nope_head_dim + self.v_head_dim)
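The `slot_mapping` slice in the MLA hunk skips the first `pcp_size * num_decode_tokens` entries and keeps the rest for the prefill cache write. A small illustration with plain lists, assuming (this is an inference from the slice, not something the diff states) that the decode entries come first in the mapping; `prefill_slots` and the sample values are hypothetical:

```python
def prefill_slots(slot_mapping, pcp_size, num_decode_tokens):
    """Return the slot entries that follow the assumed decode region."""
    return slot_mapping[pcp_size * num_decode_tokens:]


# Four leading entries (pcp_size=2 times 2 decode tokens) are skipped.
slot_mapping = [10, 11, 12, 13, 20, 21, 22]
prefill = prefill_slots(slot_mapping, pcp_size=2, num_decode_tokens=2)
print(prefill)  # [20, 21, 22]
```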