feat(attention_cp): support chunked prefill for Qwen3Next with PCP&DCP (#6900)
### What this PR does / why we need it?
Support chunked prefill for Qwen3Next with PCP & DCP (prefill context parallelism and decode context parallelism).
- vLLM version: v0.16.0
- vLLM main: 15d76f74e2
---------
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
@@ -169,8 +169,6 @@ class TestAscendAttentionCPImpl(TestBase):
```python
        attn_metadata.prefill.chunked_context = MagicMock()
        local_context_lens_allranks = torch.tensor([[[256, 256], [256, 256]]])
        attn_metadata.prefill.chunked_context.local_context_lens_allranks = local_context_lens_allranks
        attn_metadata.prefill.chunked_context.batch_chunk_seq_mask = torch.randint(
            0, 2, (1024, ), dtype=torch.bool)
        attn_metadata.prefill.chunked_context.local_total_toks = local_context_lens_allranks[:, 0, 0].sum()
```
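To make the mocked fields above concrete, here is a minimal standalone sketch of how they relate to one another. The tensor shapes and their semantics (`[num_requests, pcp_size, dcp_size]` for the per-rank context lengths, a flat boolean mask over chunked-context token slots) are assumptions inferred from the test values, not taken from the actual vllm-ascend implementation.

```python
import torch

# Assumed shape [num_requests, pcp_size, dcp_size]: cached context length
# held by each (prefill-CP rank, decode-CP rank) pair for every request.
local_context_lens_allranks = torch.tensor([[[256, 256], [256, 256]]])

# Total context tokens owned by the local rank pair, here (pcp=0, dcp=0):
# slice out that rank's column across all requests and sum.
local_total_toks = local_context_lens_allranks[:, 0, 0].sum()
print(local_total_toks.item())  # 256 for the single request above

# Boolean mask over flattened chunked-context token slots. The unit test
# fills it randomly; a real implementation would mark which positions in
# the 1024-slot chunk buffer hold valid context tokens.
batch_chunk_seq_mask = torch.randint(0, 2, (1024,), dtype=torch.bool)
print(tuple(batch_chunk_seq_mask.shape))  # (1024,)
```

The test only needs these attributes to exist with plausible shapes on the `MagicMock`, so random mask contents are sufficient there.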