feat(attention_cp): support chunked prefill for Qwen3Next with PCP&DCP (#6900)

### What this PR does / why we need it?
Adds chunked prefill support for Qwen3Next when PCP and DCP are enabled.

- vLLM version: v0.16.0
- vLLM main: 15d76f74e2
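
For reviewers, a minimal plain-Python sketch of the indexing that the updated test fixture applies to `local_context_lens_allranks` (a `[:, 0, 0]` slice followed by a sum). Nested lists stand in for the torch tensor so the snippet runs without torch. The axis order `[request][pcp_rank][dcp_rank]` and the helper names are assumptions made for illustration and are not taken from the PR.

```python
# Hypothetical stand-in for the test fixture's tensor, written as nested lists.
# Assumed axis order (not confirmed by the PR): [request][pcp_rank][dcp_rank].
local_context_lens_allranks = [[[256, 256], [256, 256]]]


def local_total_toks(lens_allranks, pcp_rank=0, dcp_rank=0):
    """Sum one rank's context length over all requests.

    Mirrors the `[:, 0, 0]` slice followed by a sum in the fixture.
    """
    return sum(per_request[pcp_rank][dcp_rank] for per_request in lens_allranks)


def global_context_len(lens_allranks, request=0):
    """Total context tokens for one request, added up across every rank."""
    return sum(sum(row) for row in lens_allranks[request])


print(local_total_toks(local_context_lens_allranks))    # 256
print(global_context_len(local_context_lens_allranks))  # 1024
```

With one request and a 2x2 rank grid, the selected rank holds 256 context tokens and the four ranks together hold 1024.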

---------

Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Qiu
2026-03-09 17:55:09 +08:00
committed by GitHub
parent a76a509fae
commit 13adcbe44b
6 changed files with 63 additions and 63 deletions


@@ -169,8 +169,6 @@ class TestAscendAttentionCPImpl(TestBase):
attn_metadata.prefill.chunked_context = MagicMock()
local_context_lens_allranks = torch.tensor([[[256, 256], [256, 256]]])
attn_metadata.prefill.chunked_context.local_context_lens_allranks = local_context_lens_allranks
attn_metadata.prefill.chunked_context.batch_chunk_seq_mask = torch.randint(
0, 2, (1024, ), dtype=torch.bool)
attn_metadata.prefill.chunked_context.local_total_toks = local_context_lens_allranks[:,
0,
0].sum(