feat(attention_cp): support chunked prefill for Qwen3Next with PCP&DCP (#6900)

### What this PR does / why we need it?
Support chunked prefill for the Qwen3Next model when prefill context parallelism (PCP) and decode context parallelism (DCP) are enabled.

- vLLM version: v0.16.0
- vLLM main: 15d76f74e2

---------

Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
This commit is contained in:
Qiu
2026-03-09 17:55:09 +08:00
committed by GitHub
parent a76a509fae
commit 13adcbe44b
6 changed files with 63 additions and 63 deletions


@@ -80,6 +80,7 @@ def test_models_pcp_dcp_basic():
             decode_context_parallel_size=1,
             max_num_batched_tokens=1024,
             enable_expert_parallel=True,
+            long_prefill_token_threshold=4,
             gpu_memory_utilization=0.8,
             block_size=128) as runner:
         runner.model.generate(prompts, sampling_params)
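The test above sets `long_prefill_token_threshold=4` so that even short test prompts exercise the chunked-prefill path. As a rough intuition, chunked prefill caps how many prompt tokens a request contributes per scheduler step; vLLM's actual scheduling (which uses the threshold to mark requests as "long" and schedule them partially) is more involved, but the core chunking idea can be sketched as follows. The function name and logic here are illustrative, not vLLM's code:

```python
# Minimal sketch of chunked prefill: split a long prompt into per-step
# chunks of at most `threshold` tokens. Hypothetical helper, not vLLM API.

def chunk_prefill(prompt_len: int, threshold: int) -> list[int]:
    """Return the number of prompt tokens prefilled at each step."""
    chunks = []
    remaining = prompt_len
    while remaining > 0:
        step = min(remaining, threshold)  # never exceed the threshold
        chunks.append(step)
        remaining -= step
    return chunks

# With threshold=4 (as in the test), a 10-token prompt prefills over 3 steps.
print(chunk_prefill(10, 4))  # [4, 4, 2]
```

A small threshold like 4 forces multiple prefill steps per request, which is what lets the test verify that PCP and DCP interact correctly with partially-prefilled sequences.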