[Feat] support basic pcp&dcp for qwen3next (#6091)
### What this PR does / why we need it?
This PR implements Context Parallelism (CP) support for the Qwen3-Next
model, including prefill context parallelism (PCP) and decode context
parallelism (DCP).
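To make the idea concrete, here is a minimal illustrative sketch (not code from this PR) of what context parallelism does: a long sequence's tokens are sharded across `cp_size` ranks so each rank only attends over its own shard. The `shard_tokens` helper and the contiguous-chunk layout are assumptions for illustration; real CP implementations often use load-balanced interleaved layouts.

```python
# Illustrative sketch only: context parallelism splits one long sequence's
# tokens across cp_size ranks; each rank processes its local shard.
def shard_tokens(token_ids: list[int], cp_size: int, cp_rank: int) -> list[int]:
    """Return the contiguous shard of token_ids owned by cp_rank."""
    chunk = (len(token_ids) + cp_size - 1) // cp_size  # ceil division
    return token_ids[cp_rank * chunk:(cp_rank + 1) * chunk]

tokens = list(range(10))
shards = [shard_tokens(tokens, cp_size=4, cp_rank=r) for r in range(4)]
print(shards)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```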
- vLLM version: v0.15.0
- vLLM main:
f176443446
---------
Signed-off-by: SunnyLee219 <3294305115@qq.com>
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>
Signed-off-by: 白永斌 <baiyongbin3@h-partners.com>
Signed-off-by: Bai Yongbin <845473182@qq.com>
Co-authored-by: SunnyLee219 <3294305115@qq.com>
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>
Co-authored-by: 白永斌 <baiyongbin3@h-partners.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
```diff
@@ -43,6 +43,7 @@ def set_ascend_forward_context(
     model_instance: torch.nn.Module = None,
     is_draft_model=False,
     skip_compiled: bool = False,
+    max_tokens_across_pcp: int = 0,
     draft_attn_metadatas=None,
 ):
     """A context manager that stores the current forward context,
@@ -139,6 +140,7 @@ def set_ascend_forward_context(
         max_tokens_across_dp = num_tokens

     forward_context.max_tokens_across_dp = max_tokens_across_dp
+    forward_context.max_tokens_across_pcp = max_tokens_across_pcp

     if num_tokens is not None:
         if num_actual_tokens is None:
```
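The `max_tokens_across_pcp` value carried on the forward context mirrors the existing `max_tokens_across_dp` pattern: every rank in the parallel group must agree on a padded token count so that collective-communication shapes line up. A hedged sketch of that padding step (the `pad_to_max_across_pcp` helper is hypothetical, not part of this PR):

```python
# Hedged sketch: with PCP, each rank pads its local token batch up to the
# maximum token count across the PCP group so collective ops see equal shapes.
def pad_to_max_across_pcp(local_tokens: list[int],
                          max_tokens_across_pcp: int,
                          pad_id: int = 0) -> list[int]:
    """Pad this rank's token list to the group-wide maximum length."""
    assert len(local_tokens) <= max_tokens_across_pcp
    return local_tokens + [pad_id] * (max_tokens_across_pcp - len(local_tokens))

per_rank = [[1, 2, 3], [4, 5], [6]]
group_max = max(len(t) for t in per_rank)  # in practice an all-reduce(MAX)
padded = [pad_to_max_across_pcp(t, group_max) for t in per_rank]
print(padded)  # [[1, 2, 3], [4, 5, 0], [6, 0, 0]]
```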