[Bugfix] v0.18.0 support FlashComm1 & DCP for Qwen (#7726)

### What this PR does / why we need it?
This PR backports the changes from #7673 ([Bugfix] support FlashComm1 &
DCP for Qwen) to the releases/v0.18.0 branch.

--------
Signed-off-by: Yang Yuxi <907276627@qq.com>
Yang Yuxi
2026-03-29 15:59:19 +08:00
committed by GitHub
parent 9cc41c9457
commit e776d5c0f1
2 changed files with 3 additions and 2 deletions

@@ -1284,7 +1284,7 @@ class NPUModelRunner(GPUModelRunner):
 if (
     cudagraph_mode == CUDAGraphMode.FULL
     or (enable_sp() and not self.model_config.use_mla)
-    and self.pcp_size == 1 # TODO(lxs): fix this
+    and self.pcp_size * self.dcp_size == 1
 ):
     # Currently, Graph Mode and SP will both pad num_tokens,
     # Another possible condition is num_tokens_padded != num_tokens_unpadded
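
A note on how the changed condition evaluates: Python binds `and` more tightly than `or`, so the expression reads as `FULL or (SP-without-MLA and size check)`. The sketch below restates the old and new conditions as standalone functions to show which inputs change outcome. The function names and boolean parameters are hypothetical stand-ins for the runner's attributes (`full_graph` for `cudagraph_mode == CUDAGraphMode.FULL`, `sp_enabled` for `enable_sp()`), and it assumes both sizes are positive integers, in which case the product equals 1 only when both are 1.

```python
def old_condition(full_graph: bool, sp_enabled: bool, use_mla: bool,
                  pcp_size: int, dcp_size: int) -> bool:
    """Condition before the change: only pcp_size is checked."""
    # Written with explicit parentheses to mirror Python's precedence:
    # A or (B and C), not (A or B) and C.
    return full_graph or ((sp_enabled and not use_mla) and pcp_size == 1)


def new_condition(full_graph: bool, sp_enabled: bool, use_mla: bool,
                  pcp_size: int, dcp_size: int) -> bool:
    """Condition after the change: both pcp_size and dcp_size must be 1."""
    return full_graph or (
        (sp_enabled and not use_mla) and pcp_size * dcp_size == 1
    )


if __name__ == "__main__":
    # SP enabled, no MLA, pcp_size == 1 but dcp_size == 2:
    # the old check takes the branch, the new check does not.
    args = dict(full_graph=False, sp_enabled=True, use_mla=False,
                pcp_size=1, dcp_size=2)
    print(old_condition(**args), new_condition(**args))
```

Under these assumptions the two versions differ only when SP is active without MLA, `pcp_size` is 1, and `dcp_size` is greater than 1. With full graph mode the branch is taken regardless of either size.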