[Fix] Resolve data-parallel (DP) assertion errors in TorchAir (#2626)
### What this PR does / why we need it?
It is confirmed that `num_input_tokens` must be assigned the value of
`maybe_padded_num_tokens` under all circumstances.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Waiting for the daily TorchAir test run.
- vLLM version: v0.10.1.1
- vLLM main: 006477e60b
Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
```diff
@@ -100,7 +100,7 @@ class NPUTorchairModelRunner(NPUModelRunner):
         num_tokens_across_dp = torch.full((self.dp_size, ),
                                           maybe_padded_num_tokens,
                                           dtype=torch.int32,
-                                          device="cpu")
+                                          device="npu")
     else:
         maybe_padded_num_tokens = num_tokens
 
```
```diff
@@ -1095,9 +1095,9 @@ class NPUModelRunner(LoRAModelRunnerMixin):
          enable_dbo) = self._sync_metadata_across_dp(num_input_tokens,
                                                      with_prefill, enable_dbo)
 
-        if self.use_aclgraph:
-            # When using TorchAir with DP, we have other plans for padding
-            num_input_tokens = maybe_padded_num_tokens
+        # TODO: Now that num_input_tokens is basically identical with maybe_padded_num_tokens
+        # We should consider removing maybe_padded_num_tokens later
+        num_input_tokens = maybe_padded_num_tokens
 
         # Hot-Swap lora model
         if self.lora_config:
```
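To illustrate the constraint this patch enforces, here is a minimal, self-contained sketch of why `num_input_tokens` must always take the padded value under data parallelism. The function names below (`sync_metadata_across_dp`, `forward_tokens`) are hypothetical stand-ins modeled loosely on `_sync_metadata_across_dp` from the diff; they are not the actual vLLM-Ascend API.

```python
# Hypothetical sketch: under DP, every rank pads its local token count
# up to the global maximum so all ranks execute the same shape.

def sync_metadata_across_dp(num_tokens_per_rank):
    """Pad every DP rank's token count up to the global max."""
    maybe_padded = max(num_tokens_per_rank)
    return maybe_padded, [maybe_padded] * len(num_tokens_per_rank)

def forward_tokens(num_input_tokens, num_tokens_across_dp):
    # Stand-in for the consistency check that trips when a rank keeps
    # its unpadded local count instead of the synced padded one.
    assert all(n == num_input_tokens for n in num_tokens_across_dp), \
        "DP ranks disagree on the input token count"
    return num_input_tokens

local_counts = [5, 8, 3]  # tokens currently scheduled on each DP rank
maybe_padded_num_tokens, across_dp = sync_metadata_across_dp(local_counts)

# The fix in spirit: unconditionally take the padded value, rather than
# only when a particular graph mode is active.
num_input_tokens = maybe_padded_num_tokens
assert forward_tokens(num_input_tokens, across_dp) == 8
```

Had a rank kept its unpadded count (say 5) while the synced metadata said 8, the assertion would fire; assigning the padded value on every path, as the second hunk does, removes that divergence.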