[Fix] Resolve data-parallel (DP) assertion errors in TorchAir (#2626)
### What this PR does / why we need it?
It is confirmed that `num_input_tokens` must be assigned the value of
`maybe_padded_num_tokens` under all circumstances.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Waiting for the daily TorchAir test run.
- vLLM version: v0.10.1.1
- vLLM main: 006477e60b
Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
```diff
@@ -100,7 +100,7 @@ class NPUTorchairModelRunner(NPUModelRunner):
         num_tokens_across_dp = torch.full((self.dp_size, ),
                                           maybe_padded_num_tokens,
                                           dtype=torch.int32,
-                                          device="cpu")
+                                          device="npu")
     else:
         maybe_padded_num_tokens = num_tokens
 
```
```diff
@@ -1095,9 +1095,9 @@ class NPUModelRunner(LoRAModelRunnerMixin):
          enable_dbo) = self._sync_metadata_across_dp(num_input_tokens,
                                                      with_prefill, enable_dbo)
 
-        if self.use_aclgraph:
-            # When using TorchAir with DP, we have other plans for padding
-            num_input_tokens = maybe_padded_num_tokens
+        # TODO: Now that num_input_tokens is basically identical with maybe_padded_num_tokens
+        # We should consider removing maybe_padded_num_tokens later
+        num_input_tokens = maybe_padded_num_tokens
 
         # Hot-Swap lora model
         if self.lora_config:
```
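To illustrate the constraint this patch enforces, here is a minimal, self-contained sketch of why `num_input_tokens` must always take the padded value under data parallelism. The function names below (`sync_metadata_across_dp`, `forward_tokens`) are hypothetical stand-ins modeled loosely on `_sync_metadata_across_dp` from the diff; they are not the actual vLLM-Ascend API.

```python
# Hypothetical sketch: under DP, every rank pads its local token count
# up to the global maximum so all ranks execute the same shape.

def sync_metadata_across_dp(num_tokens_per_rank):
    """Pad every DP rank's token count up to the global max."""
    maybe_padded = max(num_tokens_per_rank)
    return maybe_padded, [maybe_padded] * len(num_tokens_per_rank)

def forward_tokens(num_input_tokens, num_tokens_across_dp):
    # Stand-in for the consistency check that trips when a rank keeps
    # its unpadded local count instead of the synced padded one.
    assert all(n == num_input_tokens for n in num_tokens_across_dp), \
        "DP ranks disagree on the input token count"
    return num_input_tokens

local_counts = [5, 8, 3]  # tokens currently scheduled on each DP rank
maybe_padded_num_tokens, across_dp = sync_metadata_across_dp(local_counts)

# The fix in spirit: unconditionally take the padded value, rather than
# only when a particular graph mode is active.
num_input_tokens = maybe_padded_num_tokens
assert forward_tokens(num_input_tokens, across_dp) == 8
```

Had a rank kept its unpadded count (say 5) while the synced metadata said 8, the assertion would fire; assigning the padded value on every path, as the second hunk does, removes that divergence.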