[Fix] Add operations in _dummy_run to maintain synchronization with _process_reqs, resolving a service hang (#2454)
### What this PR does / why we need it?
Fixes hang when batch size < DP size.
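When the number of requests in a batch is smaller than the DP world size, some DP ranks receive no requests and go through `_dummy_run` instead of `_process_reqs`. If the dummy path skips the DP padding step that the real path performs, the ranks issue different collective calls and the service deadlocks. The padding itself amounts to raising every rank's token count to the maximum across DP ranks, so all ranks launch identically shaped work. A minimal sketch of that idea (names and shapes assumed for illustration; this is not the actual `get_dp_padding` implementation, which uses a collective across ranks rather than a local list):

```python
def get_dp_padding(num_tokens_per_rank):
    """Pad each DP rank's token count up to the max across ranks,
    so every rank runs identically shaped (collective-safe) kernels.

    Returns (num_pad per rank, padded token count per rank)."""
    max_tokens = max(num_tokens_per_rank)
    num_pad = [max_tokens - n for n in num_tokens_per_rank]
    num_tokens_across_dp = [max_tokens] * len(num_tokens_per_rank)
    return num_pad, num_tokens_across_dp
```

For example, with two DP ranks where rank 0 has 3 tokens and rank 1 has none, rank 1 must still pad up to 3 dummy tokens so both ranks enter the same collectives; a rank that skips this step is what causes the hang.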
### Does this PR introduce _any_ user-facing change?
None.
### How was this patch tested?
Verified that after this change, `_dummy_run` stays in sync with `_process_reqs` in the DP case and the service no longer hangs.
- vLLM version: v0.10.1.1
- vLLM main:
d9a55204ba
Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
```diff
@@ -1911,6 +1911,10 @@ class NPUModelRunner(LoRAModelRunnerMixin):
         )

+        # Padding for DP
+        num_pad, num_tokens_across_dp_native = self.get_dp_padding(num_tokens)
+        # num_tokens += num_pad ## Uncomment this after TorchAir is removed
+
         # Padding for DP (for TorchAir)
         (num_tokens, num_tokens_across_dp, with_prefill,
          _) = self._get_forward_metadata_across_dp_and_pad(
             num_tokens, with_prefill, False)
```