[Fix] Add operations in _dummy_run to maintain synchronization with _process_reqs, resolving a service hang (#2454)

### What this PR does / why we need it?
Fixes a service hang that occurs when the batch size is smaller than the DP size: ranks that receive no requests skip the synchronizing operations performed in `_process_reqs`, so this change adds matching operations to `_dummy_run` to keep all DP ranks in step.

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
Verified that after this change the DP case no longer hangs and works as expected.

- vLLM version: v0.10.1.1
- vLLM main:
d9a55204ba

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
Author: yiz-liu
Date: 2025-08-25 19:56:02 +08:00
Committed by: GitHub
Parent: de7649492d
Commit: 99bf25af76

@@ -1911,6 +1911,10 @@ class NPUModelRunner(LoRAModelRunnerMixin):
)
# Padding for DP
num_pad, num_tokens_across_dp_native = self.get_dp_padding(num_tokens)
# num_tokens += num_pad ## Uncomment this after TorchAir is removed
# Padding for DP (for TorchAir)
(num_tokens, num_tokens_across_dp, with_prefill,
_) = self._get_forward_metadata_across_dp_and_pad(
num_tokens, with_prefill, False)
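
The DP padding step in the hunk above can be sketched as follows. This is a hypothetical, simplified stand-in for `get_dp_padding` (the real method lives on `NPUModelRunner` and uses collectives to gather counts across ranks), assuming each rank pads its local token count up to the maximum across the DP group so that every rank, including idle ones running `_dummy_run`, executes an identically shaped forward pass:

```python
def get_dp_padding(num_tokens: int, num_tokens_across_dp: list[int]) -> tuple[int, list[int]]:
    """Hypothetical sketch: compute how many pad tokens this rank needs.

    num_tokens: this rank's local token count (may be 0 on idle ranks).
    num_tokens_across_dp: token counts gathered from every DP rank.
    Returns (num_pad, padded_counts), where padded_counts is uniform so
    all ranks run the same number of identically shaped steps.
    """
    max_tokens = max(num_tokens_across_dp)
    num_pad = max_tokens - num_tokens
    padded_counts = [max_tokens] * len(num_tokens_across_dp)
    return num_pad, padded_counts

# A rank holding 3 tokens, in a DP group reporting [3, 8, 5]:
num_pad, padded = get_dp_padding(3, [3, 8, 5])
```

Because every rank pads to the same count, a rank with an empty batch still issues the same sequence of collective operations as busy ranks, which is what prevents the hang.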