[Fix] Add operations in _dummy_run to maintain synchronization with _process_reqs, resolving a service hang (#2454)
### What this PR does / why we need it?
Fixes hang when batch size < DP size.
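When the number of requests in a batch is smaller than the DP world size, some DP ranks receive no requests and go through `_dummy_run` instead of `_process_reqs`. If the dummy path skips the DP padding step that the real path performs, the ranks issue different collective calls and the service deadlocks. The padding itself amounts to raising every rank's token count to the maximum across DP ranks, so all ranks launch identically shaped work. A minimal sketch of that idea (names and shapes assumed for illustration; this is not the actual `get_dp_padding` implementation, which uses a collective across ranks rather than a local list):

```python
def get_dp_padding(num_tokens_per_rank):
    """Pad each DP rank's token count up to the max across ranks,
    so every rank runs identically shaped (collective-safe) kernels.

    Returns (num_pad per rank, padded token count per rank)."""
    max_tokens = max(num_tokens_per_rank)
    num_pad = [max_tokens - n for n in num_tokens_per_rank]
    num_tokens_across_dp = [max_tokens] * len(num_tokens_per_rank)
    return num_pad, num_tokens_across_dp
```

For example, with two DP ranks where rank 0 has 3 tokens and rank 1 has none, rank 1 must still pad up to 3 dummy tokens so both ranks enter the same collectives; a rank that skips this step is what causes the hang.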
### Does this PR introduce _any_ user-facing change?
None.
### How was this patch tested?
Verified that after this change, `_dummy_run` stays in sync with `_process_reqs` in the DP case and the service no longer hangs.
- vLLM version: v0.10.1.1
- vLLM main:
d9a55204ba
Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
```diff
@@ -1911,6 +1911,10 @@ class NPUModelRunner(LoRAModelRunnerMixin):
         )

+        # Padding for DP
+        num_pad, num_tokens_across_dp_native = self.get_dp_padding(num_tokens)
+        # num_tokens += num_pad ## Uncomment this after TorchAir is removed
+
         # Padding for DP (for TorchAir)
         (num_tokens, num_tokens_across_dp, with_prefill,
          _) = self._get_forward_metadata_across_dp_and_pad(
             num_tokens, with_prefill, False)
```