[Fix] Ensure metadata sync across DP ranks in eager mode (#2766)

### What this PR does / why we need it? Removes the condition that skips metadata synchronization when `enforce_eager` is enabled. This change is necessary to correctly sync the `with_prefill` and `enable_dbo` flags across all data parallel ranks, which is not required in the base implementation. Forcing the sync operation prevents potential inconsistencies, albeit with a minor performance impact. ### Does this PR introduce _any_ user-facing change? None. ### How was this patch tested? Add a E2E online test case? - vLLM version: v0.10.1.1 - vLLM main: e599e2c65e Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
2025-09-08 09:55:16 +08:00
parent 2693196ef8
commit c735bb0941
1 changed files with 7 additions and 1 deletions
--- a/vllm_ascend/worker/model_runner_v1.py
+++ b/vllm_ascend/worker/model_runner_v1.py
@@ -581,7 +581,13 @@ class NPUModelRunner(LoRAModelRunnerMixin):
    def _sync_metadata_across_dp(
            self, num_tokens: int, with_prefill: bool, enable_dbo: bool
    ) -> tuple[int, Optional[torch.Tensor], bool, bool]:
-        if self.dp_size == 1 or self.vllm_config.model_config.enforce_eager:
+        # TODO: In vLLM, the only thing that needs to be synced is num_tokens, but in
+        # our case, we still need to sync the other two flags as well. So we need to
+        # include them in the all_reduce operation, and more over, we CANNOT skip it
+        # even if we are running in eager mode, which harms performance.
+        # FIXME: Restore the `or self.vllm_config.model_config.enforce_eager` here
+        # immediately once the other two flags are no longer needed.
+        if self.dp_size == 1:
            return num_tokens, None, with_prefill, enable_dbo

        # Sync num_tokens, with_prefill, enable_dbo across dp ranks