Revert "[bugfix]limit graph replay sync (#5761)" (#5965)

### What this PR does / why we need it?

Reverts #5761 to fix accuracy issues when using piecewise graph mode.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main: 2c24bc6996

Signed-off-by: Angazenn <supperccell@163.com>
2026-01-16 23:29:35 +08:00
parent 52086394ae
commit 7feb74590b
1 changed file with 2 additions and 3 deletions
--- a/vllm_ascend/compilation/acl_graph.py
+++ b/vllm_ascend/compilation/acl_graph.py
@@ -186,13 +186,12 @@ class ACLGraphWrapper:
            )

        logger.info_once("Replaying aclgraph")
-        # In async scheduling or multi-threaded (MT) scenarios when graph mode is FULL, it is possible that
+        # In async scheduling or multi-threaded (MT) scenarios, it is possible that
        # the CPU's record event (from update_attn_params) for the iteration i completes
        # before the graph replay of iteration i-1.
        # To ensure proper ordering, we must call synchronize here before replaying,
        # so that update_attn_params only executes after the previous graph replay has fully completed.
-        if self.runtime_mode == CUDAGraphMode.FULL:
-            torch.npu.synchronize()
+        torch.npu.synchronize()
        entry.aclgraph.replay()
        return entry.output
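
The ordering hazard described in the comment above can be illustrated with a small, device-free sketch. Everything in it is hypothetical: `FakeDevice`, `run`, and the `params` dictionary are stand-ins invented for illustration, not vllm-ascend or `torch.npu` APIs, and the host-side update is modelled as a plain dictionary write rather than the real `update_attn_params`. It only assumes that a replay reads its parameters when it executes, not when it is launched.

```python
import queue
import threading
import time


class FakeDevice:
    """Hypothetical stand-in for an accelerator stream: runs queued work on its own thread."""

    def __init__(self):
        self._tasks = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            task = self._tasks.get()
            task()
            self._tasks.task_done()

    def launch(self, task):
        # Asynchronous: returns to the host immediately.
        self._tasks.put(task)

    def synchronize(self):
        # Block the host until every queued task has finished.
        self._tasks.join()


def run(iterations, sync_before_replay):
    device = FakeDevice()
    params = {"step": None}  # shared state that each replay reads when it executes
    seen = []

    def replay():
        time.sleep(0.01)  # pretend the graph takes a while to run
        seen.append(params["step"])

    for i in range(iterations):
        if sync_before_replay:
            device.synchronize()  # the previous replay must finish first
        params["step"] = i        # host-side parameter update for iteration i
        device.launch(replay)     # enqueue the replay for iteration i

    device.synchronize()
    return seen


if __name__ == "__main__":
    print("with sync:   ", run(4, sync_before_replay=True))
    print("without sync:", run(4, sync_before_replay=False))
```

With `sync_before_replay=True`, each replay observes the parameters written for its own iteration. Without it, the host can overwrite `params` while an earlier replay is still pending, so that replay may observe a later iteration's values. That is the kind of mismatch the unconditional `torch.npu.synchronize()` call is meant to rule out.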