Revert "[bugfix]limit graph replay sync (#5761)" (#5965)

### What this PR does / why we need it?
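Reverts #5761 to fix accuracy issues when using piecewise graph mode. #5761 limited the synchronization before graph replay to FULL graph mode; this revert restores the unconditional `torch.npu.synchronize()` call before every replay.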

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main: 2c24bc6996

Signed-off-by: Angazenn <supperccell@163.com>
Angazenn
2026-01-16 23:29:35 +08:00
committed by GitHub
parent 52086394ae
commit 7feb74590b


@@ -186,13 +186,12 @@ class ACLGraphWrapper:
 )
 logger.info_once("Replaying aclgraph")
-# In async scheduling or multi-threaded (MT) scenarios when graph mode is FULL, it is possible that
+# In async scheduling or multi-threaded (MT) scenarios, it is possible that
 # the CPU's record event (from update_attn_params) for the iteration i completes
 # before the graph replay of iteration i-1.
 # To ensure proper ordering, we must call synchronize here before replaying,
 # so that update_attn_params only executes after the previous graph replay has fully completed.
-if self.runtime_mode == CUDAGraphMode.FULL:
-    torch.npu.synchronize()
+torch.npu.synchronize()
 entry.aclgraph.replay()
 return entry.output
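
The behavioural difference of this revert can be sketched without NPU hardware. The snippet below is only an illustration under stated assumptions: `GraphMode`, `FakeDevice`, `replay_conditional`, `replay_unconditional`, and `trace` are hypothetical stand-ins invented for this example, not part of vLLM or `torch.npu`. The fake device merely records call order, so it shows which calls are issued, not the actual race on the device.

```python
from enum import Enum


class GraphMode(Enum):
    """Hypothetical stand-in for CUDAGraphMode; only two modes are modeled."""
    PIECEWISE = 1
    FULL = 2


class FakeDevice:
    """Hypothetical stand-in for the NPU runtime; it only records call order."""

    def __init__(self):
        self.calls = []

    def synchronize(self):
        # Stands in for torch.npu.synchronize().
        self.calls.append("synchronize")

    def replay(self):
        # Stands in for entry.aclgraph.replay().
        self.calls.append("replay")


def replay_conditional(device, mode):
    """Behaviour being reverted: synchronize only when the mode is FULL."""
    if mode == GraphMode.FULL:
        device.synchronize()
    device.replay()


def replay_unconditional(device, mode):
    """Behaviour restored by this revert: always synchronize before replay."""
    device.synchronize()
    device.replay()


def trace(replay_fn, mode):
    """Run one replay step on a fresh fake device and return the call order."""
    device = FakeDevice()
    replay_fn(device, mode)
    return device.calls


if __name__ == "__main__":
    for mode in GraphMode:
        print(mode.name, "conditional:  ", trace(replay_conditional, mode))
        print(mode.name, "unconditional:", trace(replay_unconditional, mode))
```

In this sketch the conditional variant issues no synchronization in piecewise mode, which is the code path this revert changes back.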