[v0.18.0][BugFix] PIECEWISE mode also requires synchronization (#8469)

### What this PR does / why we need it?

This PR enables synchronization for the `PIECEWISE` runtime mode in ACL
graph replay. Previously, synchronization was only performed in `FULL`
mode. However, `PIECEWISE` mode also requires this barrier to ensure
that parameter updates are completed before the graph is replayed,
preventing accuracy loss.
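The toy model below illustrates why the barrier matters. It is plain Python, not the vllm-ascend or `torch_npu` code path. `ToyStream`, `replay`, and `step` are hypothetical names, and the real parameter-update mechanism may differ. A parameter update is queued asynchronously, and "replay" reads whatever value is visible at launch time. Without a `synchronize()` first, replay sees the stale value.

```python
from collections import deque
from typing import Callable, Deque, Dict


class ToyStream:
    """Hypothetical stand-in for a device stream: work is queued, not run eagerly."""

    def __init__(self) -> None:
        self._pending: Deque[Callable[[], None]] = deque()

    def enqueue(self, work: Callable[[], None]) -> None:
        self._pending.append(work)

    def synchronize(self) -> None:
        # Block until all queued work has finished (here: run it inline, in order).
        while self._pending:
            self._pending.popleft()()


def replay(params: Dict[str, int]) -> int:
    # Hypothetical "graph replay": consumes whatever parameter value is visible now.
    return params["seq_len"]


def step(sync_before_replay: bool) -> int:
    params = {"seq_len": 0}  # value left over from the previous step
    stream = ToyStream()
    stream.enqueue(lambda: params.update(seq_len=128))  # pending parameter update
    if sync_before_replay:
        stream.synchronize()  # the barrier this PR enables for PIECEWISE mode
    result = replay(params)
    stream.synchronize()  # drain any remaining work at the end of the step
    return result


print(step(sync_before_replay=False))  # 0   (stale parameter)
print(step(sync_before_replay=True))   # 128 (updated parameter)
```

In this sketch the barrier only has to guarantee ordering between "update finished" and "replay launched"; it says nothing about how long the update takes.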

The condition is also simplified so that, apart from the `enable_enpu`
case, synchronization is skipped only for EAGLE draft models, as intended.
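To check what actually changes, the sketch below transcribes the old and new conditions from the diff into standalone predicates and compares them over every input combination. `Mode` is a hypothetical stand-in for `CUDAGraphMode` covering only the two modes discussed here. The new predicate keeps the `mode` argument only so both functions share one signature.

```python
from enum import Enum
from itertools import product


class Mode(Enum):
    # Hypothetical stand-in for CUDAGraphMode; only the two values discussed here.
    PIECEWISE = "piecewise"
    FULL = "full"


def need_sync_before(mode: Mode, enable_enpu: bool, is_draft_eagle: bool) -> bool:
    # Old rule: barrier only in FULL mode, and never for EAGLE draft models.
    return not enable_enpu and mode == Mode.FULL and not is_draft_eagle


def need_sync_after(mode: Mode, enable_enpu: bool, is_draft_eagle: bool) -> bool:
    # New rule: barrier in every mode, skipped only for enable_enpu or EAGLE draft.
    return not enable_enpu and not is_draft_eagle


for mode, enpu, draft in product(Mode, (False, True), (False, True)):
    old = need_sync_before(mode, enpu, draft)
    new = need_sync_after(mode, enpu, draft)
    if old != new:
        print(mode.name, f"enable_enpu={enpu}", f"is_draft_eagle={draft}", old, "->", new)
```

Only one combination differs: `PIECEWISE` with `enable_enpu` off on a non-draft model now synchronizes. EAGLE draft models skip the barrier in both versions, so their behavior is unchanged.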

Fixes #

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

CI passed.

---------

Signed-off-by: 1zzk <785396250@qq.com>
1kzk, 2026-04-21 16:22:32 +08:00 (committed by GitHub)
parent b717dc17a3
commit 7850264324


```diff
@@ -202,10 +202,9 @@ class ACLGraphWrapper:
         # If we do not in main model and in full-graph mode when using merge-eagle-graph,
         # we do not need to synchronize.
         # When enable_enpu is on, model_runner orders update vs replay; skip here.
-        # When FULL + EAGLE draft (merge path), replay does not need this barrier.
+        # When EAGLE draft (merge path), replay does not need this barrier.
         is_draft_eagle = _EXTRA_CTX.is_draft_model and self.use_eagle
-        need_sync = self.runtime_mode == CUDAGraphMode.FULL and not is_draft_eagle
-        if not self.enable_enpu and need_sync:
+        if not self.enable_enpu and not is_draft_eagle:
             torch.npu.current_stream().synchronize()
         entry.aclgraph.replay()
         return entry.output
```