[v0.18.0][BugFix] PIECEWISE mode also requires synchronization (#8469)
### What this PR does / why we need it?

This PR enables synchronization for the `PIECEWISE` runtime mode in ACL graph replay. Previously, synchronization was only performed in `FULL` mode. However, `PIECEWISE` mode also requires this barrier to ensure that parameter updates are completed before the graph is replayed, preventing accuracy loss. The logic is also corrected to skip synchronization specifically for EAGLE draft models, as intended.

Fixes #

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

CI passed.

---------

Signed-off-by: 1zzk <785396250@qq.com>
@@ -202,10 +202,9 @@ class ACLGraphWrapper:
 # If we do not in main model and in full-graph mode when using merge-eagle-graph,
 # we do not need to synchronize.
 # When enable_enpu is on, model_runner orders update vs replay; skip here.
-# When FULL + EAGLE draft (merge path), replay does not need this barrier.
+# When EAGLE draft (merge path), replay does not need this barrier.
 is_draft_eagle = _EXTRA_CTX.is_draft_model and self.use_eagle
-need_sync = self.runtime_mode == CUDAGraphMode.FULL and not is_draft_eagle
-if not self.enable_enpu and need_sync:
+if not self.enable_enpu and not is_draft_eagle:
     torch.npu.current_stream().synchronize()
 entry.aclgraph.replay()
 return entry.output
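The effect of the changed condition can be illustrated with a standalone sketch. It is not code from the repository: `RuntimeMode`, `old_need_sync`, and `new_need_sync` are hypothetical names, and the NPU stream and graph objects are left out so that only the boolean logic of the removed and added lines is compared.

```python
from enum import Enum


class RuntimeMode(Enum):
    """Hypothetical stand-in for the CUDAGraphMode values named in the diff."""
    PIECEWISE = 1
    FULL = 2


def old_need_sync(mode, enable_enpu, is_draft_model, use_eagle):
    """Mirror of the removed condition: only FULL mode ever synchronized."""
    is_draft_eagle = is_draft_model and use_eagle
    need_sync = mode == RuntimeMode.FULL and not is_draft_eagle
    return (not enable_enpu) and need_sync


def new_need_sync(mode, enable_enpu, is_draft_model, use_eagle):
    """Mirror of the added condition: the runtime mode no longer matters."""
    is_draft_eagle = is_draft_model and use_eagle
    return (not enable_enpu) and not is_draft_eagle


if __name__ == "__main__":
    # Compare both predicates for the main model and for an EAGLE draft model.
    for mode in RuntimeMode:
        for is_draft in (False, True):
            before = old_need_sync(mode, False, is_draft, True)
            after = new_need_sync(mode, False, is_draft, True)
            print(f"{mode.name:9} draft={is_draft!s:5} old={before!s:5} new={after}")
```

Under these assumptions, the only case whose result changes is the main model in `PIECEWISE` mode, which now synchronizes before replay. EAGLE draft models and runs with `enable_enpu` set still skip the barrier in both versions.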