add synchronization after loss
Luca-Calabria committed Jan 16, 2025
1 parent f48dda8 commit 1ac26cf
Showing 1 changed file with 2 additions and 0 deletions.
optimum/habana/transformers/trainer.py: 2 additions & 0 deletions
@@ -1020,6 +1020,8 @@ def hpu_deepspeed_checkpointing(function, *checkpoint_args, use_reentrant: Optio
     # TODO: keep syncs for fast DDP?
     with self.accelerator.accumulate(model):
         tr_loss_step = self.training_step(model, inputs)
+        import habana_frameworks.torch.hpu as hthpu
+        hthpu.synchronize()

     is_last_step_and_steps_less_than_grad_acc = (
         steps_in_epoch <= args.gradient_accumulation_steps and (step + 1) == steps_in_epoch
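
The two added lines make the host wait for all queued HPU work immediately after the training step produces the loss, presumably so the loss value and any device-side errors are materialized before the gradient-accumulation bookkeeping that follows. A minimal standalone sketch of the same pattern, assuming an Intel Gaudi (HPU) device with the habana_frameworks package installed; the model, data, and loss below are placeholders, not the Trainer's own code:

# Minimal sketch: run one training step on an HPU, then block the host until
# all queued device work has finished.
# Assumes an HPU device and a working habana_frameworks installation.
import torch
import habana_frameworks.torch.hpu as hthpu

model = torch.nn.Linear(16, 1).to("hpu")
inputs = torch.randn(8, 16, device="hpu")
targets = torch.randn(8, 1, device="hpu")

loss = torch.nn.functional.mse_loss(model(inputs), targets)
loss.backward()

# Same call the commit inserts after training_step(): wait for the forward and
# backward kernels to complete before continuing on the host.
hthpu.synchronize()
print(loss.item())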
