This PR enables handling of loss keyword arguments in the Mistral
forward() method. Specifically, if `num_items_in_batch` is passed,
its value is used to normalize the loss correctly.
This relates to the Gradient Accumulation fix (huggingface#34191)
Fixes huggingface#34575
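As a minimal sketch of the normalization this PR is after, following the pattern of the gradient-accumulation fix linked above (the helper name and exact signature here are illustrative, not the actual diff):

```python
import torch.nn.functional as F

def causal_lm_loss(logits, labels, num_items_in_batch=None):
    # Shift so that each position predicts the next token.
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()

    # Sum instead of mean so the normalizer can cover the full
    # gradient-accumulation batch rather than just this micro-batch.
    reduction = "sum" if num_items_in_batch is not None else "mean"
    loss = F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,
        reduction=reduction,
    )
    if num_items_in_batch is not None:
        loss = loss / num_items_in_batch
    return loss
```

Summing and then dividing by a batch-wide token count keeps the per-token loss scale consistent across accumulation steps, which a per-micro-batch mean does not.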
And be careful here not to break multi-GPU setups: with Llama 3.1 I usually get an error when dividing loss / num_items_in_batch, because the two tensors tend to end up on different GPUs.
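One way to guard against that device mismatch, assuming `num_items_in_batch` may arrive as a tensor (a sketch, not the committed fix):

```python
import torch

def normalize_loss(loss, num_items_in_batch):
    # Guard for multi-GPU / model-parallel runs: the summed loss and the
    # token count can land on different devices, so align them first.
    if num_items_in_batch is None:
        return loss
    if torch.is_tensor(num_items_in_batch):
        num_items_in_batch = num_items_in_batch.to(loss.device)
    return loss / num_items_in_batch
```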
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
Who can help?
No response
Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
When calling the forward method on the NeMo Mistral model, the following exception occurs:
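A minimal sketch of the kind of call that exercises this path, assuming the Hugging Face Mistral checkpoint (the model name, shapes, and count value are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
input_ids = torch.randint(0, model.config.vocab_size, (2, 16))

# Passing the gradient-accumulation normalizer through forward() is the
# call that fails before this fix.
out = model(input_ids=input_ids, labels=input_ids, num_items_in_batch=32)
print(out.loss)
```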
Expected behavior
The forward() method should use `num_items_in_batch` for the loss calculation.