-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update training_args.py - addition of self.distributed_state when using XPU #25999
Conversation
Missing distributed state so lign 1813-1814 failed because value is undefined
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! We should also change the os.environ
to be in front so PartialState
picks it up
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! LG2M here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding!
Before merging - could you:
- Update the PR title to something more descriptive
- Give a bit more context about the error?
The addition makes sense and is in line with the other logic. However, self.distributed_state
is defined on L1782 and the referenced line in the PR description L1813-1814 doesn't seem to be related to this issue. This is mainly for documentation if people come back to this PR.
…ng XPU (huggingface#25999) * Update training_args.py Missing distributed state so lign 1813-1814 failed because value is undefined * Update training_args.py Co-authored-by: Zach Mueller <muellerzr@gmail.com> --------- Co-authored-by: Zach Mueller <muellerzr@gmail.com>
addition of
self.distributed_state
when using XPUFixe
In the base code self.distributed_state does not appear to be defined, which causes the script to crash when used on lines 1813 and 1814 in my case.
I therefore propose an update with a definition of this variable when using an XPU
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.