Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix use_seedable_sampler when initializing Accelerator #31449

Closed
wants to merge 1 commit into from

Conversation

SunMarc
Copy link
Member

@SunMarc SunMarc commented Jun 17, 2024

What does this PR do ?

This PR make sure that we don't pass use_seedable_sampler in Accelerator if we don't have the required version. Users are getting initialization error since AcceleratorConfig class have the use_seedable_sampler arg that was introduced in v0.26 in accelerate. However, the required version of accelerate in transformers is v0.21

However, I'm facing with an issue here. The default value in accelerate is use_seedable_sampler is False but in transformers, it is True. So, for users who have version of accelerate < 0.26, should I log a warning if use_seedable_sampler is True ? The warning is necessary since the behavior differs but I don't want to add too many warnings. cc @muellerzr @amyeroberts

Another solution would be to just change the min required version of accelerate to v0.26 when installing transformers.

Fixes #31433

@SunMarc SunMarc requested review from muellerzr and amyeroberts June 17, 2024 13:59
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ydshieh
Copy link
Collaborator

ydshieh commented Jun 17, 2024

The warning is necessary since the behavior differs but I don't want to add too many warnings.

Regarding the above, there is logger.warning_once defined in src/transformers/utils/logging.py

@SunMarc
Copy link
Member Author

SunMarc commented Jun 17, 2024

Regarding the above, there is logger.warning_once defined in src/transformers/utils/logging.py

Thx for sharing ! I'm also worried about showing warnings that could be prevented ! In this case, the default behavior of AcceleratorConfig will show a warning, which is not great.

@amyeroberts
Copy link
Collaborator

So, for users who have version of accelerate < 0.26, should I log a warning if use_seedable_sampler is True ?

My understanding was if the version of accelerate is <0.26, it doesn't really matter what the value of use_seedable_sampler is, if it's set in the config it can't be used.

It is possible we have accelerator configs saved out, from this version of transformers which are compatible with some use cases i.e. they contain use_seedable_sampler but wouldn't throw an error?

@SunMarc
Copy link
Member Author

SunMarc commented Jun 17, 2024

My understanding was if the version of accelerate is <0.26, it doesn't really matter what the value of use_seedable_sampler is, if it's set in the config it can't be used.

Before accelerate 0.26, the training results are not fully reproducible using a different sampling technique. (use_seedable_sampler=False behavior). Then, we introduced use_seedable_sampler in v0.26 to enable fully reproducible training results.
So, for an user who uses the default AcceleratorConfig with accelerate < 0.26, he will think that use_seedable_sampler=True is works as expected, which is not the case with this version of accelerate. This is why I was suggesting to raise a warning. cc @muellerzr if I understood correctly !

It is possible we have accelerator configs saved out, from this version of transformers which are compatible with some use cases i.e. they contain use_seedable_sampler but wouldn't throw an error?

Yes, it will not throw an error if accelerate > v0.28 (DataLoaderConfiguration introduced) or v0.26 (use_seedable_sampler introduced)

@amyeroberts
Copy link
Collaborator

Yes, it will not throw an error if accelerate > v0.28 (DataLoaderConfiguration introduced) or v0.26 (use_seedable_sampler introduced)

Ah, sorry, I should have been clearer. If accelerate version is < v0.26 and we're on the current version of transformers, are there settings which trainer and accelerate run without error i.e. I would see in the config that use_seedable_sample=True but it's not used?

@muellerzr
Copy link
Contributor

Ah, sorry, I should have been clearer. If accelerate version is < v0.26 and we're on the current version of transformers, are there settings which trainer and accelerate run without error i.e. I would see in the config that use_seedable_sample=True but it's not used?

Basically.

transformers requires 0.21.0, so we have two routes to do here:

  1. Bump transformers to accelerate==0.26.0, which I'm in favor of since transformers has always relied on use_seedable_sampler=True
  2. Bump transformers to accelerate==0.24.0, which introduced the idea of a seedable sampler, that was default to True until 0.26.0

I think it'd be safer to do a small bump to 0.24.0, and then in December we can fully bump to 0.26.0 (or around then). As the important part is ensuring we have the seedable sampler enabled, which our logic currently has.

This PR will successfully do what we need, the secondary part here is moreso about how to address our MVP version

@amyeroberts
Copy link
Collaborator

@muellerzr OK, happy for us to bump the accelerate version. Are there any breaking changes between 0.21 -> 0.24 or 0.26 which we should flag on the release notes? cc @ydshieh

@muellerzr
Copy link
Contributor

Nope, there are none!

@ydshieh
Copy link
Collaborator

ydshieh commented Jun 18, 2024

For bumping versions, it would be a good chance to follow

https://www.notion.so/huggingface2/Circle-Ci-Docker-images-3bedc8bc602e4ca88c34c6c194accd74

and make a hands on it :-)

ping me or @ArthurZucker if there is anything unclear regarding this process or any issue encountered.

Copy link

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - change LGTM. Would be good to confirm if we can bump the minimum accelerate version and just remove the dependancy here altogether

@SunMarc
Copy link
Member Author

SunMarc commented Aug 12, 2024

Yes, I'm working on that right now. I will open a PR soon !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Minimum required accelerate library is not compatible
5 participants