Fix accelerate failing tests #30836
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks. Approving, but for the SSM cache I find it a bit weird that we have to do these `x.to()` calls.
Also, in general, accelerate is supposed to handle allllllll the devices, no? 👿
@@ -244,6 +244,7 @@ def slow_forward(self, input_states, cache_params: Optional[MambaCache]=None):
        # 2. Convolution sequence transformation
        if cache_params is not None:
            ssm_state = cache_params.ssm_states[self.layer_idx].clone()
If the cache param is always on the wrong device, we are always going to have to do the transfer here, no?
Yes, that's right!
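For context, here is a minimal, self-contained sketch of the transfer pattern being discussed (toy tensors and placeholder names, not the exact code from this PR):

```python
import torch

# Toy illustration: a cached SSM state created on one device is moved to the
# device of the current inputs before it is used, mirroring the `.to()`
# transfer discussed above for models dispatched across several devices.
ssm_states = {0: torch.zeros(2, 4)}        # cache, e.g. allocated on one device
input_states = torch.randn(2, 4)           # inputs, possibly on another device
ssm_state = ssm_states[0].clone()
if ssm_state.device != input_states.device:
    ssm_state = ssm_state.to(input_states.device)
out = input_states + ssm_state             # the op is now device-safe
```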
@@ -1388,7 +1388,7 @@ def forward(
            inputs_embeds, past_key_values_length=past_key_values_length, position_ids=position_ids
        )

-        hidden_states = inputs_embeds + positions
+        hidden_states = inputs_embeds + positions.to(inputs_embeds.device)
What was the issue with this? It's a bit weird here: is device placement of the inputs not automatic? Or is the embedding layer alone placed on a different device than the embed_positions?
`inputs_embeds` and `positions` can be on different devices (this could be fixed if we find a way to make sure that they are on the same device; however, we would probably need to add another attribute in `PretrainedConfig`). We can't really move these inputs to the right device automatically, since we do the ops in the forward itself. Wrapping them in an `nn.Module` to perform the op would work, but that's not a good solution.
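To make this concrete, a toy illustration (hypothetical module, not code from this PR): accelerate moves tensors automatically at `nn.Module` boundaries via hooks, but a bare tensor addition inside `forward` is not hooked, so when the two embedding layers end up on different devices the explicit `.to()` is needed.

```python
import torch
import torch.nn as nn

class ToyDecoder(nn.Module):
    """Hypothetical decoder mimicking the embed_tokens / embed_positions split."""

    def __init__(self, vocab_size=32, max_pos=16, dim=8):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, dim)      # may land on device A
        self.embed_positions = nn.Embedding(max_pos, dim)      # may land on device B

    def forward(self, input_ids):
        inputs_embeds = self.embed_tokens(input_ids)
        positions = self.embed_positions(
            torch.arange(input_ids.shape[1], device=input_ids.device)
        )
        # Plain tensor op in forward: no hook can move the operands for us,
        # so we transfer `positions` to wherever `inputs_embeds` lives.
        return inputs_embeds + positions.to(inputs_embeds.device)

print(ToyDecoder()(torch.randint(0, 32, (1, 10))).shape)  # torch.Size([1, 10, 8])
```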
Alright got it!
@@ -962,7 +962,7 @@ class BertModel(BertPreTrainedModel):
    `add_cross_attention` set to `True`; an `encoder_hidden_states` is then expected as an input to the forward pass.
    """

-    _no_split_modules = ["BertEmbeddings"]
+    _no_split_modules = ["BertEmbeddings", "BertLayer"]
Is this a general solution that will work for everyone? (I mean, why was this failing the test?)
Yes, that will work for everyone! I don't know why the layers were not added to `_no_split_modules`; this is what we do for all models.
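For reference, a rough sketch of how this attribute is consumed on the accelerate side (assuming a standard BERT checkpoint; illustrative only, not code from this PR):

```python
from accelerate import infer_auto_device_map
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")

# _no_split_modules is forwarded as no_split_module_classes, so a whole
# BertLayer (attention, residuals, layernorms) stays on a single device and
# cross-device tensor ops inside the layer are avoided.
device_map = infer_auto_device_map(
    model, no_split_module_classes=model._no_split_modules
)
print(device_map)
```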
Long overdue! Thanks for updating 😄
Just wanted to know if those have been failing for quite some time or if it's a recent change (in either …)
I'm not sure how long these tests have been failing. However, most of the tests were failing for a good reason (didn't put the right …)
ok, thanks~
What does this PR do?
This PR fixes some tests (~30) that are covered by the accelerate_tests marker. Recently, I ran the following workflow to get an overview of the failing tests: https://github.com/huggingface/transformers/actions/runs/9081904684, or the following CLI:
RUN_SLOW=True pytest -m accelerate_tests tests/
Related PR #30808
Model fixed:
- … (needs `_preload_module_classes` to be used in `dispatch_model`), not sure if it is worth it if this is the only model for now; see the sketch after this list.

Tests skipped:
- See the run here with most of the tests passing. There are only 3 tests failing, but I skipped them in the latest commit (SiglipVisionModel).
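On the `_preload_module_classes` point in the list above, a hedged sketch of the accelerate API it maps to; `model` and `device_map` are assumed to be defined already, and the class name is a placeholder:

```python
from accelerate import dispatch_model

# preload_module_classes asks accelerate to load all of a module's weights up
# front, for modules whose forward uses weights owned by their submodules.
# "SomeCompositeModule" is a placeholder, not a real transformers class.
model = dispatch_model(
    model,
    device_map=device_map,
    preload_module_classes=["SomeCompositeModule"],
)
```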
cc @ydshieh