Wrap _prepare_4d_causal_attention_mask as a leaf function #27236

Merged

Conversation

@michaelbenayoun (Member)

What does this PR do?

This wraps _prepare_4d_causal_attention_mask as an FX leaf function, for similar reasons as here.

The only consequence is that it will not be possible to edit this function via torch.fx. It is not a big deal at all, but I will remove this constraint as soon as possible.
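For reference, here is a minimal sketch of what registering an FX leaf function does. The names (_prepare_mask, TinyModel) and the mask computation are illustrative stand-ins, not the actual transformers code; the point is that the tracer records the wrapped helper as a single call_function node instead of tracing into its body.

```python
import torch
import torch.fx


def _prepare_mask(attention_mask):
    # Illustrative stand-in for _prepare_4d_causal_attention_mask.
    return (1.0 - attention_mask) * torch.finfo(torch.float32).min


# Register the helper as a leaf function; this has to run at module (top) level.
_prepare_mask = torch.fx.wrap(_prepare_mask)


class TinyModel(torch.nn.Module):
    def forward(self, hidden_states, attention_mask):
        mask = _prepare_mask(attention_mask)
        return hidden_states + mask


traced = torch.fx.symbolic_trace(TinyModel())
print(traced.graph)  # _prepare_mask shows up as one opaque call_function node
```

Because the traced graph only contains that single node, graph transformations cannot rewrite the body of the helper, which is the editing limitation mentioned above.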

@amyeroberts (Collaborator) left a comment

Change LGTM - thanks for adding!

Happy to merge with @younesbelkada's approval.

cc @patrickvonplaten for reference

@younesbelkada (Contributor) left a comment

The changes look good! My only question: this seems to wrap _prepare_4d_causal_attention_mask unconditionally as long as torch.fx is available. Is it possible to wrap it only when users actually perform model tracing?

@michaelbenayoun (Member, Author)

No, it needs to happen at the top level of the module.
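To illustrate the constraint (assuming current torch.fx behaviour, with a hypothetical helper function name): torch.fx.wrap inspects the calling frame and only accepts module-level calls, so the wrapping cannot be deferred to the moment a user requests tracing.

```python
import torch.fx


def enable_tracing():
    # Hypothetical helper: this does NOT work. torch.fx.wrap checks the calling
    # frame and raises NotImplementedError when it is invoked anywhere other
    # than the top level of a module.
    torch.fx.wrap("_prepare_4d_causal_attention_mask")
```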

@younesbelkada (Contributor) left a comment

LGTM. Reading the docs for this method a bit, it just registers the function as a "leaf function" without modifying anything, so it seems safe to me:

https://pytorch.org/docs/stable/_modules/torch/fx/_symbolic_trace.html#wrap
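A quick sanity check (hedged, reusing the illustrative sketch from the PR description above): torch.fx.wrap returns the original function object, so eager calls behave exactly as before and only symbolic tracing treats the helper as opaque.

```python
import torch

# Eager execution of the wrapped helper from the sketch above is unchanged.
attention_mask = torch.ones(2, 4)
out = _prepare_mask(attention_mask)  # runs the original Python body
print(out.shape)                     # torch.Size([2, 4])
```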

@michaelbenayoun (Member, Author)

It is. The only difference it implies is that it will not be possible to edit this function via torch.fx.

@HuggingFaceDocBuilderDev commented Nov 2, 2023

The documentation is not available anymore as the PR was closed or merged.

@amyeroberts amyeroberts merged commit 4557a0d into huggingface:main Nov 2, 2023
@michaelbenayoun michaelbenayoun deleted the fx_for_for_pp_with_llama branch November 2, 2023 13:29
@michaelbenayoun (Member, Author)

Actually, I have a workaround that works for my purposes in optimum-neuron, so I can revert these changes or keep them as they are, whichever you prefer. It's not a big deal either way.

@amyeroberts (Collaborator)

I think it's fine to leave as-is - people might want to use torch tracing outside of optimum-neuron.

EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 19, 2023

Wrap _prepare_4d_causal_attention_mask as a leaf function (huggingface#27236)