Fix mask slicing for models with HybridCache #35681
Merged
Commits (16)
e1247d1 correctly slice (Cyrilvallez)
9c59b35 check mask (Cyrilvallez)
9f47f44 Update modular_gemma2.py (Cyrilvallez)
ed2e3d7 fix (Cyrilvallez)
9414572 add tests (Cyrilvallez)
7c31343 fix typo (Cyrilvallez)
8004a70 finally fix mask slicing (Cyrilvallez)
de59ddc Finally correctly slice in all cases!! (Cyrilvallez)
5667025 add test for all attention functions (Cyrilvallez)
1db64c6 small fix in tests (Cyrilvallez)
f1d4868 trick around dynamo tracing issue (Cyrilvallez)
ba84059 last update (Cyrilvallez)
0f0958f more robust (Cyrilvallez)
370326a kwargs propagation (Cyrilvallez)
213da5a make it explicit for checkpointing (Cyrilvallez)
ee426a8 apply modular (Cyrilvallez)
This is super important! Why is it removed?
I know it is counterintuitive, but
_flash_attention_forward
takes the attention mask to pad / unpad the input itself. Thus you need the slicing, otherwise this operation fails; see the blame!
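To make the failure mode concrete, here is a minimal illustrative sketch (not the PR's actual code; `slice_mask_for_fa2` is a hypothetical helper): flash-attention-style pad/unpad code derives per-sequence lengths from the 2D padding mask, so the mask's last dimension must match the number of keys actually in play, not the full pre-allocated static cache size.

```python
import torch

def slice_mask_for_fa2(attention_mask: torch.Tensor, kv_length: int) -> torch.Tensor:
    """Hypothetical helper for illustration: cut a 2D padding mask down to
    the number of key/value positions actually held, so that sequence
    lengths derived from the mask line up with the keys tensor."""
    return attention_mask[:, :kv_length]

# Toy example: a static cache allocated for 8 positions, but only 5 keys are valid.
mask = torch.tensor([[1, 1, 1, 1, 1, 0, 0, 0]])  # mask padded out to the cache size
sliced = slice_mask_for_fa2(mask, kv_length=5)
# Sequence lengths derived from the sliced mask now match the 5 valid keys.
seqlens = sliced.sum(dim=-1)
```

Without the slice, the unpad step would see 8 mask columns for 5 keys and the gather of valid positions would go out of bounds or misalign, which is the failure the comment refers to.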
Indeed, I was too fast on this one; the HybridCache behaves slightly differently than I remembered. There was still an issue in the slicing during prefill for FA2, though!
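The prefill issue mentioned above can be sketched as follows (an illustrative example, not the PR's exact fix; `prefill_mask_slice` and the `window` parameter are assumptions for the sake of the demo): with a sliding-window layer backed by a hybrid cache, the cache keeps at most `window` keys, so during prefill a prompt longer than the window leaves only the last `window` positions cached, and the padding mask must be cut to its last `window` columns to stay aligned.

```python
import torch

def prefill_mask_slice(attention_mask: torch.Tensor, window: int) -> torch.Tensor:
    """Illustrative sketch: when a sliding-window cache retains only the last
    `window` keys, slice the 2D padding mask to its last `window` columns so
    it matches the keys that actually remain after prefill."""
    if attention_mask.shape[-1] > window:
        attention_mask = attention_mask[:, -window:]
    return attention_mask

prompt_mask = torch.ones(1, 10, dtype=torch.long)  # 10-token prompt, no padding
sliced = prefill_mask_slice(prompt_mask, window=4)  # cache only keeps the last 4 keys
```

The point is that the slicing must happen for the prefill pass too, not only during decoding, since the prompt itself can exceed the sliding window.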