F.scaled_dot_product_attention support #26572
Merged: fxmarty merged 114 commits into huggingface:main from fxmarty:torch-sdpa-preliminary-support on Dec 8, 2023
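For context, this PR dispatches attention to PyTorch's torch.nn.functional.scaled_dot_product_attention. A minimal sketch of that call, assuming torch>=2.0 (shapes and values below are illustrative only):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, num_heads, seq_len, head_dim)
query = torch.randn(2, 8, 16, 64)
key = torch.randn(2, 8, 16, 64)
value = torch.randn(2, 8, 16, 64)

# SDPA picks a fused backend (FlashAttention, memory-efficient attention,
# or the math fallback) depending on hardware, dtype and mask.
out = F.scaled_dot_product_attention(
    query, key, value, attn_mask=None, dropout_p=0.0, is_causal=True
)
print(out.shape)  # torch.Size([2, 8, 16, 64])
```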
Commits (114, all authored by fxmarty)
74be54c  add sdpa
9d14f0d  wip
f803de3  cleaning
c0bcbfa  add ref
38332d7  yet more cleaning
3b47502  and more :)
dd646c1  Merge branch 'main' into torch-sdpa-preliminary-support
79c12a9  wip llama
dc929cd  working llama
17954dd  add output_attentions=True support
f48f4fa  bigcode sdpa support
dfc47a5  fixes
eba83c1  gpt-bigcode support, require torch>=2.1.1
5693535  add falcon support
3758375  Merge branch 'main' into torch-sdpa-preliminary-support
ca87380  fix conflicts falcon
969dda9  style
06766ec  fix attention_mask definition
5c648d4  remove output_attentions from attnmaskconverter
674bff4  support whisper without removing any Copied from statement
dd89c3c  fix mbart default to eager renaming
f31c7b3  fix typo in falcon
280c078  fix is_causal in SDPA
e41ecfa  check is_flash_attn_2_available in the models init as well in case th…
951bce0  Merge branch 'main' into torch-sdpa-preliminary-support
6f7964d  add warnings when falling back on the manual implementation
0e38a95  precise doc
1bd07aa  wip replace _flash_attn_enabled by config.attn_implementation
feae821  fix typo
2032e64  add tests
d98c2f9  style
ab59f9d  add a copy.deepcopy on the config in from_pretrained, as we do not wa…
98a3825  obey to config.attn_implementation if a config is passed in from_pret…
098a62e  fix is_torch_sdpa_available when torch is not installed
b960912  remove dead code
9df4c8f  Merge branch 'main' into torch-sdpa-preliminary-support
f1df402  Update src/transformers/modeling_attn_mask_utils.py
f49c2a3  Update src/transformers/modeling_attn_mask_utils.py
3a22d8d  Update src/transformers/modeling_attn_mask_utils.py
f0fa993  Update src/transformers/modeling_attn_mask_utils.py
f084040  Update src/transformers/modeling_attn_mask_utils.py
885bbe4  Update src/transformers/models/bart/modeling_bart.py
4dd5523  remove duplicate pretraining_tp code
349c99b  add dropout in llama
5e56014  precise comment on attn_mask
951f70e  add fmt: off for _unmask_unattended docstring
c4e207e  precise num_masks comment
e752d93  nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion
a072c5d  cleanup modeling_utils
f700973  backward compatibility
e267764  fix style as requested
d044d81  style
a9e7606  improve documentation
1727210  test pass
ae86680  style
5706ecb  add _unmask_unattended tests
d2326e2  skip meaningless tests for idefics
c0f849e  hard_check SDPA requirements when specifically requested
0fa8de0  standardize the use if XXX_ATTENTION_CLASSES
637e473  fix SDPA bug with mem-efficient backend on CUDA when using fp32
55ec325  fix test
33ef389  rely on SDPA is_causal parameter to handle the causal mask in some cases
2e6bc3e  fix FALCON_ATTENTION_CLASSES
688d86e  Merge branch 'main' into torch-sdpa-preliminary-support
5913dee  remove _flash_attn_2_enabled occurences
11ab3ae  fix test
b74894d  add OPT to the list of supported flash models
4ff1057  improve test
8bd6c81  properly test on different SDPA backends, on different dtypes & prope…
a11c114  remove remaining _flash_attn_2_enabled occurence
b5593a1  Update src/transformers/modeling_utils.py
1bc983a  Update src/transformers/modeling_utils.py
316b448  Update src/transformers/modeling_utils.py
52178ba  Update src/transformers/modeling_attn_mask_utils.py
231e354  Update docs/source/en/perf_infer_gpu_one.md
f907b3f  remove use_attn_implementation
0e9e9f2  fix docstring & slight bug
c47c24e  Merge branch 'main' into torch-sdpa-preliminary-support
5c77b94  make attn_implementation internal (_attn_implementation)
cd9e209  typos
e475f25  fix tests
48a6bfc  deprecate use_flash_attention_2=True
8e7f8b5  fix test
7a85efc  add back llama that was removed by mistake
3649553  fix tests
f09a65c  Merge branch 'main' into torch-sdpa-preliminary-support
c1b87b8  remove _flash_attn_2_enabled occurences bis
8950b60  add check & test that passed attn_implementation is valid
18c2678  fix falcon torchscript export
d96e0d2  fix device of mask in tests
bb20113  Merge branch 'fix-device-mask-tests' into torch-sdpa-preliminary-support
9e133c9  add tip about torch.jit.trace and move bt doc below sdpa
76a1e17  fix parameterized.expand order
65aeba6  move tests from test_modeling_attn_mask_utils to test_modeling_utils …
09ab820  Merge branch 'main' into torch-sdpa-preliminary-support
48d95ea  Merge branch 'main' into torch-sdpa-preliminary-support
546dd51  update sdpaattention class with the new cache
2045915  Update src/transformers/configuration_utils.py
eb11883  Update src/transformers/models/bark/modeling_bark.py
920686e  address review comments
2146857  WIP torch.jit.trace fix. left: test both eager & sdpa
9b48591  add test for torch.jit.trace for both eager/sdpa
4315638  Merge branch 'main' into torch-sdpa-preliminary-support
cc7fc4e  fix falcon with torch==2.0 that needs to use sdpa
84d9605  Merge branch 'torch-sdpa-preliminary-support' of https://github.com/f…
8486770  fix doc
c6181f2  hopefully last fix
7ebfd1d  fix key_value_length that has no default now in mask converter
dacf149  is it flacky?
f196bef  Merge branch 'main' into torch-sdpa-preliminary-support
810de1a  fix speculative decoding bug
f116cce  tests do pass
4721c36  Merge branch 'main' into torch-sdpa-preliminary-support
3f06a3a  fix following #27907
Changes shown are from 1 commit: a072c5d7330df1828628cef1473256234e68e88f (cleanup modeling_utils)
Comment: why not default to eager here instead of the setter?
Reply: It's a bit tricky: if attn_implementation is explicitly specified, we want to skip the automatic dispatch between SDPA / eager and just use the user-provided config (https://github.com/fxmarty/transformers/blob/torch-sdpa-preliminary-support/src/transformers/modeling_utils.py#L1259C11-L1268). The case config._attn_implementation is None corresponds to the case where the user has not specified anything.
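A rough sketch of the dispatch rule described in that reply, written as a standalone helper (the function name and signature are illustrative, not the actual modeling_utils internals):

```python
def resolve_attn_implementation(config, requested=None, sdpa_available=True):
    # Hypothetical, simplified version of the dispatch discussed above.
    if requested is not None:
        # Explicitly requested by the user: skip the automatic SDPA/eager dispatch.
        config._attn_implementation = requested
    elif getattr(config, "_attn_implementation", None) is None:
        # Nothing specified anywhere: auto-select SDPA when available, else eager.
        config._attn_implementation = "sdpa" if sdpa_available else "eager"
    # Otherwise: a config with an explicit value was passed in, so honor it as-is.
    return config._attn_implementation
```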
Reply: I don't think we should allow the user to specify which attention mechanism to use in the config: this should not be allowed. The model's config should only define the architecture, not settings that can be changed at runtime. The user should always have to call model.set_attn_type(...) IMO.
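For comparison, a setter-style API along the lines proposed in that comment might look like the sketch below (set_attn_type does not exist in Transformers; this is purely an illustration of the alternative being discussed):

```python
class ToyModel:
    """Hypothetical model that selects the attention backend at runtime via a setter."""

    _SUPPORTED = ("eager", "sdpa", "flash_attention_2")

    def __init__(self):
        self._attn_type = "eager"  # the architecture config stays free of runtime settings

    def set_attn_type(self, attn_type: str) -> None:
        if attn_type not in self._SUPPORTED:
            raise ValueError(f"Unknown attention implementation: {attn_type}")
        self._attn_type = attn_type


model = ToyModel()
model.set_attn_type("sdpa")
```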
Reply: Happy to go with that @patrickvonplaten. WDYT @ArthurZucker @LysandreJik?
Reply: So either: sdpa if possible else eager (-> specify after loading the acceleration you want). In both cases, passing this as an argument to from_pretrained is actually simpler if you want to use FA2 (not in-place modifications). I'm guessing you have more experience with that in diffusers @patrickvonplaten, so will follow you here. Agree with you that restricting is better: only one setter and only one way to change this!
Reply: Let's go for 1, lots of pros and cons but agree with you on this.
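For reference, with the approach that was merged, the implementation is selected at load time; roughly like the sketch below (the model id is illustrative, and the argument name follows the attn_implementation naming used in this PR):

```python
import torch
from transformers import AutoModelForCausalLM

# Illustrative checkpoint; any architecture with SDPA support behaves the same way.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.float16,
    attn_implementation="sdpa",  # or "eager" / "flash_attention_2"
)
```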