[MusicGen] SDPA gives nans/infs during sampling #30020
Comments
Hey @sanchit-gandhi, thanks for opening the issue!
@ylacombe Is there a known issue with GPU + float16 and SDPA? I searched and could not find anything, yet I'm having issues sampling from other models (Mistral, Mixtral) with SDPA. Happy to open a separate issue if it has not been reported.
Hey @cjekel, not that I'm aware of! The current issue occurs on CPU (no GPU) and in fp32!
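Since this report reproduces on CPU in fp32, half-precision overflow alone would not explain it. One general, dtype-independent way attention can emit NaNs (offered as background, not as the confirmed root cause of this issue) is a query row whose keys are all masked with -inf: the softmax normalizer becomes 0, or with max subtraction the code computes -inf - (-inf), so the whole row turns into NaN. The pure-Python sketch below models one row of additive-masked attention; `attention_row` is a hypothetical helper, and `-sys.float_info.max` stands in for a finite mask value such as `torch.finfo(dtype).min`.

```python
import math
import sys
from typing import List, Sequence


def attention_row(scores: Sequence[float], keep: Sequence[bool], mask_value: float) -> List[float]:
    """Softmax over one query row with an additive attention mask (hypothetical helper)."""
    # Masked positions get `mask_value` added; kept positions are left unchanged.
    masked = [s if k else s + mask_value for s, k in zip(scores, keep)]
    # Numerically stable softmax: subtract the row maximum before exponentiating.
    row_max = max(masked)
    # If every entry is -inf, row_max is -inf and (-inf) - (-inf) is NaN.
    exps = [math.exp(x - row_max) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


scores = [1.0, 2.0, 3.0]
fully_masked = [False, False, False]
print(attention_row(scores, fully_masked, -math.inf))            # [nan, nan, nan]
print(attention_row(scores, fully_masked, -sys.float_info.max))  # uniform, ~1/3 each
```

With a finite mask value, a fully masked row degrades to uniform attention instead of NaN, while partially masked rows give the same result as with -inf.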
@cjekel There is one possible fix I can think of:
This problem can get worse when the inputs are very small or very large, mainly because of numerical precision issues, especially when working with half precision.
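To make the half-precision point concrete: float16 tops out at 65504, so exponentiating a logit larger than about 11.09 (ln 65504) overflows to inf, and inf/inf then yields NaN. The usual remedy is to subtract the row maximum before exponentiating, so every exponent is at most 0. The sketch below is a simulation under stated assumptions: `to_fp16` and `softmax_fp16` are hypothetical helpers that round every intermediate to IEEE half precision via Python's `struct` `"e"` format. Real kernels, including PyTorch's SDPA, may accumulate in higher precision, so this only illustrates the failure mode.

```python
import math
import struct
from typing import List, Sequence


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE half precision (simulating a float16 tensor)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # Values beyond the float16 range (max 65504) become +/-inf, as in a fp16 tensor.
        return math.copysign(math.inf, x)


def softmax_fp16(logits: Sequence[float], stable: bool) -> List[float]:
    """Softmax with every intermediate rounded to float16."""
    # The stable variant shifts by the row maximum so no exponent exceeds 0.
    shift = max(logits) if stable else 0.0
    exps = [to_fp16(math.exp(to_fp16(x - shift))) for x in logits]
    total = 0.0
    for e in exps:
        total = to_fp16(total + e)
    return [to_fp16(e / total) for e in exps]


logits = [20.0, 10.0, 0.0]
print(softmax_fp16(logits, stable=False))  # [nan, 0.0, 0.0]: e**20 overflows float16
print(softmax_fp16(logits, stable=True))   # finite, sums to ~1
```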
Should be fixed by #31208!
System Info
transformers version: 4.40.0.dev0
Who can help?
No response
Reproduction
Following #29939, running the following gives an overflow error:
Traceback
Expected behavior
With eager, the code functions as expected:
Could you have a quick look to see if there's a bug in the SDPA implementation @ylacombe? We could also add an integration test that confirms we get sensible outputs with the checkpoint "facebook/musicgen-small".
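An integration test along the lines suggested above needs two checks: the SDPA outputs contain no NaN/inf (the failure reported here), and they stay close to the eager outputs. The helper below is a hypothetical, framework-free sketch of those assertions; its name and the default tolerance are assumptions, not part of the transformers test suite.

```python
import math
from typing import Sequence


def check_sdpa_against_eager(sdpa: Sequence[float], eager: Sequence[float], atol: float = 1e-4) -> float:
    """Hypothetical test helper: SDPA values must be finite and close to eager ones.

    Returns the largest absolute difference so a test can log it.
    """
    if len(sdpa) != len(eager):
        raise AssertionError(f"length mismatch: {len(sdpa)} vs {len(eager)}")
    for i, v in enumerate(sdpa):
        # NaN/inf is exactly the failure mode reported in this issue.
        if not math.isfinite(v):
            raise AssertionError(f"SDPA produced non-finite value {v!r} at index {i}")
    max_diff = max((abs(a - b) for a, b in zip(sdpa, eager)), default=0.0)
    if max_diff > atol:
        raise AssertionError(f"SDPA differs from eager by {max_diff:.3g} (> {atol})")
    return max_diff
```

In an actual test, the two sequences could be flattened logits (e.g. `.flatten().tolist()`) from a forward pass of "facebook/musicgen-small" loaded once with `attn_implementation="sdpa"` and once with `attn_implementation="eager"`. Comparing logits rather than sampled audio keeps the check independent of sampling randomness.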