mt5 getting nans with fp16 #10819
Comments
Duplicate of #10830
Hi @patrickvonplaten, this is not an exact duplicate: I am using mt5-small and the other user in #10830 is using t5-large. I would appreciate it if both were considered, thank you.
@dorost1234, please kindly test whether this PR fixes the problem: #10956
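For context, a minimal sketch of the kind of fp16 overflow mitigation such a fix could involve: clamping hidden states that have overflowed to inf back into the representable fp16 range so that later operations do not turn them into NaNs. This is only illustrative; the actual change in the linked PR may differ.

```python
import torch

def clamp_fp16_overflow(hidden_states: torch.Tensor) -> torch.Tensor:
    """Illustrative only: if an fp16 tensor contains inf, clamp it back
    into the representable range so downstream ops do not produce NaNs."""
    if hidden_states.dtype == torch.float16 and torch.isinf(hidden_states).any():
        clamp_value = torch.finfo(torch.float16).max - 1000
        hidden_states = torch.clamp(hidden_states, min=-clamp_value, max=clamp_value)
    return hidden_states
```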
@stas00 thank you very much for the contributions, it now works for me with mt5-small. I am running some more experiments with it and will update.
Dear @stas00, I also used your debug code:
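(For reference, a hypothetical sketch of the kind of debug code this could refer to; it is not the exact snippet from the thread. It registers forward hooks that report the first modules whose tensor outputs contain inf/NaN, which helps locate where fp16 overflows start.)

```python
import torch

def attach_nan_inf_hooks(model):
    """Hypothetical sketch: print the name of any module whose tensor
    output contains inf or NaN during the forward pass."""
    def make_hook(name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
                print(f"non-finite values detected in output of {name}")
        return hook

    for name, module in model.named_modules():
        module.register_forward_hook(make_hook(name))
```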
I was just thinking about it, so thank you for confirming that. Deepspeed is not using. Though let's continue the discussion on deepspeed in the other issue you opened, since these are related but different problems. That is, we may fix one but not the other, or the fixes may come at different times, so it's easier to track them as separate issues. Or if there is no single issue specific to t5/mt5+deepspeed, please open one. Thank you.
Dear @stas00
I already did - please see the link in my last comment. Please do not worry, we will surely find one way or another to resolve this.
oh, great, thank you very much
Dear @stas00
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Environment info
transformers version: 4.4.2
Who can help
t5: @patrickvonplaten, @patil-suraj
Information
I am using the mt5-small model.
The task I am working on is:
To reproduce
Steps to reproduce the behavior:
```
python run_translation.py \
    --model_name_or_path google/mt5-small \
    --do_train --do_eval \
    --source_lang en --target_lang ro \
    --dataset_name wmt16 --dataset_config_name ro-en \
    --output_dir test/tst-translation \
    --per_device_train_batch_size=4 --per_device_eval_batch_size=4 \
    --overwrite_output_dir --predict_with_generate \
    --max_train_samples 100 \
    --fp16
```
outputs:
Expected behavior
Being able to use fp16 with mt5 models. Thank you very much for your help; running these models in fp16 is really crucial for me so that I can fit more data onto the older GPUs I have access to, and I appreciate your help a lot.
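(A minimal, hypothetical sketch of how one might check for non-finite loss values with mt5-small under fp16 autocast, outside the example script. The prompt and labels are only illustrative, a CUDA GPU is assumed, and mt5-small is not fine-tuned for translation, so only the finiteness of the loss matters here, not its value.)

```python
import torch
from transformers import MT5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small").cuda()

# Illustrative input/label pair; any short text would do for this check.
inputs = tokenizer("translate English to Romanian: The house is wonderful.",
                   return_tensors="pt").to("cuda")
labels = tokenizer("Casa este minunata.", return_tensors="pt").input_ids.to("cuda")

# Run a forward pass under fp16 autocast and check whether the loss is finite.
with torch.cuda.amp.autocast():
    loss = model(**inputs, labels=labels).loss
print("loss:", loss.item(), "finite:", torch.isfinite(loss).item())
```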