Fix failing DeepSpeed model zoo tests #30112
Thanks for fixing! cc @amyeroberts for final review :)
Thanks, the failing tests are unrelated and I think they fail on main as well.
It would be nice to add a test, but alright otherwise.
Could you please clarify what you mean?
Thanks for fixing all of these! 🙏
What does this PR do?

a. `tests/deepspeed/test_model_zoo.py::TestDeepSpeedModelZoo::test_zero_to_fp32_zero3_img_clas_vit` was failing earlier due to `ValueError: --label_column_name label not found in dataset 'hf-internal-testing/cats_vs_dogs_sample'. Make sure to set --label_column_name to the correct text column - one of image, labels.` Fixed this.

b. `tests/deepspeed/test_model_zoo.py::TestDeepSpeedModelZoo::test_zero_to_fp32_zero3_trans_fsmt` was failing earlier due to `TypeError: Old language model head is of type <class 'torch.nn.modules.sparse.Embedding'>, which is not an instance of <class 'torch.nn.modules.linear.Linear'>. You should either use a different resize function or make sure that old_lm_head are an instance of <class 'torch.nn.modules.linear.Linear'>.` This was because `resize_token_embeddings` didn't account for the case where `lm_head` is an instance of `torch.nn.Embedding`, and it was forcing the new `lm_head` to be an instance of `torch.nn.Linear`.

c. `tests/deepspeed/test_model_zoo.py::TestDeepSpeedModelZoo::test_zero_to_fp32_zero2_trans_m2m_100` was failing earlier due to `ValueError: --max_source_length is set to 1024, but the model only has 512 position encodings. Consider either reducing --max_source_length to 512 or using a model with larger position embeddings`. Fixed the test to use appropriate source and target sequence lengths.
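The failure in item a comes from a column-existence check: the dataset exposes `image` and `labels`, but the test asked for `label`. The sketch below only mirrors that error message. `resolve_label_column` is a hypothetical helper, not the example script's actual code.

```python
def resolve_label_column(column_names: list[str], label_column_name: str) -> str:
    """Hypothetical check: the requested label column must exist in the dataset."""
    if label_column_name not in column_names:
        raise ValueError(
            f"--label_column_name {label_column_name} not found in dataset. "
            "Make sure to set --label_column_name to the correct column - one of "
            f"{', '.join(column_names)}."
        )
    return label_column_name


# Columns listed in the error for 'hf-internal-testing/cats_vs_dogs_sample'.
columns = ["image", "labels"]
print(resolve_label_column(columns, "labels"))  # prints: labels
```

Passing `"label"` instead raises the `ValueError`, which matches the test failure described above.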
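Item b describes a resize path that always built an `nn.Linear` head. The sketch below shows one type-preserving alternative. It assumes the head is either `nn.Linear` or `nn.Embedding`, both of which store their weight as `[vocab_size, hidden_size]`. `resize_lm_head` is a hypothetical helper, not the `transformers` implementation of `resize_token_embeddings`.

```python
import torch
from torch import nn


def resize_lm_head(old_lm_head: nn.Module, new_num_tokens: int) -> nn.Module:
    """Hypothetical helper: resize an output head while keeping its module type."""
    # Both nn.Linear(hidden, vocab) and nn.Embedding(vocab, hidden) have weight [vocab, hidden].
    old_num_tokens, hidden_size = old_lm_head.weight.shape
    num_to_copy = min(old_num_tokens, new_num_tokens)

    if isinstance(old_lm_head, nn.Embedding):
        # Keep an Embedding head as an Embedding instead of forcing nn.Linear.
        new_lm_head = nn.Embedding(new_num_tokens, hidden_size)
    elif isinstance(old_lm_head, nn.Linear):
        new_lm_head = nn.Linear(hidden_size, new_num_tokens, bias=old_lm_head.bias is not None)
    else:
        raise TypeError(f"Unsupported lm_head type: {type(old_lm_head)}")

    # Match the old head's device and dtype.
    new_lm_head = new_lm_head.to(device=old_lm_head.weight.device, dtype=old_lm_head.weight.dtype)

    with torch.no_grad():
        # Copy rows for tokens present in both vocabularies; extra rows keep default init.
        new_lm_head.weight[:num_to_copy] = old_lm_head.weight[:num_to_copy]
        if isinstance(old_lm_head, nn.Linear) and old_lm_head.bias is not None:
            new_lm_head.bias[:num_to_copy] = old_lm_head.bias[:num_to_copy]
    return new_lm_head


head = nn.Embedding(10, 4)
resized = resize_lm_head(head, 12)
print(type(resized).__name__, tuple(resized.weight.shape))  # prints: Embedding (12, 4)
```

In this sketch, rows added beyond the old vocabulary keep the default initialization of the freshly constructed module. A real implementation may choose a different initialization.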
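The m2m_100 failure in item c is a guard comparing the requested source length with the number of position embeddings the model has. The sketch below is hypothetical: the function name and the `max_position_embeddings` argument are assumptions modelled on the error text. It shows why 1024 is rejected for a model with 512 positions while 512 or less is accepted.

```python
def check_source_length(max_source_length: int, max_position_embeddings: int) -> int:
    """Hypothetical guard: reject source lengths the position table cannot cover."""
    if max_source_length > max_position_embeddings:
        raise ValueError(
            f"--max_source_length is set to {max_source_length}, but the model only has "
            f"{max_position_embeddings} position encodings. Consider either reducing "
            f"--max_source_length to {max_position_embeddings} or using a model with "
            "larger position embeddings"
        )
    return max_source_length


print(check_source_length(512, 512))  # prints: 512
```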