Fix failing DeepSpeed model zoo tests #30112

Merged: pacman100 merged 5 commits into main from smangrul/fix-ds-ci on Apr 9, 2024


pacman100
Contributor

@pacman100 pacman100 commented Apr 8, 2024

What does this PR do?

  1. Fixes the failing DeepSpeed model zoo tests:
    a. tests/deepspeed/test_model_zoo.py::TestDeepSpeedModelZoo::test_zero_to_fp32_zero3_img_clas_vit was failing with ValueError: --label_column_name label not found in dataset 'hf-internal-testing/cats_vs_dogs_sample'. Make sure to set --label_column_name to the correct text column - one of image, labels. This is now fixed.
    b. tests/deepspeed/test_model_zoo.py::TestDeepSpeedModelZoo::test_zero_to_fp32_zero3_trans_fsmt was failing with TypeError: Old language model head is of type <class 'torch.nn.modules.sparse.Embedding'>, which is not an instance of <class 'torch.nn.modules.linear.Linear'>. You should either use a different resize function or make sure that old_lm_head are an instance of <class 'torch.nn.modules.linear.Linear'>. The cause was that resize_token_embeddings did not handle the case where lm_head is an instance of torch.nn.Embedding, and it forced the new lm_head to be an instance of torch.nn.Linear.
    c. tests/deepspeed/test_model_zoo.py::TestDeepSpeedModelZoo::test_zero_to_fp32_zero2_trans_m2m_100 was failing with ValueError: --max_source_length is set to 1024, but the model only has 512 position encodings. Consider either reducing --max_source_length to 512 or using a model with larger position embeddings. The test now uses appropriate source and target sequence lengths.
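
For readers unfamiliar with failures (a) and (c): both are argument-validation errors raised before training starts. The sketch below illustrates that kind of check in plain Python. It is not the code from the example scripts or from this PR; the helper names `resolve_label_column` and `clamp_to_position_limit` are hypothetical and exist only for illustration.

```python
from typing import Sequence


def resolve_label_column(requested: str, available_columns: Sequence[str]) -> str:
    """Hypothetical helper: return the requested label column if the dataset has it.

    Raises ValueError listing the available columns otherwise, which is the
    general shape of the error quoted in item (a).
    """
    if requested in available_columns:
        return requested
    raise ValueError(
        f"label column {requested!r} not found; "
        f"choose one of: {', '.join(available_columns)}"
    )


def clamp_to_position_limit(requested_length: int, max_position_embeddings: int) -> int:
    """Hypothetical helper: cap a sequence length at the model's position limit.

    Item (c) failed because 1024 was requested for a model with only 512
    position encodings; capping the value avoids that mismatch.
    """
    return min(requested_length, max_position_embeddings)
```

Under these assumptions, `resolve_label_column("labels", ["image", "labels"])` returns the column name unchanged, while asking for `"label"` raises a `ValueError`. Likewise, `clamp_to_position_limit(1024, 512)` yields a length the model can actually encode.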

@pacman100 pacman100 changed the title from "Smangrul/fix ds ci" to "Fix failing DeepSpeed model zoo tests" on Apr 8, 2024
Contributor

@muellerzr muellerzr left a comment


Thanks for fixing! cc @amyeroberts for final review :)

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@ArthurZucker ArthurZucker left a comment


Thanks, the failing tests are unrelated, and I think they failed on main too.
Would be nice to add a test, but alright otherwise

@pacman100
Contributor Author

> Would be nice to add a test, but alright otherwise

Could you please clarify what you mean?

Collaborator

@amyeroberts amyeroberts left a comment


Thanks for fixing all of these! 🙏

@pacman100 pacman100 merged commit 4e3490f into main Apr 9, 2024
21 checks passed
@pacman100 pacman100 deleted the smangrul/fix-ds-ci branch April 9, 2024 06:31
5 participants