
CI: avoid human error, automatically infer generative models #33212

Merged (40 commits) on Feb 13, 2025

Conversation

@gante (Member) commented Aug 30, 2024

What does this PR do?

This PR:

  1. Replaces the manual definition of which models to test with generate with an automatic one: if model_class.can_generate(), the tests from GenerationTesterMixin are run. No more human mistakes, which happened frequently in the past months 🐛 🔫
  2. Now that we run generation tests for all models that can generate, a few (old) bad apples surface. These are explicitly skipped, i.e. the automated all_generative_model_classes is overwritten, with an explanation of why each class is skipped. (Bad apples were detected with py.test tests/models/ -k test_greedy_generate -vv.) 💔
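The inference described in point 1 can be sketched roughly as follows. This is a simplified illustration with hypothetical class names, not the exact transformers internals:

```python
# Sketch: infer the generative model classes from can_generate(),
# instead of maintaining a manual list in every model tester.
# FakeGenerativeModel / FakeEncoderModel / ModelTester are illustrative names.

class FakeGenerativeModel:
    @classmethod
    def can_generate(cls):
        return True  # e.g. a decoder-only LM

class FakeEncoderModel:
    @classmethod
    def can_generate(cls):
        return False  # e.g. an encoder with no generation head

class ModelTester:
    all_model_classes = (FakeGenerativeModel, FakeEncoderModel)

    @property
    def all_generative_model_classes(self):
        # Automatic definition: every model class that reports it can
        # generate gets the GenerationTesterMixin tests, no manual list.
        return tuple(c for c in self.all_model_classes if c.can_generate())

tester = ModelTester()
print([c.__name__ for c in tester.all_generative_model_classes])
```

A tester with a known-broken model can still opt out by overwriting `all_generative_model_classes = ()`, which is the skip mechanism point 2 refers to.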

In a follow-up PR:

  1. We currently have to manually define the model's main input name (e.g. pixel_values) in the model tester. Make it use model.main_input_name instead, to avoid human error. Done ✅
  2. Despite the changes in this PR, generate tests only run if GenerationTesterMixin is inherited. We can easily forget to add the mixin, resulting in a false positive (green CI with no generation tests). Add an automated check: if any of the model classes can generate, then GenerationTesterMixin must be inherited in the tester.
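The follow-up check in point 2 could look something like the sketch below. All names here are illustrative stand-ins, assuming only that model classes expose can_generate():

```python
# Sketch: fail loudly when a tester covers generative models but forgot
# to inherit GenerationTesterMixin. Names are illustrative, not the
# exact transformers internals.

class GenerationTesterMixin:
    """Stand-in for the real mixin that provides the generation tests."""

class FakeGenerativeModel:
    @classmethod
    def can_generate(cls):
        return True

def check_generation_mixin(tester_cls):
    # Collect the model classes under test that can generate.
    generative = [
        c for c in getattr(tester_cls, "all_model_classes", ())
        if c.can_generate()
    ]
    # If any exist, the tester must inherit the mixin, or its
    # generation tests silently never run (a false positive on CI).
    if generative and not issubclass(tester_cls, GenerationTesterMixin):
        raise TypeError(
            f"{tester_cls.__name__} tests generative models but does not "
            "inherit GenerationTesterMixin"
        )

class GoodTester(GenerationTesterMixin):
    all_model_classes = (FakeGenerativeModel,)

class BadTester:  # forgot the mixin
    all_model_classes = (FakeGenerativeModel,)

check_generation_mixin(GoodTester)  # passes silently
try:
    check_generation_mixin(BadTester)
except TypeError as e:
    print(e)
```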

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@gante gante changed the title CI: automatically infer generative models CI: avoid human error, automatically infer generative models Aug 30, 2024
@zucchini-nlp (Member) commented:

Super cool! If VLMs start failing the generation tests, that's okay. I have a local draft fixing that, which needs to be reviewed after a few other PRs are merged.

@gante gante force-pushed the run_all_generate_tests_all_times branch from 489a2d6 to 1e8e794 Compare September 20, 2024 16:45
@gante (Member, Author) commented Feb 11, 2025

Many side-fixes later, ready for review 🙌

@gante gante marked this pull request as ready for review February 11, 2025 14:24
@@ -67,7 +67,6 @@
class TFCoreModelTesterMixin:
model_tester = None
all_model_classes = ()
all_generative_model_classes = ()
Collaborator:
Why don't we need all_generative_model_classes for TF?

Member (Author):
Incorrectly moved.

(I tried to apply the same pattern to TF, but it has too many broken models, so I reverted the changes. This one was incorrectly reverted; I'm going to chase the TF diff down to 0.)

@ydshieh (Collaborator) left a comment:

Left 3 nit questions, but yeah, thank you!

I see a few files have style changes: I always think those should be fixed in a separate PR, but you can keep them here if you wish.

@gante (Member, Author) commented Feb 13, 2025

@ydshieh all comments addressed 🤗

@ydshieh (Collaborator) left a comment:

Thank you again.

Just to make sure the change in tests/test_modeling_tf_common.py is what you expect:

  • removing all_generative_model_classes = (): OK
  • not adding def all_generative_model_classes(self): if this is intended, OK, for you to check

@ydshieh (Collaborator) commented Feb 13, 2025

(Quoting my review above about tests/test_modeling_tf_common.py.) Ah, I see you put it back :-), all good then.

@zucchini-nlp (Member) left a comment:

Thanks, this should help prevent cases where generation tests are not added.

I'm not sure I understand why generation for the audio models is turned off, if it was enabled before and CI was green? Same question for Blip2.

Comment on lines +997 to +998
# Doesn't run generation tests. TODO: fix generation tests for Blip2ForConditionalGeneration
all_generative_model_classes = ()
Member:
I think this was working on main for Blip2ForConditionalGeneration, are there many failures?

Member (Author):
If we remove this line (i.e. if we don't skip the tests), py.test tests/models/blip_2/test_modeling_blip_2.py rebased on main results in 21 failures :P

I've also double-checked the other models with skips on Monday. Most of them have unique model properties that do not work well with generate

Member (Author):
BTW, note that there are two testers (Blip2ForConditionalGenerationDecoderOnlyTest and Blip2ModelTest); the skips are only on the latter. I don't know why the latter needs to skip, but that's beyond the scope of this PR :P

They were also being skipped before.

Collaborator:
On main, Blip2ModelTest doesn't have all_generative_model_classes, which means it doesn't run generate tests; this PR therefore doesn't skip any extra tests for this test class.

(I don't know why however)

Member:
I see, thanks! I believe something similar holds for the audio models, since the skip comments weren't very clear to me.

Collaborator:
Yeah, that's why I frequently ask for more detailed comments in the PRs I review 😆

@ydshieh (Collaborator) commented Feb 13, 2025

@gante you can ping me to merge if the CI stays red for reasons unrelated to your PR :-)

@gante (Member, Author) commented Feb 13, 2025

@ydshieh yes, please merge 🙏

(was trying to be autonomous :D)

@ydshieh ydshieh merged commit 62c7ea0 into huggingface:main Feb 13, 2025
23 of 25 checks passed
@gante gante deleted the run_all_generate_tests_all_times branch February 13, 2025 15:54