
CI: avoid human error, automatically infer generative models #33212

Merged (40 commits) on Feb 13, 2025

Conversation

@gante (Member) commented Aug 30, 2024

What does this PR do?

This PR:

  1. Replaces the manual definition of which models to test with generate with an automatic one: if model_class.can_generate(), the tests from GenerationTesterMixin are run. No more human mistakes, which happened frequently in the past months 🐛 🔫
  2. Now that we run generation tests for all models that can generate, a few (old) bad apples surface. These are explicitly skipped, i.e. the automated all_generative_model_classes is overwritten, with an explanation of why each class is skipped. (Bad apples were detected with py.test tests/models/ -k test_greedy_generate -vv.) 💔
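The inference described in point 1 can be sketched roughly as follows. This is a simplified illustration with hypothetical class names, not the exact transformers internals:

```python
# Sketch: infer the generative model classes from can_generate(),
# instead of maintaining a manual list in every model tester.
# FakeGenerativeModel / FakeEncoderModel / ModelTester are illustrative names.

class FakeGenerativeModel:
    @classmethod
    def can_generate(cls):
        return True  # e.g. a decoder-only LM

class FakeEncoderModel:
    @classmethod
    def can_generate(cls):
        return False  # e.g. an encoder with no generation head

class ModelTester:
    all_model_classes = (FakeGenerativeModel, FakeEncoderModel)

    @property
    def all_generative_model_classes(self):
        # Automatic definition: every model class that reports it can
        # generate gets the GenerationTesterMixin tests, no manual list.
        return tuple(c for c in self.all_model_classes if c.can_generate())

tester = ModelTester()
print([c.__name__ for c in tester.all_generative_model_classes])
```

A tester with a known-broken model can still opt out by overwriting `all_generative_model_classes = ()`, which is the skip mechanism point 2 refers to.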

In a follow-up PR:

  1. We currently have to manually define the model's main input name (e.g. pixel_values) in the model tester. Make it use model.main_input_name instead, to avoid human error. Done ✅
  2. Despite the changes in this PR, generate tests only run if GenerationTesterMixin is inherited. We can easily forget to add the mixin, resulting in a false positive (green CI with no generation tests). Add an automated check: if any of the model classes can generate, then GenerationTesterMixin must be inherited in the tester.
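The follow-up check in point 2 could look something like the sketch below. All names here are illustrative stand-ins, assuming only that model classes expose can_generate():

```python
# Sketch: fail loudly when a tester covers generative models but forgot
# to inherit GenerationTesterMixin. Names are illustrative, not the
# exact transformers internals.

class GenerationTesterMixin:
    """Stand-in for the real mixin that provides the generation tests."""

class FakeGenerativeModel:
    @classmethod
    def can_generate(cls):
        return True

def check_generation_mixin(tester_cls):
    # Collect the model classes under test that can generate.
    generative = [
        c for c in getattr(tester_cls, "all_model_classes", ())
        if c.can_generate()
    ]
    # If any exist, the tester must inherit the mixin, or its
    # generation tests silently never run (a false positive on CI).
    if generative and not issubclass(tester_cls, GenerationTesterMixin):
        raise TypeError(
            f"{tester_cls.__name__} tests generative models but does not "
            "inherit GenerationTesterMixin"
        )

class GoodTester(GenerationTesterMixin):
    all_model_classes = (FakeGenerativeModel,)

class BadTester:  # forgot the mixin
    all_model_classes = (FakeGenerativeModel,)

check_generation_mixin(GoodTester)  # passes silently
try:
    check_generation_mixin(BadTester)
except TypeError as e:
    print(e)
```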

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@gante gante changed the title CI: automatically infer generative models CI: avoid human error, automatically infer generative models Aug 30, 2024
@zucchini-nlp (Member) commented:

Super cool! If VLMs start failing the generation tests, that's okay. I have a local draft fixing that, which needs to be reviewed after a few other PRs are merged.

@gante gante force-pushed the run_all_generate_tests_all_times branch from 489a2d6 to 1e8e794 Compare September 20, 2024 16:45
@gante (Member, Author) commented Feb 11, 2025

Many side-fixes later, ready for review 🙌

@gante gante marked this pull request as ready for review February 11, 2025 14:24
@@ -67,7 +67,6 @@
class TFCoreModelTesterMixin:
model_tester = None
all_model_classes = ()
all_generative_model_classes = ()
Collaborator:
Why don't we need all_generative_model_classes for TF?

Member (Author):
Incorrectly moved.

(I tried to apply the same pattern to TF, but it has too many broken models, so I reverted the changes. This one was incorrectly reverted; I'm going to chase the TF diff down to 0.)

@ydshieh (Collaborator) left a comment:

Left 3 nit questions, but yeah, thank you!

I see a few files have style changes: I always think those should be fixed in a separate PR, but you can keep them here if you wish.

@gante (Member, Author) commented Feb 13, 2025

@ydshieh all comments addressed 🤗

@ydshieh (Collaborator) left a comment:

Thank you again.

Just to make sure the change in tests/test_modeling_tf_common.py is what you expect:

  • removing all_generative_model_classes = (): OK
  • not adding def all_generative_model_classes(self): if this is intended, OK, for you to check

@ydshieh (Collaborator) commented Feb 13, 2025

(Quoting my review above about tests/test_modeling_tf_common.py.) Ah, I see you put it back :-), all good then.

@zucchini-nlp (Member) left a comment:

Thanks, this should help prevent cases where generation tests are not added.

I'm not sure I understand why generation for the audio models is turned off, if it was enabled before and CI was green? Same question for Blip2.

Comment on lines +997 to +998
# Doesn't run generation tests. TODO: fix generation tests for Blip2ForConditionalGeneration
all_generative_model_classes = ()
Member:
I think this was working on main for Blip2ForConditionalGeneration, are there many failures?

Member (Author):
If we remove this line (i.e. if we don't skip the tests), py.test tests/models/blip_2/test_modeling_blip_2.py rebased on main results in 21 failures :P

I've also double-checked the other models with skips on Monday. Most of them have unique model properties that do not work well with generate

Member (Author):
BTW, note that there are two testers (Blip2ForConditionalGenerationDecoderOnlyTest and Blip2ModelTest); the skips are only on the latter. I don't know why the latter needs to skip, but that's beyond the scope of this PR :P

They were also being skipped before.

Collaborator:
On main, Blip2ModelTest doesn't have all_generative_model_classes, which means it doesn't run generate tests; this PR therefore doesn't skip any extra tests for this test class.

(I don't know why however)

Member:
I see, thanks! I believe something similar holds for the audio models, since the skip comments weren't very clear to me.

Collaborator:
Yeah, that's why I frequently ask for more detailed comments in the PRs I review 😆

@ydshieh (Collaborator) commented Feb 13, 2025

@gante you can ping me to merge if the CI stays red for reasons unrelated to your PR :-)

@gante (Member, Author) commented Feb 13, 2025

@ydshieh yes, please merge 🙏

(was trying to be autonomous :D)

@ydshieh ydshieh merged commit 62c7ea0 into huggingface:main Feb 13, 2025
23 of 25 checks passed
@gante gante deleted the run_all_generate_tests_all_times branch February 13, 2025 15:54