
VLM generate: tests can't generate image/video tokens #33623

Merged · 2 commits into huggingface:main · Sep 20, 2024

Conversation

gante (Member) commented on Sep 20, 2024

What does this PR do?

Our VLM generate mixin tests, introduced in #33533, are flaky. Because they use randomly initialized models, nothing prevents those models from generating image/video tokens, which a) shouldn't happen and b) crashes the forward pass.

This PR ensures our generation tests don't generate those tokens (see the sketch after the commands below).

Commands run to confirm the issue is fixed:
✅ (the test is no longer flaky) py.test tests/models/video_llava/test_modeling_video_llava.py::VideoLlavaForConditionalGenerationModelTest::test_sample_generate_dict_output --flake-finder --flake-runs=1000
✅ (we can run generation tests on all models after these changes) py.test tests/models/ -k test_sample_generate_dict_output
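
For context, a minimal sketch of the approach discussed in this PR: collect the model's image/video token ids and pass them as bad words to `generate()`, so a randomly initialized model can never emit them. The helper name and exact wiring are assumptions, not the PR's literal code.

```python
# Sketch: block special image/video tokens during generation tests (names are illustrative).
def vlm_bad_words_ids(config):
    """Collect image/video token ids so the test model cannot generate them."""
    bad_words_ids = []
    for attr in ("image_token_index", "video_token_index"):
        token_id = getattr(config, attr, None)
        if token_id is not None:
            bad_words_ids.append([token_id])  # generate() expects a list of token-id sequences
    return bad_words_ids

# Inside a test:
# outputs = model.generate(**inputs, bad_words_ids=vlm_bad_words_ids(model.config), do_sample=True)
```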

cc @zucchini-nlp -- when you're back from holidays, plz review even if it is already merged, in case you want to change things 🤗

@gante gante requested a review from ydshieh September 20, 2024 14:00
@gante gante changed the title VLM generate: curb flaky tests VLM generate: tests can't generate image/video tokens Sep 20, 2024
ydshieh (Collaborator) left a comment


Works for me, although I think we probably don't need this condition:

image_token_index < config.get_text_config().vocab_size

i.e. just always add those two tokens to the bad words.
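
A tiny sketch of the two variants under discussion (the surrounding variable names are assumptions, not the PR's exact diff):

```python
# Variant in the PR: only block the image token if it falls inside the text vocab.
if config.image_token_index < config.get_text_config().vocab_size:
    bad_words_ids.append([config.image_token_index])

# Suggested simplification: always block it, regardless of vocab size.
bad_words_ids.append([config.image_token_index])
```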

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

gante (Member, Author) commented on Sep 20, 2024

@ydshieh we do, llava_onevision fails without it :o (because the token in question is outside the vocab size in the test settings)
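
To illustrate the point, a small sketch of the failure mode, using illustrative numbers rather than the actual test config: the tiny test model's logits only cover `vocab_size` entries, so masking a token id beyond them fails.

```python
import torch

vocab_size = 99                       # tiny vocab used in test settings (illustrative)
image_token_index = 32000             # special token id placed outside that vocab (illustrative)
scores = torch.randn(1, vocab_size)   # next-token logits from the randomly initialized model

# A bad-words/suppress-tokens processor effectively does the following; with an
# out-of-range id this indexing raises an IndexError, hence the vocab_size check.
# scores[:, image_token_index] = -float("inf")
```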

@gante gante merged commit 2fdb5e7 into huggingface:main Sep 20, 2024
19 checks passed
@gante gante deleted the flaky_vlm_tests branch September 20, 2024 14:43
zucchini-nlp (Member) commented

@gante thanks so much for handling this in my absence ❤️ I saw your other comment, and yes, if we can make a default generation config for models, then we'll be able to block certain tokens. For that I think we should also allow passing/overwriting generation params when loading a config, the same way we can overwrite some ModelConfig params (a sketch follows below).

llava_onevision fails without it :o (because the token in question is outside the vocab size in the test settings)

This is interesting, because all llavas were/should be designed in such a way that image tokens are part of the embeddings' vocab size, and we should be able to embed these image tokens. Let me check why llava-onevision fails; we might need to fix the tests if they are designed incorrectly.
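
As an aside, a hedged sketch of what the default-generation-config idea could build on; the token id is a placeholder, and wiring such overrides through config/model loading, as wished for above, was not an existing API at the time of this thread.

```python
from transformers import GenerationConfig

# A model-level default that blocks the special tokens; 151646 is a placeholder id.
gen_config = GenerationConfig(suppress_tokens=[151646])

# Models already carry a generation_config attribute that generate() falls back to:
# model.generation_config = gen_config
# The idea above is to also allow such overrides when loading a config/model,
# the same way ModelConfig fields can be overridden in from_pretrained.
```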

gante (Member, Author) commented on Sep 24, 2024

thanks so much for handling this in my absence

hehe no worries -- I would expect the reverse to be true if I was off 😉

I'm starting to see this pattern come up many times, where it would help if we could define a generation config specific to the model in question. I'll bump it up in my internal list of priorities :)

BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024