VLM generate: tests can't generate image/video tokens #33623
Conversation
works for me, although I think we probably don't need this condition
`image_token_index < config.get_text_config().vocab_size`
i.e. just always add those 2 tokens to the bad words list
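For context, here is a minimal sketch of the two variants under discussion. It assumes `config`, `model`, and `inputs` come from a VLM test setup (e.g. a randomly initialized llava model); it is not the PR's exact code.

```python
# Sketch only -- `config`, `model`, and `inputs` are assumed to come
# from a VLM generation test setup.
bad_words_ids = []
for token_index in (config.image_token_index, config.video_token_index):
    # Variant with the condition discussed above: only block the token
    # when it falls inside the text vocabulary...
    if token_index < config.get_text_config().vocab_size:
        bad_words_ids.append([token_index])
# ...versus the suggestion to drop the check and always block both:
# bad_words_ids = [[config.image_token_index], [config.video_token_index]]

outputs = model.generate(**inputs, bad_words_ids=bad_words_ids)
```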
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@ydshieh we do,
@gante thanks so much for handling this in my absence ❤️ I saw your other comment, and yes, if we can make a default generation config for models, then we'll be able to block certain tokens. For that I think we should also allow passing/overwriting generation params when loading a config, the same way we can overwrite some ModelConfig params
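For reference, the override-at-load pattern referred to above already exists for model configs, and `GenerationConfig.from_pretrained` accepts kwargs the same way; a minimal sketch, with checkpoint names chosen only for illustration:

```python
from transformers import AutoConfig, GenerationConfig

# Model configs: extra kwargs override the stored attributes at load time.
config = AutoConfig.from_pretrained("gpt2", resid_pdrop=0.2)

# Generation configs support the same kwarg-override pattern; the proposal
# above is to wire per-model *default* generation params through this path.
generation_config = GenerationConfig.from_pretrained("gpt2", max_new_tokens=32)
```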
This is interesting because all llavas were/should be designed in such a way that image tokens are part of the embeddings' vocab size, and we should be able to embed these image tokens. Let me check why llava-onevision fails; might need to fix the tests if they are designed incorrectly
hehe no worries -- I would expect the reverse to be true if I was off 😉 I'm starting to see this pattern many times, where it would help if we could define a generation config specific to the model in question. I'll bump it up in my inner mapping of priorities :)
What does this PR do?
Our VLM generate mixin tests, introduced in #33533, are flaky. Because they use randomly initialized models, nothing prevents the models from generating image/video tokens, which a) shouldn't happen and b) crashes the forward pass.
This PR ensures our generation tests don't generate those tokens.
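A minimal sketch of the kind of guard this implies, using a hypothetical helper name (the actual diff may differ):

```python
def get_multimodal_bad_words_ids(config):
    """Collect image/video placeholder token ids so generate() never samples them.

    Hypothetical helper for illustration; the PR's actual code may differ.
    """
    bad_words_ids = []
    for attr in ("image_token_index", "video_token_index"):
        token_index = getattr(config, attr, None)
        if token_index is not None:
            bad_words_ids.append([token_index])
    return bad_words_ids or None

# Inside a generation test, with `model` and `inputs` built by the test mixin:
# outputs = model.generate(**inputs, bad_words_ids=get_multimodal_bad_words_ids(model.config))
```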
Commands run to confirm the issue is fixed:
✅ (the test is no longer flaky)
py.test tests/models/video_llava/test_modeling_video_llava.py::VideoLlavaForConditionalGenerationModelTest::test_sample_generate_dict_output --flake-finder --flake-runs=1000
✅ (we can run generation tests on all models after these changes)
py.test tests/models/ -k test_sample_generate_dict_output
cc @zucchini-nlp -- when you're back from holidays, plz review even if it is already merged, in case you want to change things 🤗