VLM generate: tests can't generate image/video tokens #33623
Conversation
works for me, although I think we probably don't need this condition
`image_token_index < config.get_text_config().vocab_size`
i.e. just always add those 2 tokens to the bad words list
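For context, here is a minimal sketch of the two variants under discussion. It assumes `config`, `model`, and `inputs` come from a VLM test setup (e.g. a randomly initialized llava model); it is not the PR's exact code.

```python
# Sketch only -- `config`, `model`, and `inputs` are assumed to come
# from a VLM generation test setup.
bad_words_ids = []
for token_index in (config.image_token_index, config.video_token_index):
    # Variant with the condition discussed above: only block the token
    # when it falls inside the text vocabulary...
    if token_index < config.get_text_config().vocab_size:
        bad_words_ids.append([token_index])
# ...versus the suggestion to drop the check and always block both:
# bad_words_ids = [[config.image_token_index], [config.video_token_index]]

outputs = model.generate(**inputs, bad_words_ids=bad_words_ids)
```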
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@ydshieh we do,
@gante thanks so much for handling this in my absence ❤️ I saw your other comment, and yes, if we can make a default generation config for models, then we'll be able to block certain tokens. For that I think we should also allow passing/overwriting generation params when loading a config, the same way we can overwrite some ModelConfig params
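For reference, the override-at-load pattern referred to above already exists for model configs, and `GenerationConfig.from_pretrained` accepts kwargs the same way; a minimal sketch, with checkpoint names chosen only for illustration:

```python
from transformers import AutoConfig, GenerationConfig

# Model configs: extra kwargs override the stored attributes at load time.
config = AutoConfig.from_pretrained("gpt2", resid_pdrop=0.2)

# Generation configs support the same kwarg-override pattern; the proposal
# above is to wire per-model *default* generation params through this path.
generation_config = GenerationConfig.from_pretrained("gpt2", max_new_tokens=32)
```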
This is interesting because all llavas were/should be designed in such a way that image tokens are part of the embeddings' vocab size, and we should be able to embed these image tokens. Let me check why llava-onevision fails; might need to fix the tests if they are designed incorrectly
hehe no worries -- I would expect the reverse to be true if I was off 😉 I'm starting to see this pattern many times, where it would help if we could define a generation config specific to the model in question. I'll bump it up in my inner mapping of priorities :)
What does this PR do?
Our VLM generate mixin tests, introduced in #33533, are flaky. Because they use randomly initialized models, nothing prevents the models from generating image/video tokens, which a) shouldn't happen and b) crashes the forward pass.
This PR ensures our generation tests don't generate those tokens.
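A minimal sketch of the kind of guard this implies, using a hypothetical helper name (the actual diff may differ):

```python
def get_multimodal_bad_words_ids(config):
    """Collect image/video placeholder token ids so generate() never samples them.

    Hypothetical helper for illustration; the PR's actual code may differ.
    """
    bad_words_ids = []
    for attr in ("image_token_index", "video_token_index"):
        token_index = getattr(config, attr, None)
        if token_index is not None:
            bad_words_ids.append([token_index])
    return bad_words_ids or None

# Inside a generation test, with `model` and `inputs` built by the test mixin:
# outputs = model.generate(**inputs, bad_words_ids=get_multimodal_bad_words_ids(model.config))
```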
Commands run to confirm the issue is fixed:
✅ (the test is no longer flaky)
py.test tests/models/video_llava/test_modeling_video_llava.py::VideoLlavaForConditionalGenerationModelTest::test_sample_generate_dict_output --flake-finder --flake-runs=1000
✅ (we can run generation tests on all models after these changes)
py.test tests/models/ -k test_sample_generate_dict_output
cc @zucchini-nlp -- when you're back from holidays, plz review even if it is already merged, in case you want to change things 🤗