
Fix number of patch check for different vision feature select strategy #32494

Conversation

@insujang (Contributor) commented Aug 7, 2024

What does this PR do?

Fixes #32395 (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

I think a new test for this change needs to be added (using parameterized to test different vision_feature_select_strategy and vision_feature_layer values), but I am not sure what the new test should look like. With this change there will be no errors during forward, but do we also need to compare the outputs against expected values? Which test class would be suitable for this test? @zucchini-nlp I would appreciate it if you could share your thoughts on this. Thanks!
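For background on what such a test would exercise: the two strategies differ in whether the vision backbone's CLS token is kept, so the per-crop patch count the model validates differs by one. A minimal sketch of that arithmetic (the helper name and the ViT-L/14-at-336px numbers are illustrative assumptions, not code from this PR):

```python
def expected_num_patches(image_size: int, patch_size: int, strategy: str) -> int:
    """Number of vision tokens per image crop for a ViT backbone,
    depending on how the select strategy treats the CLS token."""
    grid = (image_size // patch_size) ** 2  # plain image patches
    if strategy == "default":
        return grid        # CLS token is dropped
    if strategy == "full":
        return grid + 1    # CLS token is kept
    raise ValueError(f"unknown strategy: {strategy!r}")

# CLIP ViT-L/14 at 336 px, the backbone LLaVA-NeXT commonly uses:
print(expected_num_patches(336, 14, "default"))  # 576
print(expected_num_patches(336, 14, "full"))     # 577
```

A test covering both strategies therefore needs the expected count to follow the chosen strategy, which is what this fix makes the model's check do.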

Who can review?

@zucchini-nlp

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@zucchini-nlp (Member) left a comment

Cool, thanks for working on this!

Yes, we'd need tests and you can add them in

class LlavaNextForConditionalGenerationModelTest(ModelTesterMixin, GenerationTesterMixin, unittest.TestCase):

Basically we'll run a tiny model with dummy inputs and verify that changing select_strategy runs without errors. I don't think we need slow tests here; just make sure, after adding the test, to run

RUN_SLOW=1 pytest tests/models/llava_next/test_modeling_llava_next.py

and check that all tests pass. If something doesn't pass, just let me know; it might be a numerical-precision error from different hardware, as the PR doesn't change the model logic.

@zucchini-nlp (Member)

@insujang hey, is the PR in progress or do you need help with adding tests? I can do the test part if you are busy :)

@insujang (Contributor, Author)

Hi @zucchini-nlp, I am sorry but I am too busy to work on it :( I would sincerely appreciate it if you could do it. Thank you!

@zucchini-nlp (Member)

@insujang no worries, thanks a lot for the contribution. Will add a test soon and merge it to main 🤗

@zucchini-nlp (Member) left a comment

Perfect, LGTM! ❤️

I added a test since @insujang is busy, so this can now be reviewed.

@LysandreJik (Member) left a comment

Feel free to merge if you disagree with the comment below.

@@ -648,7 +648,7 @@ def _merge_input_ids_with_image_features(
         return final_embedding, final_attention_mask, position_ids, final_labels, final_input_ids

-    def pack_image_features(self, image_features, image_sizes, image_newline=None):
+    def pack_image_features(self, image_features, image_sizes, vision_feature_select_strategy, image_newline=None):

Should it have "default" as a default kwarg value so as to not change the signature and keep usages of pack_image_features outside of the model working?

(Not necessary if this method should be considered internal)
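The backward-compatible variant suggested above might look like this minimal sketch (the body is reduced to a pass-through purely to show the signature; it is not the actual implementation):

```python
def pack_image_features(image_features, image_sizes,
                        vision_feature_select_strategy="default",
                        image_newline=None):
    # Defaulting the new kwarg to "default" preserves the old behaviour
    # for any external caller still passing only the first two arguments.
    return vision_feature_select_strategy

# Pre-existing call sites keep working unchanged:
print(pack_image_features([], []))                                         # default
# New internal call sites pass the strategy explicitly:
print(pack_image_features([], [], vision_feature_select_strategy="full"))  # full
```

Whether the default is worth adding hinges on whether the method is public API or internal, which is exactly the question raised here.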

yes, should be internal as it doesn't make much sense outside of LLaVA

@zucchini-nlp zucchini-nlp merged commit bcf8946 into huggingface:main Sep 17, 2024
16 checks passed
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024

Fix number of patch check for different vision feature select strategy (huggingface#32494)

* Fix number of patch check for different vision feature select strategy

* add test

Co-authored-by: raushan <raushan@huggingface.co>

Successfully merging this pull request may close these issues.

Llava-Next with vision_feature_select_strategy == "full" returns error image size mismatch