Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

owlvit/2 dynamic input resolution. #34764

Merged
merged 8 commits into from
Dec 21, 2024

Conversation

bastrob
Copy link
Contributor

@bastrob bastrob commented Nov 17, 2024

What does this PR do?

Towards #30579
Hey, this is a draft, Im wondering how can we manage the variables impacted by the dynamic input change in the OwlViTForObjectDetection class (self.sqrt_num_patches, self.box_bias) ?
Is there a better way to handle this ?

The interpolate_pos_encoding allows new input size respecting height==width strictly ?
In that case I should ensure this.
If not, I think sqrt_num_patches needs to be decomposed too (_h, _w), some examples where it might throws exc:
https://github.com/bastrob/transformers/blob/30f3c2d56729974ec0d1d9e2fc4fd633ab697eb2/src/transformers/models/owlvit/modeling_owlvit.py#L1355
https://github.com/bastrob/transformers/blob/30f3c2d56729974ec0d1d9e2fc4fd633ab697eb2/src/transformers/models/owlvit/modeling_owlvit.py#L1459
https://github.com/bastrob/transformers/blob/30f3c2d56729974ec0d1d9e2fc4fd633ab697eb2/src/transformers/models/owlvit/modeling_owlvit.py#L1719

Fixes: #34622

Before submitting

Who can review?

@amyeroberts
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Copy link
Member

@qubvel qubvel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @bastrob! Thanks for opening PR!

It would be great to enable any image resolution, but If you will not find a way to manage height != width images we can limit it to a square input image size. We just have to make sure the proper error is raised.

src/transformers/models/owlv2/modeling_owlv2.py Outdated Show resolved Hide resolved
src/transformers/models/owlv2/modeling_owlv2.py Outdated Show resolved Hide resolved
@bastrob
Copy link
Contributor Author

bastrob commented Nov 18, 2024

Yes, Im agree it makes sense to enable any image resolution, i just wanted to clarify this point. I will work on it, thanks for your feedback !

@bastrob bastrob force-pushed the feature/dynamic_interp branch from 30f3c2d to 3a8b0d7 Compare November 19, 2024 22:48
@bastrob
Copy link
Contributor Author

bastrob commented Nov 19, 2024

Hi @qubvel, I pushed a version to handle image size at any resolution, what do you think about it ?

Copy link
Member

@qubvel qubvel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @bastrob, thanks for digging into it and adding tests! Overall looks great 🚀 just added some nits regarding variable naming.

Could you please also push an empty commit with the message[run-slow] owlvit, owlv2 to trigger slow tests (this commit should be the last one, so I can approve the CI run). Thanks!

src/transformers/models/owlv2/modeling_owlv2.py Outdated Show resolved Hide resolved
src/transformers/models/owlv2/modeling_owlv2.py Outdated Show resolved Hide resolved
src/transformers/models/owlv2/modeling_owlv2.py Outdated Show resolved Hide resolved
@bastrob bastrob force-pushed the feature/dynamic_interp branch from eca2c7b to 06c8af8 Compare November 20, 2024 21:02
@bastrob
Copy link
Contributor Author

bastrob commented Nov 20, 2024

If it sounds good to you @qubvel, Im ready to push the last commit :)

Copy link
Member

@qubvel qubvel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @bastrob, thanks for the update! Please, see the comments below 🤗

src/transformers/models/owlv2/modeling_owlv2.py Outdated Show resolved Hide resolved
@bastrob bastrob force-pushed the feature/dynamic_interp branch from 4f894c4 to 2e42250 Compare November 21, 2024 22:19
@bastrob
Copy link
Contributor Author

bastrob commented Nov 21, 2024

Hello @qubvel, box_predictor now takes interpolate_pos_encoding as extra parameter to enable the dynamic computation of box_bias inside.
Plus, I've added # copied from statement.

@qubvel qubvel self-requested a review November 22, 2024 15:32
Copy link
Member

@qubvel qubvel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good to me, just a few notes re the style! Also tested locally and predictions look good 👍

can you please also push empty commit with the message [run-slow] owlvit, owlv2 to trigger slow tests for these models in CI? (it should be the last commit at the moment I approve the run of CI)

src/transformers/models/owlvit/modeling_owlvit.py Outdated Show resolved Hide resolved
src/transformers/models/owlvit/modeling_owlvit.py Outdated Show resolved Hide resolved
src/transformers/models/owlvit/modeling_owlvit.py Outdated Show resolved Hide resolved
@bastrob bastrob force-pushed the feature/dynamic_interp branch from 2e42250 to 602e5f0 Compare November 22, 2024 22:10
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Member

@qubvel qubvel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for handling this non-standard case! Looks great to me!

@qubvel qubvel requested a review from ArthurZucker November 25, 2024 12:28
Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super good that things are copied from one another! And thanks for also adding the tests. I am usually not a fan of adding codepaths, so if there is a way to avoid checking if interpolate pos encoding at 4 differents places, to just do it when we actually do the interpolation, it would be a lot better!
We can merge whatsoever 🤗

@qubvel qubvel merged commit 8f38f58 into huggingface:main Dec 21, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Question on OWLv2 Model Input Size Flexibility in Hugging Face
4 participants