owlvit/2 dynamic input resolution. #34764
Hi @bastrob! Thanks for opening the PR!
It would be great to support any image resolution, but if you can't find a way to handle images with height != width,
we can limit it to square input images. We just have to make sure a proper error is raised.
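If the square-only fallback were chosen, the "proper error" could be a guard along these lines. This is a minimal sketch: `check_square_input` and its message are hypothetical, not part of transformers, and it assumes a non-overlapping patch embedding (stride equal to patch size).

```python
def check_square_input(height: int, width: int, patch_size: int) -> int:
    """Hypothetical guard for a square-only fallback.

    Returns the number of patches per side, or raises a ValueError that tells
    the user exactly which dimensions were rejected.
    """
    if height != width:
        raise ValueError(
            f"Only square images are supported, got height={height} and width={width}."
        )
    # Non-overlapping patches: floor division matches a stride == patch_size embedding
    return height // patch_size
```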
Yes, I agree it makes sense to support any image resolution; I just wanted to clarify this point. I will work on it, thanks for your feedback!
Hi @qubvel, I pushed a version that handles images of any resolution. What do you think?
Hi @bastrob, thanks for digging into it and adding tests! Overall it looks great 🚀, I just added some nits regarding variable naming.
Could you please also push an empty commit with the message [run-slow] owlvit, owlv2
to trigger the slow tests (this commit should be the last one, so I can approve the CI run). Thanks!
If it sounds good to you, @qubvel, I'm ready to push the last commit :)
Hi @bastrob, thanks for the update! Please, see the comments below 🤗
Hello @qubvel,
Thanks, looks good to me, just a few notes regarding the style! I also tested it locally and the predictions look good 👍
Can you please also push an empty commit with the message [run-slow] owlvit, owlv2
to trigger the slow tests for these models in CI? (It should be the last commit at the moment I approve the CI run.)
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for handling this non-standard case! Looks great to me!
Super good that things are copied from one another! And thanks for also adding the tests. I am usually not a fan of adding code paths, so if there is a way to avoid checking interpolate_pos_encoding in 4 different places and only check it where the interpolation actually happens, it would be a lot better!
We can merge either way 🤗
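One way to read the suggestion about fewer code paths: pass the flag down unchanged and inspect it only in the function that builds the position embeddings. The sketch below is illustrative only. It assumes position embeddings stored as an (h, w, dim) grid of plain Python lists, uses bilinear interpolation for brevity, and the names `position_embeddings_for` and `_bilinear_resize` are hypothetical rather than transformers APIs.

```python
from typing import List

# Position embeddings as an (h, w, dim) grid of plain Python lists
Grid = List[List[List[float]]]


def _bilinear_resize(grid: Grid, new_h: int, new_w: int) -> Grid:
    """Bilinear resize of an (h, w, dim) grid, sampling with aligned corners."""
    old_h, old_w, dim = len(grid), len(grid[0]), len(grid[0][0])
    out: Grid = []
    for i in range(new_h):
        # Map output row i to a fractional input row
        y = i * (old_h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, old_h - 1)
        dy = y - y0
        row = []
        for j in range(new_w):
            # Map output column j to a fractional input column
            x = j * (old_w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, old_w - 1)
            dx = x - x0
            # Blend the four neighbours, one embedding dimension at a time
            row.append([
                (1 - dy) * ((1 - dx) * grid[y0][x0][k] + dx * grid[y0][x1][k])
                + dy * ((1 - dx) * grid[y1][x0][k] + dx * grid[y1][x1][k])
                for k in range(dim)
            ])
        out.append(row)
    return out


def position_embeddings_for(
    grid: Grid, num_patches_height: int, num_patches_width: int, interpolate_pos_encoding: bool
) -> Grid:
    """The only place in this sketch that inspects interpolate_pos_encoding."""
    if (num_patches_height, num_patches_width) == (len(grid), len(grid[0])):
        return grid  # native resolution: nothing to interpolate
    if not interpolate_pos_encoding:
        raise ValueError(
            f"Grid {num_patches_height}x{num_patches_width} does not match the "
            f"pretrained {len(grid)}x{len(grid[0])}; set interpolate_pos_encoding=True."
        )
    return _bilinear_resize(grid, num_patches_height, num_patches_width)
```

In this layout, callers higher up (vision model, detection head) would forward the flag without branching on it, which is the kind of reduction in code paths the review asks for.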
What does this PR do?
Towards #30579
Hey, this is a draft. I'm wondering how we can manage the variables affected by the dynamic input change in the OwlViTForObjectDetection class (self.sqrt_num_patches, self.box_bias).
Is there a better way to handle this?
Does interpolate_pos_encoding only allow new input sizes that strictly satisfy height == width?
If so, I should enforce this.
If not, I think sqrt_num_patches needs to be decomposed too (into _h and _w); here are some places where it might throw an exception:
https://github.com/bastrob/transformers/blob/30f3c2d56729974ec0d1d9e2fc4fd633ab697eb2/src/transformers/models/owlvit/modeling_owlvit.py#L1355
https://github.com/bastrob/transformers/blob/30f3c2d56729974ec0d1d9e2fc4fd633ab697eb2/src/transformers/models/owlvit/modeling_owlvit.py#L1459
https://github.com/bastrob/transformers/blob/30f3c2d56729974ec0d1d9e2fc4fd633ab697eb2/src/transformers/models/owlvit/modeling_owlvit.py#L1719
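The failure mode can be made concrete: with a non-square input the token count is generally not a perfect square, so anything that reshapes the token sequence using sqrt_num_patches on both axes breaks. This is a minimal sketch assuming a non-overlapping patch embedding (stride equal to patch size). `patch_grid`, `sqrt_reshape_would_work` and `to_feature_map` are hypothetical pure-Python stand-ins for the tensor reshapes in the linked lines.

```python
import math
from typing import List, Sequence, Tuple


def patch_grid(height: int, width: int, patch_size: int) -> Tuple[int, int]:
    """Patches along each axis for a non-overlapping patch embedding
    (stride == kernel size == patch_size, no padding)."""
    return height // patch_size, width // patch_size


def sqrt_reshape_would_work(num_tokens: int) -> bool:
    """True only when a single sqrt_num_patches can describe the grid."""
    side = math.isqrt(num_tokens)
    return side * side == num_tokens


def to_feature_map(
    tokens: Sequence[Sequence[float]], num_patches_height: int, num_patches_width: int
) -> List[List[Sequence[float]]]:
    """Reshape a flat, row-major token list into an (h, w) grid of tokens."""
    if len(tokens) != num_patches_height * num_patches_width:
        raise ValueError(
            f"Expected {num_patches_height * num_patches_width} tokens, got {len(tokens)}."
        )
    return [
        list(tokens[r * num_patches_width : (r + 1) * num_patches_width])
        for r in range(num_patches_height)
    ]


# Example: a 768x960 input with 16-pixel patches
h, w = patch_grid(768, 960, 16)  # (48, 60) -> 2880 tokens
print(h, w, sqrt_reshape_would_work(h * w))  # 48 60 False
```

With separate height and width counts the reshape is well defined for any input size, whereas sqrt(2880) ≈ 53.7 cannot index a grid at all.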
Fixes: #34622
Who can review?
@amyeroberts
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.