Add OmDet-Turbo #31843
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hey @amyeroberts @qubvel!
Longer-term TODOs:
Thanks for opening this PR!
Overall looks good for a first draft! I've just done a fairly high-level review pointing out some library patterns and structures to be propagated throughout the rest of the PR.
Regarding training: if the loss function isn't available or is hard to add to the library, it's OK to add it at a later stage if requested by the community.
Good job!
@amyeroberts did a review already (I also learned new things from it 🙂), so I just added a few more comments below.
For draft PR reviews, it would probably be helpful to specify which files should be reviewed and which are not ready yet.
Hi @amyeroberts and @qubvel! When you have some time, could you please take another look at this PR? I've resolved your previous remarks and left open the ones where I had questions. All the files I've modified are ready for review. I've commented out the modeling tests I haven't addressed yet (there's only one end-to-end integration test for now), but the processor tests are ready for review. I'd like to highlight a significant issue: the
Thanks in advance!
Added a few more comments regarding the modeling file. I forgot to turn on review mode in VSCode, so the earlier comments appear as separate comments; sorry about that :)
Thanks for iterating!
As before - I've done another high-level pass, but not over everything, as the review was getting quite long. A few general comments:
- We should use variable names that are as clear and explicit as possible, both in code and in comments, e.g. bs -> batch_size
- Code should be self-documenting as much as possible: we should try to avoid needing to refer back to docstrings and comments to figure out what's happening. Clear variable names will help with this
- There are specific cases for needing static methods, but not requiring self isn't a sufficient condition. This is normally an indication that the method should just be a function outside the class (see the sketch after this list).
- +1 to all of @qubvel's comments - in particular in relation to the attention implementation
- Make sure to refer to other models' implementations (not just Grounding DINO) to see the library patterns for e.g. naming, position of objects in the file, etc.
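A toy sketch of the first and third points (illustrative code, not from this PR):

```python
import torch

# Instead of an abbreviated name and a @staticmethod that never uses self:
#     @staticmethod
#     def make_mask(bs, seq): ...
# prefer explicit names and a module-level function.
def build_padding_mask(batch_size: int, sequence_length: int) -> torch.Tensor:
    # True marks positions that should be attended to
    return torch.ones(batch_size, sequence_length, dtype=torch.bool)

mask = build_padding_mask(batch_size=2, sequence_length=5)
print(mask.shape)  # torch.Size([2, 5])
```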
Next steps are adding the model tests and addressing all the outstanding comments. With regard to reviewers' comments, it's important to fully address each one before marking it as resolved (apply the change, reply explaining why it's not applicable, or ask follow-up questions).
output_kwargs = self._merge_kwargs(
    OmDetTurboProcessorKwargs,
    tokenizer_init_kwargs=OmDetTurboProcessorKwargs._defaults["text_kwargs"],
    # tokenizer_init_kwargs=self.tokenizer.init_kwargs,
to delete?
I left this here because it should merge the tokenizer init_kwargs to be consistent with other processors, but as we discussed, when I load the model from a saved checkpoint, the tokenizer init_kwargs contains kwargs that shouldn't be there, such as padding_side. I'm really not sure what causes this, as the checkpoint is saved and loaded in the same way as other models.
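As a toy illustration of the merge order at stake here (plain dicts, not the real ProcessorMixin._merge_kwargs; all values made up), showing how stray tokenizer init kwargs such as padding_side leak into the merged output:

```python
# Processor defaults, tokenizer init kwargs, and call-time kwargs, merged in
# increasing order of precedence:
defaults = {"padding": False, "truncation": True}
tokenizer_init_kwargs = {"padding": "max_length", "padding_side": "left"}  # unexpected extras after save/load
call_kwargs = {"truncation": False}

merged = {**defaults, **tokenizer_init_kwargs, **call_kwargs}
print(merged)  # {'padding': 'max_length', 'truncation': False, 'padding_side': 'left'}
```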
@yonigozlan Thanks for the update. In general this looks good to me - the code is easier to read and understand! I left some nit comments regarding variable naming, type hints, and the attention mask. My main concern is the removal of OmDetTurboModel: I suppose we have to have a FooModel class for every model, where the idea is to run the model without heads, and then add the specific heads inside the task-specific class, such as OmDetTurboForObjectDetection.
Thanks for the review @qubvel! For the
Looking good!
I think we're close to being ready for merge 🤗 Most comments are about making sure we're using clear variable names.
General comments:
- It looks like there's no logic to account for gradient checkpointing, e.g. like here, which would be nice to support if possible (see the sketch after this list)
- For the tests - in particular processing and one inference test - it would be good to account for batched inputs. This is something we often forget to test, and particularly for models like this, which are more complicated than just input_ids, we want to make sure it all works!
- Make sure docstrings are in sync with the code they document and in line with the library patterns.
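A minimal runnable sketch of the gradient checkpointing pattern from the first bullet, using torch.utils.checkpoint directly (transformers wraps this via self._gradient_checkpointing_func; the layer choice here is a placeholder, not OmDet-Turbo's decoder):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class TinyDecoder(nn.Module):
    def __init__(self, hidden_size=32, num_layers=2):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(hidden_size, hidden_size) for _ in range(num_layers))
        self.gradient_checkpointing = True

    def forward(self, hidden_states):
        for layer in self.layers:
            if self.gradient_checkpointing and self.training:
                # recompute this layer's activations during backward instead of storing them
                hidden_states = checkpoint(layer, hidden_states, use_reentrant=False)
            else:
                hidden_states = layer(hidden_states)
        return hidden_states

decoder = TinyDecoder().train()
out = decoder(torch.randn(2, 32, requires_grad=True))
out.sum().backward()  # backward works with activation recomputation
```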
decoder_coord_logits: torch.FloatTensor = None
decoder_class_logits: torch.FloatTensor = None
encoder_coord_logits: torch.FloatTensor = None
encoder_class_logits: Tuple[torch.FloatTensor] = None
init_reference_points: torch.FloatTensor = None
intermediate_reference_points: Tuple[Tuple[torch.FloatTensor]] = None
hidden_states: Optional[Tuple[torch.FloatTensor]] = None
attentions: Optional[Tuple[Tuple[torch.FloatTensor]]] = None
Apparently I cannot have more than one required field if I inherit from ModelOutput and use the dataclass decorator. @amyeroberts do you know the reason for that? And what should I do in this case? Thanks
Could you share a code snippet and the error? I seem to be able to create a model output with more than one required field:
In [1]: from transformers.utils import ModelOutput

In [2]: class FooOutput(ModelOutput):
   ...:     a: int
   ...:     b: int
   ...:
One thing to note is that although some inputs aren't typed as Optional in our modeling_outputs module, they still default to None. I'm not sure why this is the case - the decision was made before my time :)
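For reference, a minimal sketch of that pattern using field names taken from this PR's output class (the class name here is made up):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

import torch
from transformers.utils import ModelOutput

@dataclass
class OmDetTurboSketchOutput(ModelOutput):
    # every field defaults to None, even those not typed Optional,
    # so the dataclass never has more than one required field
    decoder_coord_logits: torch.FloatTensor = None
    decoder_class_logits: torch.FloatTensor = None
    hidden_states: Optional[Tuple[torch.FloatTensor]] = None

out = OmDetTurboSketchOutput(decoder_coord_logits=torch.zeros(1, 4))
print(out.decoder_coord_logits.shape)  # torch.Size([1, 4])
```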
Kwargs:
    task (`Union[str, List[str], TextInput, PreTokenizedInput]`):
        The grounded text used to guide open vocabulary detection. Expects a single string or a list of strings.
        Examples: "Detect a cat, a dog, and a bird.", ["Detect everything.", "Detect trees and flowers."]
        When not provided, the default task is "Detect [class1], [class2], [class3]" etc.
    ...
I removed task from the Args section of the docstring and added this Kwargs section, as the task kwarg is very specific to this model, and I figured it would be nice to have a description of what it does here.
When not provided, the default task is "Detect [class1], [class2], [class3]" etc.
Is it better than "Detect everything."? Is there any mention in the paper of which one the authors used?
A mix of prompt formats is used during training. The paper also says that generic directives like "Detect all objects" are used in large-vocabulary detection scenarios. There's no mention of which works best, but in my experience, a non-generic task prompt works better when there is a small number of classes.
Thanks for all the continued work on this!
It's looking great - I think we're very close to merge. The two main comments are the cache logic and @qubvel's point about the OmDetTurboModel class.
@add_start_docstrings_to_model_forward(OMDET_TURBO_INPUTS_DOCSTRING)
@replace_return_docstrings(output_type=OmDetTurboModelOutput, config_class=_CONFIG_FOR_DOC)
def forward(
Sorry - I missed this as it was marked as resolved.

> but all models should have FooModel

Generally, yes, we want this. It's not a hard-and-fast rule though; other models like Fuyu have just a task-specific model. However, you're right in this case: as there are decoder heads, we can and should separate them out in order to have an OmDetTurboModel (see the sketch below).
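A toy sketch of the split being agreed on here (module choices are placeholders, not OmDet-Turbo's real architecture): the base model runs without heads, and the task-specific class adds the detection heads on top.

```python
import torch
import torch.nn as nn

class TinyBaseModel(nn.Module):
    """Encoder/decoder stack without task heads (the FooModel role)."""
    def __init__(self, hidden_size=32):
        super().__init__()
        self.encoder = nn.Linear(hidden_size, hidden_size)
        self.decoder = nn.Linear(hidden_size, hidden_size)

    def forward(self, x):
        return self.decoder(self.encoder(x))  # last hidden state, no heads

class TinyForObjectDetection(nn.Module):
    """Wraps the base model and adds detection heads (the FooForTask role)."""
    def __init__(self, hidden_size=32, num_classes=10):
        super().__init__()
        self.model = TinyBaseModel(hidden_size)
        self.class_head = nn.Linear(hidden_size, num_classes)
        self.bbox_head = nn.Linear(hidden_size, 4)

    def forward(self, x):
        hidden = self.model(x)
        return self.class_head(hidden), self.bbox_head(hidden)

logits, boxes = TinyForObjectDetection()(torch.randn(2, 32))
print(logits.shape, boxes.shape)  # torch.Size([2, 10]) torch.Size([2, 4])
```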
All looks good to me - thanks for all the work adding this model!
Let's make sure to remove the [WIP] from the title before merge
self.output_attentions = config.output_attentions
self.output_hidden_states = config.output_hidden_states
self.use_return_dict = config.use_return_dict
We don't need to store these on the model - the typical pattern is to store self.config in the publicly importable classes and resolve the flags at call time with output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions (see the sketch below).
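A runnable sketch of that pattern (the config object here is a stand-in for the real model config):

```python
from types import SimpleNamespace

class SketchModel:
    def __init__(self, config):
        # store only the config; no per-flag attributes on the model
        self.config = config

    def forward(self, output_attentions=None, output_hidden_states=None, return_dict=None):
        # per-call arguments win; otherwise fall back to the config defaults
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        return output_attentions, output_hidden_states, return_dict

config = SimpleNamespace(output_attentions=False, output_hidden_states=False, use_return_dict=True)
print(SketchModel(config).forward(output_attentions=True))  # (True, False, True)
```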
* Add template with add-new-model-like
* Add rough OmDetTurboEncoder and OmDetTurboDecoder
* Add working OmDetTurbo convert to hf
* Change OmDetTurbo encoder to RT-DETR encoder
* Add swin timm backbone as default, add always partition fix for swin timm
* Add labels and tasks caching
* Fix make fix-copies
* Format omdet_turbo
* fix Tokenizer tests
* Fix style and quality
* Reformat omdet_turbo
* Fix quality, style, copies
* Standardize processor kwargs
* Fix style
* Add output_hidden_states and ouput_attentions
* Add personalize multi-head attention, improve docstrings
* Add integrated test and fix copy, style, quality
* Fix unprotected import
* Cleanup comments and fix unprotected imports
* Add fix different prompts in batch (key_padding_mask)
* Add key_padding_mask to custom multi-head attention module
* Replace attention_mask by key_padding_mask
* Remove OmDetTurboModel and refactor
* Refactor processing of classes and abstract use of timm backbone
* Add testing, fix output attentions and hidden states, add cache for anchors generation
* Fix copies, style, quality
* Add documentation, conver key_padding_mask to attention_mask
* revert changes to backbone_utils
* Fic docstrings rst
* Fix unused argument in config
* Fix image link documentation
* Reorder config and cleanup
* Add tokenizer_init_kwargs in merge_kwargs of the processor
* Change AutoTokenizer to CLIPTokenizer in convert
* Fix init_weights
* Add ProcessorMixin tests, Fix convert while waiting on uniform kwargs
* change processor kwargs and make task input optional
* Fix omdet docs
* Remove unnecessary tests for processor kwargs
* Replace nested BatchEncoding output of the processor by a flattened BatchFeature
* Make modifications from Pavel review
* Add changes Amy review
* Remove unused param
* Remove normalize_before param, Modify processor call docstring
* Remove redundant decoder class, add gradient checkpointing for decoder
* Remove commented out code
* Fix inference in fp16 and add fp16 integrated test
* update omdet md doc
* Add OmdetTurboModel
* fix caching and nit
* add OmDetTurboModel to tests
* nit change repeated key test
* Improve inference speed in eager mode
* fix copies
* Fix nit
* remove OmdetTurboModel
* [run-slow] omdet_turbo
* [run-slow] omdet_turbo
* skip dataparallel test
* [run-slow] omdet_turbo
* update weights to new path
* remove unnecessary config in class
---------
Co-authored-by: Ubuntu <ubuntu@ip-172-31-91-248.ec2.internal>
What does this PR do?
This PR adds support for OmDet-Turbo, an open-vocabulary detection model from Om Research Lab.
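For context, a hedged usage sketch assembled from the classes discussed in this PR (the checkpoint name and the exact processor call signature are assumptions, not verified against the merged API):

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, OmDetTurboForObjectDetection

checkpoint = "omlab/omdet-turbo-swin-tiny-hf"  # assumed checkpoint path
processor = AutoProcessor.from_pretrained(checkpoint)
model = OmDetTurboForObjectDetection.from_pretrained(checkpoint)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
classes = ["cat", "remote"]

# `task` is optional; when omitted, a "Detect [class1], [class2]..." prompt is built
inputs = processor(image, text=classes, task="Detect a cat and a remote.", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
```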
Who can review?
@amyeroberts @qubvel