
Add OmDet-Turbo #31843

Merged 62 commits into huggingface:main from add-om-det-turbo on Sep 25, 2024
Conversation

Member

@yonigozlan yonigozlan commented Jul 8, 2024

What does this PR do?

This PR adds support for OmDet-Turbo, an open-vocabulary detection model from Om Research Lab.

Who can review?

@amyeroberts @qubvel

@yonigozlan yonigozlan requested a review from amyeroberts July 8, 2024 16:02
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yonigozlan yonigozlan requested a review from qubvel July 8, 2024 16:33
@yonigozlan yonigozlan changed the title from "Add OmDet-Turbo" to "[WIP] Add OmDet-Turbo" Jul 9, 2024
@yonigozlan
Member Author

yonigozlan commented Jul 17, 2024

Hey @amyeroberts @qubvel!
I think this PR is ready for a first review. What's missing for now is adding/modifying tests (the current ones are for Grounding DINO and not adapted to this model).
Short-term TODOs left:

  • Tests
  • Docs, examples and some docstrings left to modify

Longer-term TODOs:

  • Add support for training

Collaborator

@amyeroberts amyeroberts left a comment

Thanks for opening this PR!

Overall looks good for a first draft! I've just done a fairly high-level review pointing out some library patterns and structures to be propagated throughout the rest of the PR.

Regarding training: if the loss function isn't available or is hard to add into the library, it's OK to add it at a later stage if requested by the community.

Member

@qubvel qubvel left a comment

Good job!

@amyeroberts did a review already (I also learned new things from it 🙂 ), so I just added a few more comments below.

For draft PR reviews, it would probably be helpful to specify which files should be reviewed and which are not ready yet.

@yonigozlan
Member Author

Hi @amyeroberts and @qubvel! When you have some time, could you please take another look at this PR? I've resolved your previous remarks and left the ones where I had questions open.

All the files I've modified are ready for review. I've commented out the modeling tests I haven't addressed yet (there's only one end-to-end integration test for now), but the processor tests are ready for review.

I'd like to highlight a significant issue: the OmDetTurboMultiheadAttention in the OmDetTurboEncoderLayer module. It reproduces the original model's behavior, but that behavior appears to be incorrectly implemented for batch inference. This makes the whole model unusable for batch inference if we want to use the original model weights and stay consistent with the original model's outputs.

Thanks in advance!
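A note for readers: judging by the later key_padding_mask commits in this PR, one likely aspect of the batching problem is that prompts of different lengths get padded to a common length and the padding positions then leak into attention. A minimal sketch of the masking idea with plain torch.nn.MultiheadAttention follows; it is illustrative only, not the PR's OmDetTurboMultiheadAttention.

import torch
import torch.nn as nn

hidden_size, num_heads = 256, 8
attention = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)

# Two text prompts padded to a common length; the second one has 2 padding positions.
prompt_embeds = torch.randn(2, 6, hidden_size)  # (batch_size, sequence_length, hidden_size)
key_padding_mask = torch.tensor(
    [[False, False, False, False, False, False],
     [False, False, False, False, True, True]]  # True = ignore this (padding) position
)

# Without the mask, padding tokens of the shorter prompt take part in attention and
# the batched outputs drift away from the per-sample outputs.
attn_output, _ = attention(
    prompt_embeds, prompt_embeds, prompt_embeds, key_padding_mask=key_padding_mask
)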

@qubvel qubvel self-requested a review July 26, 2024 12:29
Member

@qubvel qubvel left a comment

Added a few more comments regarding the modeling file. I forgot to turn on review mode in VS Code, so the earlier comments appear as separate comments, sorry for that :)

Collaborator

@amyeroberts amyeroberts left a comment

Thanks for iterating!

As before, I've done another high-level pass, but not everything, as the review was getting quite long. A few general comments:

  • We should use variable names that are as clear and explicit as possible, both within code and in comments, e.g. bs -> batch_size
  • Code should be self-documenting as much as possible: we should try to avoid needing to refer back to docstrings and comments to figure out what's happening. Clear variable names will help with this
  • There are specific cases where static methods are needed, but not requiring self isn't a sufficient condition. It's normally an indication that the method should just be a function outside the class (a small illustration follows this comment)
  • +1 to all of @qubvel's comments - in particular in relation to the attention implementation
  • Make sure to refer to other models' implementations (not just Grounding DINO) to see the library patterns for e.g. naming, position of objects in the file, etc.

Next steps are adding the model tests and addressing all the outstanding comments. With regard to reviewers' comments, it's important to fully address each one before marking it as resolved (apply the change / reply explaining why it's not applicable / ask follow-up questions).
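The illustration mentioned above: a @staticmethod that never touches self can usually become a module-level function. The names below are made up for the example (inverse_sigmoid-style helpers do exist as module-level functions in other detection models); this is not code from this PR.

import torch

# Before: a @staticmethod that never uses `self` or any class state.
class SomeDecoder(torch.nn.Module):
    @staticmethod
    def inverse_sigmoid(x, eps=1e-5):
        x = x.clamp(min=0, max=1)
        return torch.log(x.clamp(min=eps) / (1 - x).clamp(min=eps))

# After: the same logic as a plain module-level function, the more common library pattern.
def inverse_sigmoid(x, eps=1e-5):
    x = x.clamp(min=0, max=1)
    return torch.log(x.clamp(min=eps) / (1 - x).clamp(min=eps))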

output_kwargs = self._merge_kwargs(
    OmDetTurboProcessorKwargs,
    tokenizer_init_kwargs=OmDetTurboProcessorKwargs._defaults["text_kwargs"],
    # tokenizer_init_kwargs=self.tokenizer.init_kwargs,
Collaborator

to delete?

Member Author

I left this here because it should merge the tokenizer init_kwargs to be consistent with other processors, but as we discussed, when I load the model from a saved checkpoint, the tokenizer init_kwargs contains kwargs that shouldn't be there such as padding_side etc. I'm really not sure what causes this as the checkpoint is saved and loaded similarly to other models.

@yonigozlan yonigozlan force-pushed the add-om-det-turbo branch 2 times, most recently from 9ceb5fc to d52c345 on August 5, 2024 19:00
@yonigozlan yonigozlan marked this pull request as ready for review August 8, 2024 04:44
Member

@qubvel qubvel left a comment

@yonigozlan Thanks for the update; in general it looks good to me, and the code is easier to read and understand! I left some nit comments regarding variable naming, type hints, and the attention mask. My main concern is the removal of OmDetTurboModel: I suppose we have to have a FooModel class for every model, where the idea is to run the model without heads and then add the specific heads inside the task-specific class, such as OmDetTurboForObjectDetection.

@yonigozlan
Member Author

Thanks for the review @qubvel! For OmDetTurboModel, the task-specific part of the model starts at the very beginning of the decoder, where two heads are defined, one for the object detection scores and the other for the bounding-box coordinates. So it is quite difficult to define a non-task-specific OmDetTurboModel. It wouldn't be the first model not to have a FooModel; for example, LLaVaNext doesn't have one either, maybe for the same reason?
@amyeroberts if you think not having OmDetTurboModel is OK, I think this should be ready for you to review! Sorry, I probably pinged you for a final review too early before, but I'm hopeful it's better this time :)
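For context, the FooModel / FooForTask convention being discussed looks roughly like the sketch below, with placeholder classes and made-up config fields rather than OmDet-Turbo's actual modules.

import torch.nn as nn

class FooModel(nn.Module):
    """Bare architecture: backbone + encoder + decoder, no task-specific heads."""
    def __init__(self, config):
        super().__init__()
        self.backbone = nn.Identity()  # placeholders for the real submodules
        self.encoder = nn.Identity()
        self.decoder = nn.Identity()

    def forward(self, pixel_values):
        return self.decoder(self.encoder(self.backbone(pixel_values)))

class FooForObjectDetection(nn.Module):
    """Task-specific model: wraps FooModel and adds the detection heads on top."""
    def __init__(self, config):
        super().__init__()
        self.model = FooModel(config)
        self.class_head = nn.Linear(config.hidden_size, config.num_classes)
        self.bbox_head = nn.Linear(config.hidden_size, 4)

    def forward(self, pixel_values):
        hidden_states = self.model(pixel_values)
        return self.class_head(hidden_states), self.bbox_head(hidden_states)

The difficulty raised above is that OmDet-Turbo's scoring and box heads live inside the decoder itself, so there is no clean head-free FooModel to factor out.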

Collaborator

@amyeroberts amyeroberts left a comment

Looking good!

I think we're close to being ready for merge 🤗 Most comments are about making sure we're using clear variable names.

General comments:

  • It looks like there's no logic to account for gradient checkpointing, which would be nice to support if possible; see the sketch after this list for the usual pattern
  • For the tests, in particular processing and one inference test, it would be good to account for batched inputs. This is something we often forget to test for, and particularly for models like this, which are more complicated than just input_ids, we want to make sure it all works!
  • Make sure docstrings are in sync with the code they document and in line with the library patterns.
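The gradient-checkpointing sketch mentioned in the first bullet: a toy encoder that checkpoints each layer during training. In the library this usually goes through the _gradient_checkpointing_func hook that PreTrainedModel sets up; the version below calls torch.utils.checkpoint directly and is illustrative only, not code from this PR.

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class TinyEncoder(nn.Module):
    def __init__(self, hidden_size=32, num_layers=2):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(hidden_size, hidden_size) for _ in range(num_layers)])
        self.gradient_checkpointing = False

    def forward(self, hidden_states):
        for layer in self.layers:
            if self.gradient_checkpointing and self.training:
                # Recompute this layer's activations during backward instead of storing them.
                hidden_states = checkpoint(layer.__call__, hidden_states, use_reentrant=False)
            else:
                hidden_states = layer(hidden_states)
        return hidden_states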

Comment on lines 83 to 90
decoder_coord_logits: torch.FloatTensor = None
decoder_class_logits: torch.FloatTensor = None
encoder_coord_logits: torch.FloatTensor = None
encoder_class_logits: Tuple[torch.FloatTensor] = None
init_reference_points: torch.FloatTensor = None
intermediate_reference_points: Tuple[Tuple[torch.FloatTensor]] = None
hidden_states: Optional[Tuple[torch.FloatTensor]] = None
attentions: Optional[Tuple[Tuple[torch.FloatTensor]]] = None
Collaborator

> Apparently I cannot have more than one required field if I inherit from ModelOutput and use the dataclass decorator. @amyeroberts do you know the reason for that? And what should I do in this case? Thanks

Could you share a code snippet and the error? I seem to be able to create a model output with more than one required input:

In [1]: from transformers.utils import ModelOutput

In [2]: class FooOutput(ModelOutput):
   ...:     a: int
   ...:     b: int
   ...:

One thing to note is that although some inputs aren't typed as optional in our modeling_outputs module, they still default to None. Not sure why this is the case - the decision was before my time :)
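For completeness, a sketch of the constraint being hit, based on my reading of ModelOutput.__post_init__ at the time (the exact error message may differ); FooOutput is illustrative:

from dataclasses import dataclass

import torch
from transformers.utils import ModelOutput

@dataclass
class FooOutput(ModelOutput):
    a: torch.FloatTensor  # one required field is allowed
    b: torch.FloatTensor  # a second field without a `= None` default is not

# Defining the class works; instantiating it triggers the __post_init__ check and
# raises roughly "FooOutput should not have more than one required field."
# FooOutput(a=torch.zeros(1), b=torch.zeros(1))

@dataclass
class FooOutputFixed(ModelOutput):
    a: torch.FloatTensor = None
    b: torch.FloatTensor = None  # the library pattern: every field defaults to None

out = FooOutputFixed(a=torch.zeros(1), b=torch.ones(1))  # fine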

Comment on lines +223 to +228
Kwargs:
    task (`Union[str, List[str], TextInput, PreTokenizedInput]`):
        The grounded text used to guide open-vocabulary detection. Expects a single string or a list of strings.
        Examples: "Detect a cat, a dog, and a bird.", ["Detect everything.", "Detect trees and flowers."]
        When not provided, the default task is "Detect [class1], [class2], [class3]" etc.
...
Member Author

@yonigozlan yonigozlan Aug 22, 2024

I removed task from the Args section of the docstring, and added this Kwargs section, as the task kwarg is very specific to this model, and I figured it would be nice to have a description of what it does here.

Member

@qubvel qubvel Aug 22, 2024

> When not provided, the default task is "Detect [class1], [class2], [class3]" etc.

Is it better than "Detect everything."? Is there any mention in the paper of which one the authors used?

Member Author

A mix of prompt formats is used during training. The paper also says that generic directives like "Detect all objects" are used in large-vocabulary detection scenarios. There's no mention of which works best, but in my experience, a non-generic task prompt works better when there is a small number of classes.
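To make the task kwarg concrete, a usage sketch; the checkpoint id and the exact processor arguments are assumptions for illustration and may differ from the merged API.

import requests
import torch
from PIL import Image
from transformers import AutoProcessor, OmDetTurboForObjectDetection

checkpoint = "omlab/omdet-turbo-swin-tiny-hf"  # assumed checkpoint id
processor = AutoProcessor.from_pretrained(checkpoint)
model = OmDetTurboForObjectDetection.from_pretrained(checkpoint)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
classes = ["cat", "remote"]

# Without `task`, the processor builds "Detect cat, remote."-style text from the classes;
# a non-generic prompt can be passed explicitly, which tends to help for small class sets.
inputs = processor(image, text=classes, task="Detect a cat and a remote.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)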

Collaborator

@amyeroberts amyeroberts left a comment

Thanks for all the continued work on this!

It's looking great - I think we're very close to merge. The two main comments are the cache logic and @qubvel's point about the OmDetTurboModel class.


@add_start_docstrings_to_model_forward(OMDET_TURBO_INPUTS_DOCSTRING)
@replace_return_docstrings(output_type=OmDetTurboModelOutput, config_class=_CONFIG_FOR_DOC)
def forward(
Collaborator

Sorry - I missed this as it was marked as resolved.

> but all models should have FooModel

Generally, yes, we want this. It's not a hard-and-fast rule though; other models like Fuyu have just a task-specific model.

However, yes, you're right in this case - since there are decoder heads, we can and should separate these out in order to have an OmDetTurboModel.

Collaborator

@amyeroberts amyeroberts left a comment

All looks good to me - thanks for all the work adding this model!

Let's make sure to remove the [WIP] from the title before merge

Comment on lines 1666 to 1668
self.output_attentions = config.output_attentions
self.output_hidden_states = config.output_hidden_states
self.use_return_dict = config.use_return_dict
Collaborator

We don't need to store these in the model - the typical pattern is to store self.config in the publicly importable classes and use output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions

Suggested change
self.output_attentions = config.output_attentions
self.output_hidden_states = config.output_hidden_states
self.use_return_dict = config.use_return_dict
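A minimal sketch of the suggested pattern, with an illustrative module name:

import torch.nn as nn

class ExampleDecoder(nn.Module):
    def __init__(self, config):
        super().__init__()
        # Keep a reference to the config instead of copying individual flags onto the module.
        self.config = config

    def forward(self, hidden_states, output_attentions=None, output_hidden_states=None, return_dict=None):
        # Resolve per-call overrides against the config defaults.
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        return hidden_states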

@yonigozlan yonigozlan changed the title from "[WIP] Add OmDet-Turbo" to "Add OmDet-Turbo" Sep 25, 2024
@yonigozlan yonigozlan merged commit 94f18cf into huggingface:main Sep 25, 2024
24 checks passed
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
* Add template with add-new-model-like

* Add rough OmDetTurboEncoder and OmDetTurboDecoder

* Add working OmDetTurbo convert to hf

* Change OmDetTurbo encoder to RT-DETR encoder

* Add swin timm backbone as default, add always partition fix for swin timm

* Add labels and tasks caching

* Fix make fix-copies

* Format omdet_turbo

* fix Tokenizer tests

* Fix style and quality

* Reformat omdet_turbo

* Fix quality, style, copies

* Standardize processor kwargs

* Fix style

* Add output_hidden_states and output_attentions

* Add personalized multi-head attention, improve docstrings

* Add integrated test and fix copy, style, quality

* Fix unprotected import

* Cleanup comments and fix unprotected imports

* Add fix different prompts in batch (key_padding_mask)

* Add key_padding_mask to custom multi-head attention module

* Replace attention_mask by key_padding_mask

* Remove OmDetTurboModel and refactor

* Refactor processing of classes and abstract use of timm backbone

* Add testing, fix output attentions and hidden states, add cache for anchors generation

* Fix copies, style, quality

* Add documentation, convert key_padding_mask to attention_mask

* revert changes to backbone_utils

* Fix docstrings rst

* Fix unused argument in config

* Fix image link documentation

* Reorder config and cleanup

* Add tokenizer_init_kwargs in merge_kwargs of the processor

* Change AutoTokenizer to CLIPTokenizer in convert

* Fix init_weights

* Add ProcessorMixin tests, Fix convert while waiting on uniform kwargs

* change processor kwargs and make task input optional

* Fix omdet docs

* Remove unnecessary tests for processor kwargs

* Replace nested BatchEncoding output of the processor by a flattened BatchFeature

* Make modifications from Pavel review

* Add changes Amy review

* Remove unused param

* Remove normalize_before param, Modify processor call docstring

* Remove redundant decoder class, add gradient checkpointing for decoder

* Remove commented out code

* Fix inference in fp16 and add fp16 integrated test

* update omdet md doc

* Add OmdetTurboModel

* fix caching and nit

* add OmDetTurboModel to tests

* nit change repeated key test

* Improve inference speed in eager mode

* fix copies

* Fix nit

* remove OmdetTurboModel

* [run-slow] omdet_turbo

* [run-slow] omdet_turbo

* skip dataparallel test

* [run-slow] omdet_turbo

* update weights to new path

* remove unnecessary config in class

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-91-248.ec2.internal>