
Add DINOv2 with registers #35348

Merged

Conversation

@NielsRogge (Contributor) commented on Dec 19, 2024

What does this PR do?

This PR adds DINOv2 with registers, this time using the new modular tool.
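For context, a minimal usage sketch of the new model (the checkpoint name is taken from the hub link later in this thread; the rest follows the generic Auto API pattern and is not copied from this PR's docs):

```python
import numpy as np
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

# Dummy image just to keep the sketch self-contained.
image = Image.fromarray(np.zeros((224, 224, 3), dtype=np.uint8))

processor = AutoImageProcessor.from_pretrained("facebook/dinov2-with-registers-small")
model = AutoModel.from_pretrained("facebook/dinov2-with-registers-small")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Sequence contains the CLS token, the register tokens, and the patch tokens.
print(outputs.last_hidden_state.shape)
```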

Fixes #27379

To do:

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker (Collaborator) left a comment

Nice! The drop path update might be a bit breaking; you can import the function in the modular file IMO.
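As a rough sketch of that suggestion (assuming the modular file is `modular_dinov2_with_registers.py` and that the existing drop-path helper in DINOv2 is `Dinov2DropPath`; both names are assumptions, not copied from the PR):

```python
# modular_dinov2_with_registers.py (illustrative sketch only)
# Reuse DINOv2's existing drop-path implementation instead of redefining it,
# so the original model's behaviour is left untouched.
from transformers.models.dinov2.modeling_dinov2 import Dinov2DropPath


class Dinov2WithRegistersDropPath(Dinov2DropPath):
    pass
```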

(Resolved review thread on src/transformers/models/dinov2_with_registers/__init__.py, now marked outdated.)
@NielsRogge (Contributor, Author) commented on Dec 22, 2024

Thanks for the review, I reverted the drop path update since it wasn't needed. Looks like we can merge :)

One thing left: all model checkpoints are on the hub but still need to be transferred to the facebook org: https://huggingface.co/models?other=dinov2_with_registers (and then replace nielsr with facebook everywhere).
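For reference, moving a checkpoint between namespaces can be done with `huggingface_hub`; a sketch, assuming the source repo lives under the nielsr namespace with the same name (the source id is an assumed example, and write access to both namespaces is required):

```python
from huggingface_hub import HfApi

api = HfApi()
# Move one of the checkpoints from the personal namespace to the facebook org.
api.move_repo(
    from_id="nielsr/dinov2-with-registers-small",
    to_id="facebook/dinov2-with-registers-small",
)
```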

@ArthurZucker (Collaborator) left a comment

Yep, feel free to merge in the meantime 🤗 thanks for updating!

@NielsRogge merged commit 6e0515e into huggingface:main on Dec 24, 2024 (25 checks passed).
@xenova (Contributor) commented on Dec 24, 2024

@NielsRogge Would it be possible to adapt the interpolate_pos_encoding function to use the same approach as Dinov2 (but just with options for antialiasing):

```python
def interpolate_pos_encoding(self, embeddings: torch.Tensor, height: int, width: int) -> torch.Tensor:
    """
    This method allows to interpolate the pre-trained position encodings, to be able to use the model on higher
    resolution images. This method is also adapted to support torch.jit tracing and interpolation at torch.float32 precision.

    Adapted from:
    - https://github.com/facebookresearch/dino/blob/de9ee3df6cf39fac952ab558447af1fa1365362a/vision_transformer.py#L174-L194, and
    - https://github.com/facebookresearch/dinov2/blob/e1277af2ba9496fbadf7aec6eba56e8d882d1e35/dinov2/models/vision_transformer.py#L179-L211
    """
    num_patches = embeddings.shape[1] - 1
    num_positions = self.position_embeddings.shape[1] - 1

    # always interpolate when tracing to ensure the exported model works for dynamic input shapes
    if not torch.jit.is_tracing() and num_patches == num_positions and height == width:
        return self.position_embeddings

    class_pos_embed = self.position_embeddings[:, :1]
    patch_pos_embed = self.position_embeddings[:, 1:]

    dim = embeddings.shape[-1]

    new_height = height // self.patch_size
    new_width = width // self.patch_size

    sqrt_num_positions = torch_int(num_positions**0.5)
    patch_pos_embed = patch_pos_embed.reshape(1, sqrt_num_positions, sqrt_num_positions, dim)
    patch_pos_embed = patch_pos_embed.permute(0, 3, 1, 2)
    target_dtype = patch_pos_embed.dtype
    patch_pos_embed = nn.functional.interpolate(
        patch_pos_embed.to(torch.float32),
        size=(new_height, new_width),
        mode="bicubic",
        align_corners=False,
    ).to(dtype=target_dtype)
    patch_pos_embed = patch_pos_embed.permute(0, 2, 3, 1).view(1, -1, dim)

    return torch.cat((class_pos_embed, patch_pos_embed), dim=1)
```

The current version has issues with graph tracing (see #33226).
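For reference, a sketch of what the requested antialias option could look like in that code path; the `interpolate_antialias` config attribute below is assumed for illustration, and only the `interpolate` call changes:

```python
# Sketch: same Dinov2-style interpolation, with an opt-in antialias flag.
# `self.config.interpolate_antialias` is an assumed attribute name here.
patch_pos_embed = nn.functional.interpolate(
    patch_pos_embed.to(torch.float32),
    size=(new_height, new_width),
    mode="bicubic",
    align_corners=False,
    antialias=self.config.interpolate_antialias,
).to(dtype=target_dtype)
```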

@xenova (Contributor) commented on Dec 24, 2024, in an inline review thread on the configuration's `model_type = "dinov2-with-registers-base"` line:


There is a mismatch between the models on the hub and the model_type specified here. See https://huggingface.co/facebook/dinov2-with-registers-small/blob/main/config.json#L18, where it is `dinov2_with_registers`.

This causes an issue when loading the models and re-saving them (e.g., for finetuning or conversions), as sketched below.
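A sketch of the round trip that goes wrong (the failure mode is inferred from the mismatch described above, not a verbatim reproduction):

```python
from transformers import AutoConfig

# Loading from the hub works: config.json there declares "dinov2_with_registers",
# which is the key the architecture is registered under.
config = AutoConfig.from_pretrained("facebook/dinov2-with-registers-small")

# Re-saving writes the class attribute ("dinov2-with-registers-base") instead,
# so the saved config.json no longer matches the registered model_type...
config.save_pretrained("./resaved")

# ...and resolving the re-saved folder through the auto classes can then fail.
reloaded = AutoConfig.from_pretrained("./resaved")
```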

@NielsRogge (Contributor, Author) replied:

cc @ydshieh, weird this wasn't caught by the tests. Is this something we can add a test for?
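One possible shape for such a test (a sketch only): assert that the `model_type` declared on the config class is the same key the architecture is registered under in the auto mapping, so that a config survives a save/reload round trip:

```python
from transformers import CONFIG_MAPPING, Dinov2WithRegistersConfig


def test_model_type_is_registered():
    # If model_type and the auto-mapping key ever diverge, a re-saved
    # config.json can no longer be resolved by AutoConfig.
    assert CONFIG_MAPPING[Dinov2WithRegistersConfig.model_type] is Dinov2WithRegistersConfig
```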

Successfully merging this pull request may close these issues: dinov2 with REGISTERS
6 participants