Add ViTPose #30530

Merged
merged 201 commits into from
Jan 8, 2025
03e4321
First draft
NielsRogge May 25, 2022
84ac7fe
Make fixup
May 26, 2022
90018b0
Make forward pass work
May 27, 2022
5ce0b8b
Improve code
Jun 2, 2022
3009a8a
Fix merge
NielsRogge Apr 15, 2024
8f39773
More improvements
NielsRogge Apr 15, 2024
067f593
More improvements
NielsRogge Apr 15, 2024
7360c22
Make predictions match
NielsRogge Apr 15, 2024
a1b154a
More improvements
NielsRogge Apr 15, 2024
4bd07c3
Improve image processor
NielsRogge Apr 15, 2024
44f694a
Fix model tests
NielsRogge Apr 16, 2024
41c1778
Add classic decoder
NielsRogge Apr 16, 2024
1773f8d
Merge remote-tracking branch 'upstream/main' into add_vitpose
NielsRogge Apr 21, 2024
ceb3d3c
Convert classic decoder
NielsRogge Apr 21, 2024
fedf2cc
Verify image processor
NielsRogge Apr 21, 2024
38dedcd
Fix classic decoder logits
NielsRogge Apr 21, 2024
4cdbc03
Clean up
NielsRogge Apr 22, 2024
95aae6d
Add post_process_pose_estimation
NielsRogge Apr 22, 2024
2531c19
Improve post_process_pose_estimation
NielsRogge Apr 22, 2024
e06d678
Use AutoBackbone
NielsRogge Apr 22, 2024
c4a7df1
Add support for MoE models
NielsRogge Apr 22, 2024
b09592c
Fix tests, improve num_experts
NielsRogge Apr 22, 2024
04930ec
Improve variable names
NielsRogge Apr 22, 2024
3432448
Fix merge
NielsRogge Apr 28, 2024
676aa5c
Make fixup
NielsRogge Apr 28, 2024
547d0da
More improvements
NielsRogge Apr 28, 2024
4435fd6
Improve post_process_pose_estimation
NielsRogge Apr 28, 2024
db0e72b
Compute centers and scales
NielsRogge Apr 28, 2024
027100d
Improve postprocessing
NielsRogge Apr 28, 2024
fc8e5e0
More improvements
NielsRogge Apr 28, 2024
6e7afac
Fix ViTPoseBackbone tests
NielsRogge Apr 28, 2024
8888c15
Add docstrings, fix image processor tests
NielsRogge Apr 28, 2024
0290de5
Update index
NielsRogge Apr 28, 2024
d33cb01
Use is_cv2_available
NielsRogge Apr 29, 2024
6af97a9
Add model to toctree
NielsRogge Apr 29, 2024
c4ccdb6
Add cv2 to doc tests
NielsRogge Apr 29, 2024
3eb3865
Fix merge
NielsRogge May 6, 2024
e09aa53
Remove script
NielsRogge May 6, 2024
2203538
Improve conversion script
NielsRogge May 6, 2024
ee5f191
Add coco_to_pascal_voc
NielsRogge May 6, 2024
dcd4401
Add box_to_center_and_scale to image_transforms
NielsRogge May 6, 2024
97a0e09
Update tests
NielsRogge May 8, 2024
d579009
Add integration test
NielsRogge May 11, 2024
4cfa299
Fix merge
NielsRogge May 13, 2024
9b8b4d1
Fix merge
NielsRogge May 13, 2024
13ee55f
Address comments
NielsRogge May 22, 2024
3b22ef8
Replace numpy by pytorch, improve docstrings
NielsRogge May 22, 2024
4873d38
Remove get_input_embeddings
NielsRogge May 22, 2024
b32c1aa
Address comments
NielsRogge May 24, 2024
1a16aa6
Move coco_to_pascal_voc
NielsRogge May 24, 2024
b84f23c
Address comment
NielsRogge May 24, 2024
20c44b9
Fix style
NielsRogge May 27, 2024
7aedeff
Address comments
NielsRogge May 27, 2024
65ee995
Fix test
NielsRogge May 27, 2024
f75119a
Address comment
NielsRogge May 27, 2024
d761e81
Merge remote-tracking branch 'upstream/main' into add_vitpose_autobac…
NielsRogge Jun 3, 2024
8588a0c
Remove udp
NielsRogge Jun 3, 2024
6238277
Remove comment
NielsRogge Jun 3, 2024
3c3aa67
[WIP] need to check if the numpy function is same as cv
SangbumChoi Jul 11, 2024
97961ee
add scipy affine_transform
SangbumChoi Jul 16, 2024
c397384
Update src/transformers/models/vitpose/image_processing_vitpose.py
SangbumChoi Jul 16, 2024
64ff8de
refactor convert
SangbumChoi Jul 16, 2024
6fed41b
Merge branch 'vitpose' of https://github.com/SangbumChoi/transformers…
SangbumChoi Jul 16, 2024
daad34a
add output_shape
SangbumChoi Jul 17, 2024
b0a488e
add atol 5e-2
SangbumChoi Jul 17, 2024
c20462e
Merge pull request #57 from SangbumChoi/vitpose
NielsRogge Jul 23, 2024
be6955a
Use hf_hub_download in conversion script
NielsRogge Jul 23, 2024
3d1824d
Fix merge
NielsRogge Jul 23, 2024
1b439e2
make box_to_center more applicable
SangbumChoi Aug 2, 2024
e576c8e
skipt test_get_set_embedding
SangbumChoi Aug 2, 2024
a55a955
fix to accept array and fix CI
SangbumChoi Aug 2, 2024
e621b80
add co-contributor
SangbumChoi Aug 2, 2024
9636f5a
Merge branch 'huggingface:main' into add_vitpose_autobackbone
SangbumChoi Aug 3, 2024
255ddf5
make it to tensor type output
SangbumChoi Aug 3, 2024
f0f9d61
add torch
SangbumChoi Aug 3, 2024
e38b207
change to torch tensor
SangbumChoi Aug 3, 2024
bbd534c
add more test
SangbumChoi Aug 3, 2024
dcf1485
minor change
SangbumChoi Aug 3, 2024
d10fb30
CI test change
SangbumChoi Aug 3, 2024
741f07b
import torch should be above ImageProcessor
SangbumChoi Aug 3, 2024
2a3d792
make style
SangbumChoi Aug 3, 2024
9cdbf5f
try not use torch in def
SangbumChoi Aug 3, 2024
7f7e9ec
Update src/transformers/models/vitpose/image_processing_vitpose.py
SangbumChoi Aug 12, 2024
de22f65
Update src/transformers/models/vitpose_backbone/configuration_vitpose…
SangbumChoi Aug 12, 2024
cf07432
Update src/transformers/models/vitpose_backbone/modeling_vitpose_back…
SangbumChoi Aug 12, 2024
2aef46b
Update src/transformers/models/vitpose/modeling_vitpose.py
SangbumChoi Aug 13, 2024
3cc8d2a
fix
SangbumChoi Aug 13, 2024
dafadf7
Merge branch 'add_vitpose_autobackbone' of https://github.com/NielsRo…
SangbumChoi Aug 13, 2024
5bdd62e
fix
SangbumChoi Aug 13, 2024
2f40861
add caution
SangbumChoi Aug 19, 2024
5e8b89e
make more detail about dataset_index
SangbumChoi Aug 19, 2024
c19c97a
Update src/transformers/models/vitpose/modeling_vitpose.py
NielsRogge Aug 20, 2024
f064009
Update src/transformers/models/vitpose/image_processing_vitpose.py
NielsRogge Aug 20, 2024
e9c6b1e
add docs
SangbumChoi Aug 20, 2024
80e0545
Update docs/source/en/model_doc/vitpose.md
SangbumChoi Aug 29, 2024
533d298
Update src/transformers/models/vitpose/configuration_vitpose.py
SangbumChoi Sep 3, 2024
7ffa504
Update src/transformers/__init__.py
SangbumChoi Sep 3, 2024
68da46a
Revert "Update src/transformers/__init__.py"
SangbumChoi Sep 3, 2024
72f8fcb
change name
SangbumChoi Sep 3, 2024
7d82ba6
Update src/transformers/models/vitpose/image_processing_vitpose.py
SangbumChoi Sep 3, 2024
26ee67f
Update tests/models/vitpose/test_modeling_vitpose.py
SangbumChoi Sep 3, 2024
50d65ea
Update docs/source/en/model_doc/vitpose.md
SangbumChoi Sep 3, 2024
2927334
Update src/transformers/models/vitpose/modeling_vitpose.py
SangbumChoi Sep 3, 2024
48ce1b4
Update src/transformers/models/vitpose_backbone/modeling_vitpose_back…
SangbumChoi Sep 3, 2024
8132991
Update src/transformers/models/vitpose/image_processing_vitpose.py
SangbumChoi Sep 4, 2024
eeb4a6f
move vitpose only function to image_processor
SangbumChoi Sep 4, 2024
c1172a3
raise valueerror when using timm backbone
SangbumChoi Sep 4, 2024
16b1903
use out_indices
SangbumChoi Sep 4, 2024
b446563
Update src/transformers/models/vitpose/image_processing_vitpose.py
SangbumChoi Sep 4, 2024
97ffaa7
remove camel-case of def flip_back
SangbumChoi Sep 4, 2024
82934fa
rename vitposeEstimatorOutput
SangbumChoi Sep 4, 2024
f839073
Update src/transformers/models/vitpose_backbone/modeling_vitpose_back…
SangbumChoi Sep 4, 2024
f1cbfd0
Merge branch 'add_vitpose_autobackbone' of https://github.com/NielsRo…
SangbumChoi Sep 4, 2024
d81e3f8
fix confused camelcase of MLP
SangbumChoi Sep 4, 2024
0e40dc7
remove in-place logic
SangbumChoi Sep 4, 2024
33a0040
clear scale description
SangbumChoi Sep 4, 2024
8b9d9f7
make consistent batch format
SangbumChoi Sep 4, 2024
20df85b
docs update
SangbumChoi Sep 4, 2024
2f3d6df
formatting docstring
SangbumChoi Sep 4, 2024
3b04bc7
add batch tests
SangbumChoi Sep 4, 2024
c880093
test docs change
SangbumChoi Sep 5, 2024
1b513ca
Merge branch 'huggingface:main' into add_vitpose_autobackbone
SangbumChoi Sep 5, 2024
cbbb966
Update src/transformers/models/vitpose/image_processing_vitpose.py
SangbumChoi Sep 6, 2024
f0e0f79
Update src/transformers/models/vitpose/configuration_vitpose.py
SangbumChoi Sep 6, 2024
50294f6
chagne ViT to Vit
SangbumChoi Sep 6, 2024
5911010
change to enable MoE
SangbumChoi Sep 6, 2024
cb6d45f
make fix-copies
SangbumChoi Sep 6, 2024
5197549
Update docs/source/en/model_doc/vitpose.md
SangbumChoi Sep 10, 2024
22fc705
extract udp
SangbumChoi Sep 10, 2024
0e5549f
add more described docs
SangbumChoi Sep 10, 2024
12a7b8c
simple fix
SangbumChoi Sep 10, 2024
220859d
change to accept target_size
SangbumChoi Sep 11, 2024
1afd347
make style
SangbumChoi Sep 11, 2024
f9ae524
Update src/transformers/models/vitpose/image_processing_vitpose.py
SangbumChoi Sep 23, 2024
60fb1c1
Update src/transformers/models/vitpose/configuration_vitpose.py
SangbumChoi Sep 23, 2024
b922d7e
change to `verify_backbone_config_arguments`
SangbumChoi Sep 23, 2024
6431ec4
Update docs/source/en/model_doc/vitpose.md
SangbumChoi Sep 23, 2024
7a06e38
remove unnecessary copy
SangbumChoi Sep 23, 2024
edc5320
make config immutable
SangbumChoi Sep 23, 2024
01a532b
enable gradient checkpointing
SangbumChoi Sep 23, 2024
1cbba25
update inappropriate docstring
SangbumChoi Sep 23, 2024
2ac4c67
linting docs
SangbumChoi Sep 23, 2024
eca096d
split function for visibility
SangbumChoi Sep 23, 2024
a047145
make style
SangbumChoi Sep 23, 2024
080960c
Merge branch 'add_vitpose_autobackbone' of https://github.com/NielsRo…
SangbumChoi Sep 23, 2024
4dd3aab
check isinstances
SangbumChoi Sep 23, 2024
d981714
change to acceptable use_pretrained_backbone
SangbumChoi Sep 23, 2024
2ad9ded
make style
SangbumChoi Sep 23, 2024
fe49a84
remove copy in docs
SangbumChoi Sep 24, 2024
bc4ae9a
Update src/transformers/models/vitpose_backbone/modeling_vitpose_back…
SangbumChoi Oct 6, 2024
fee2582
Update docs/source/en/model_doc/vitpose.md
SangbumChoi Oct 6, 2024
8cb5f9c
Update src/transformers/models/vitpose/modeling_vitpose.py
SangbumChoi Oct 6, 2024
42be42b
simple fix + make style
SangbumChoi Oct 6, 2024
f835be5
change input config of activation function to string
SangbumChoi Oct 6, 2024
b6699c9
Update docs/source/en/model_doc/vitpose.md
SangbumChoi Oct 6, 2024
ebdf2df
tmp docs
SangbumChoi Oct 6, 2024
0c45aef
Merge branch 'add_vitpose_autobackbone' of https://github.com/NielsRo…
SangbumChoi Oct 19, 2024
84ae81d
Merge branch 'main' of https://github.com/SangbumChoi/transformers in…
SangbumChoi Oct 19, 2024
9562956
delete index.md
SangbumChoi Oct 19, 2024
bb5cc96
make fix-copies
SangbumChoi Oct 19, 2024
899cb96
simple fix
SangbumChoi Oct 19, 2024
29e8c3e
Merge branch 'main' into add_vitpose_autobackbone
SangbumChoi Nov 3, 2024
9eb2e64
change conversion to sam2/mllama style
SangbumChoi Nov 3, 2024
8738973
Update src/transformers/models/vitpose/image_processing_vitpose.py
SangbumChoi Nov 6, 2024
75b268f
Update src/transformers/models/vitpose/image_processing_vitpose.py
SangbumChoi Nov 6, 2024
5a1f6a3
refactor convert
SangbumChoi Nov 9, 2024
3bfc219
add supervision
SangbumChoi Nov 9, 2024
dae04e8
Update src/transformers/models/vitpose_backbone/modeling_vitpose_back…
SangbumChoi Nov 16, 2024
0747e80
remove reduntant def
SangbumChoi Nov 18, 2024
5207b57
seperate code block for visualization
SangbumChoi Nov 18, 2024
96b4da9
add validation for num_moe
SangbumChoi Nov 18, 2024
d4aa3ee
final commit
SangbumChoi Nov 20, 2024
a9a3645
add labels
SangbumChoi Nov 20, 2024
d8e6e2e
[run-slow] vitpose, vitpose_backbone
SangbumChoi Nov 20, 2024
e588f4f
Update src/transformers/models/vitpose/convert_vitpose_to_hf.py
SangbumChoi Nov 20, 2024
78fe1b9
enable all conversion
SangbumChoi Nov 20, 2024
ac3712f
Merge branch 'add_vitpose_autobackbone' of https://github.com/NielsRo…
SangbumChoi Nov 20, 2024
831f70d
final commit
SangbumChoi Nov 20, 2024
ac00401
[run-slow] vitpose, vitpose_backbone
SangbumChoi Nov 20, 2024
241211d
ruff check --fix
SangbumChoi Nov 20, 2024
5610d5d
[run-slow] vitpose, vitpose_backbone
SangbumChoi Nov 20, 2024
3d4d559
Merge branch 'main' into add_vitpose_autobackbone
SangbumChoi Nov 21, 2024
28623b1
rename split module
SangbumChoi Nov 21, 2024
e86751a
[run-slow] vitpose, vitpose_backbone
SangbumChoi Nov 21, 2024
2c56a48
fix pos_embed
SangbumChoi Dec 16, 2024
c12eb35
Merge branch 'add_vitpose_autobackbone' of https://github.com/NielsRo…
SangbumChoi Dec 16, 2024
ba7373f
Simplify init
NielsRogge Dec 16, 2024
e2fbb26
Revert "fix pos_embed"
SangbumChoi Dec 17, 2024
a9bb08f
refactor single loop
SangbumChoi Dec 24, 2024
0dd9613
allow flag to enable custom model
SangbumChoi Dec 24, 2024
b21bb06
efficiency of MoE to not use unused experts
SangbumChoi Dec 24, 2024
9a5c86d
make style
SangbumChoi Dec 24, 2024
a5e7966
Fix range -> arange to avoid warning
qubvel Jan 8, 2025
fa6e613
Revert MOE router, a new one does not work
qubvel Jan 8, 2025
f2037ce
Fix postprocessing a bit (labels)
qubvel Jan 8, 2025
3cbd9e3
Fix type hint
qubvel Jan 8, 2025
fdd080c
Fix docs snippets
qubvel Jan 8, 2025
3cb154c
Fix links to checkpoints
qubvel Jan 8, 2025
8a4d9c1
Fix checkpoints in tests
qubvel Jan 8, 2025
09752cf
Fix test
qubvel Jan 8, 2025
1bea6c1
Add image to docs
qubvel Jan 8, 2025
Simplify init
NielsRogge committed Dec 16, 2024

commit ba7373f96fd35c58355705822ee36f605a4cee41
53 changes: 7 additions & 46 deletions src/transformers/models/vitpose/__init__.py
@@ -13,55 +13,16 @@
 # limitations under the License.
 from typing import TYPE_CHECKING

-from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available, is_vision_available
+from ...utils import _LazyModule
+from ...utils.import_utils import define_import_structure


-_import_structure = {"configuration_vitpose": ["VitPoseConfig"]}
-
-
-try:
-    if not is_vision_available():
-        raise OptionalDependencyNotAvailable()
-except OptionalDependencyNotAvailable:
-    pass
-else:
-    _import_structure["image_processing_vitpose"] = ["VitPoseImageProcessor"]
-
-
-try:
-    if not is_torch_available():
-        raise OptionalDependencyNotAvailable()
-except OptionalDependencyNotAvailable:
-    pass
-else:
-    _import_structure["modeling_vitpose"] = [
-        "VitPosePreTrainedModel",
-        "VitPoseForPoseEstimation",
-    ]
-
 if TYPE_CHECKING:
-    from .configuration_vitpose import VitPoseConfig
-
-    try:
-        if not is_vision_available():
-            raise OptionalDependencyNotAvailable()
-    except OptionalDependencyNotAvailable:
-        pass
-    else:
-        from .image_processing_vitpose import VitPoseImageProcessor
-
-    try:
-        if not is_torch_available():
-            raise OptionalDependencyNotAvailable()
-    except OptionalDependencyNotAvailable:
-        pass
-    else:
-        from .modeling_vitpose import (
-            VitPoseForPoseEstimation,
-            VitPosePreTrainedModel,
-        )
-
+    from .configuration_vitpose import *
+    from .image_processing_vitpose import *
+    from .modeling_vitpose import *
 else:
     import sys

-    sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)
+    _file = globals()["__file__"]
+    sys.modules[__name__] = _LazyModule(__name__, _file, define_import_structure(_file), module_spec=__spec__)
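The rewritten `__init__.py` no longer maintains `_import_structure` by hand; it passes the file path to `define_import_structure` and lets `_LazyModule` resolve names on demand. The sketch below illustrates only the general idea of lazy attribute resolution, using the standard library. `LazyNamespace` and its `import_structure` mapping are hypothetical names, and this is not a description of how `_LazyModule` is actually implemented.

```python
import importlib
import types


class LazyNamespace(types.ModuleType):
    """Hypothetical stand-in: import the defining module only on first access."""

    def __init__(self, name, import_structure):
        super().__init__(name)
        # Invert {module: [names]} into {name: module} for quick lookup.
        self._name_to_module = {
            attr: module for module, attrs in import_structure.items() for attr in attrs
        }
        self.imported = []  # records which modules were actually imported

    def __getattr__(self, attr):
        # Python calls this only when normal attribute lookup fails.
        try:
            module_name = self._name_to_module[attr]
        except KeyError:
            raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}") from None
        module = importlib.import_module(module_name)
        self.imported.append(module_name)
        value = getattr(module, attr)
        setattr(self, attr, value)  # cache so later lookups skip __getattr__
        return value


# Standard-library modules stand in for the configuration/modeling files.
lazy = LazyNamespace("demo", {"json": ["dumps"], "fractions": ["Fraction"]})
```

Nothing is imported when `lazy` is created; `json` is imported the first time `lazy.dumps` is touched, and `fractions` stays untouched until `lazy.Fraction` is requested.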
3 changes: 3 additions & 0 deletions src/transformers/models/vitpose/configuration_vitpose.py
@@ -119,3 +119,6 @@ def __init__(
         self.initializer_range = initializer_range
         self.scale_factor = scale_factor
         self.use_simple_decoder = use_simple_decoder
+
+
+__all__ = ["VitPoseConfig"]
3 changes: 3 additions & 0 deletions src/transformers/models/vitpose/image_processing_vitpose.py
@@ -671,3 +671,6 @@ def post_process_pose_estimation(
             results.append(batch_results)

         return results
+
+
+__all__ = ["VitPoseImageProcessor"]
3 changes: 3 additions & 0 deletions src/transformers/models/vitpose/modeling_vitpose.py
@@ -334,3 +334,6 @@ def forward(
             hidden_states=outputs.hidden_states,
             attentions=outputs.attentions,
         )
+
+
+__all__ = ["VitPosePreTrainedModel", "VitPoseForPoseEstimation"]
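Each of the three files now ends with an `__all__` list, while the `TYPE_CHECKING` branch of `__init__.py` switched to `from ... import *`. As a reminder of how those two interact in plain Python (independent of transformers), a star import copies only the names listed in `__all__`. The module name `fake_modeling` and the classes in it are made up for this demonstration.

```python
import sys
import types

# Build a throwaway module in memory; all names here are hypothetical.
source = '''
__all__ = ["PublicModel"]

class PublicModel:
    pass

class InternalHelper:
    pass
'''
module = types.ModuleType("fake_modeling")
exec(source, module.__dict__)
sys.modules["fake_modeling"] = module

# Run a star import into a fresh namespace and see what arrives.
namespace = {}
exec("from fake_modeling import *", namespace)

exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)  # prints ['PublicModel']
```

`InternalHelper` still exists on the module object, but it is not re-exported because it is absent from `__all__`.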
Collaborator
Looks alright, but we should be using modular transformers. This will make it easier when we refactor attention or whatnot!

Collaborator
Also, you know the other vision models better than I do, but let's push standardization as much as we can for things that can be translated to another thing!

@@ -91,9 +91,7 @@ def __init__(self, config: VitPoseBackboneConfig) -> None:
         num_patches = self.patch_embeddings.num_patches
         position_embeddings = torch.zeros(1, num_patches + 1, config.hidden_size)
         # Pre-compute the modified position embeddings
-        self.position_embeddings = nn.Parameter(
-            position_embeddings[:, 1:] + position_embeddings[:, :1]
-        )
+        self.position_embeddings = nn.Parameter(position_embeddings[:, 1:] + position_embeddings[:, :1])
         self.dropout = nn.Dropout(config.hidden_dropout_prob)

     def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
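The reflowed line adds the first slot of the position embeddings (index 0, the extra `+ 1` entry) to every remaining slot. A shape-only sketch of that broadcast follows. It uses NumPy instead of PyTorch so it runs without a deep-learning install, on the assumption that slicing and broadcasting behave the same way for this expression. The sizes are arbitrary, and the values are random rather than the zeros used at initialization.

```python
import numpy as np

num_patches, hidden_size = 4, 3  # arbitrary sizes for illustration
rng = np.random.default_rng(0)
position_embeddings = rng.normal(size=(1, num_patches + 1, hidden_size))

patch_part = position_embeddings[:, 1:]  # shape (1, num_patches, hidden_size)
first_slot = position_embeddings[:, :1]  # shape (1, 1, hidden_size)

# Broadcasting repeats first_slot along the patch axis before adding.
combined = patch_part + first_slot
print(combined.shape)  # (1, 4, 3)
```

The result has one row per patch, so the stored parameter has `num_patches` entries rather than `num_patches + 1`.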