Add ViTImageProcessorFast to tests #31424

Merged
5 changes: 5 additions & 0 deletions src/transformers/image_processing_utils.py
@@ -151,6 +151,11 @@ def center_crop(
**kwargs,
)

def to_dict(self):
encoder_dict = super().to_dict()
encoder_dict.pop("_valid_processor_keys", None)
Collaborator Author:

@molbap Would be good to have your confirmation here that we can remove this - I don't think we want to save this out in the config: it's private and wouldn't be used when creating a new config class.

Contributor:

Yes, I agree! No use having it written to the config.

Collaborator:

@molbap Just wondering: why is _valid_processor_keys an instance attribute? (Could it be a class attribute?)

return encoder_dict


VALID_SIZE_DICT_KEYS = (
{"height", "width"},
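A minimal sketch of what the override above guarantees (the processor class is illustrative, and this assumes a transformers build that includes this change):

```python
from transformers import ViTImageProcessor

processor = ViTImageProcessor()
config = processor.to_dict()

# The to_dict override pops the private key, so it never lands in
# preprocessor_config.json when save_pretrained serializes the processor.
assert "_valid_processor_keys" not in config
```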
5 changes: 5 additions & 0 deletions src/transformers/image_processing_utils_fast.py
@@ -61,3 +61,8 @@ def _validate_params(self, **kwargs) -> None:
def get_transforms(self, **kwargs) -> "Compose":
self._validate_params(**kwargs)
return self._build_transforms(**kwargs)

def to_dict(self):
encoder_dict = super().to_dict()
encoder_dict.pop("_transform_params", None)
return encoder_dict
11 changes: 6 additions & 5 deletions src/transformers/models/auto/image_processing_auto.py
@@ -399,7 +399,7 @@ def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs):
kwargs["token"] = use_auth_token

config = kwargs.pop("config", None)
-use_fast = kwargs.pop("use_fast", False)
+use_fast = kwargs.pop("use_fast", None)
trust_remote_code = kwargs.pop("trust_remote_code", None)
kwargs["_from_auto"] = True

Expand Down Expand Up @@ -430,10 +430,11 @@ def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs):

if image_processor_class is not None:
# Update class name to reflect the use_fast option. If class is not found, None is returned.
-if use_fast and not image_processor_class.endswith("Fast"):
-    image_processor_class += "Fast"
-elif not use_fast and image_processor_class.endswith("Fast"):
-    image_processor_class = image_processor_class[:-4]
+if use_fast is not None:
Collaborator Author:

We only update the image_processor_class if use_fast is explicitly set. This makes sure that if a fast image processor is saved out, it is loaded as a fast image processor from AutoImageProcessor by default.

Collaborator:

I got the idea. But it looks to me like you want to make sure here that if a slow (rather than fast, as you mentioned above) image processor is saved and use_fast is not specified, we want to load a fast image processor.

Collaborator Author:

I'm not sure I completely understood your comment, but let me try to explain the desired behaviour, and you can let me know if I've mistaken something :)

At the moment, we don't want use_fast to default to either False or True, as this would determine which image processor is loaded regardless of the type of image processor that was saved out.

Desired behaviours:

  • Saved slow image processor -> loads slow image processor by default (use_fast unset)
  • Saved fast image processor -> loads fast image processor by default (use_fast unset)
  • Saved slow image processor, use_fast=False -> loads slow image processor
  • Saved slow image processor, use_fast=True -> loads fast image processor
  • Saved fast image processor, use_fast=False -> loads slow image processor
  • Saved fast image processor, use_fast=True -> loads fast image processor

i.e. if a fast image processor is available but a slow image processor is saved out, we still want to default to the slow image processor for now, as the two won't produce exactly the same outputs, and instead inform the user that a fast version is available. This is to avoid unexpected behaviour.
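A minimal sketch of this defaulting behaviour (the checkpoint name is illustrative; it was published with a slow ViTImageProcessor config, and this assumes a transformers build that includes this PR):

```python
from transformers import AutoImageProcessor

# Saved slow processor, use_fast unset -> the saved (slow) class is kept.
slow = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")

# Saved slow processor, use_fast=True -> the "Fast" class is loaded instead.
fast = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224", use_fast=True)

print(type(slow).__name__)  # ViTImageProcessor
print(type(fast).__name__)  # ViTImageProcessorFast
```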

Collaborator (@ydshieh, Jun 24, 2024):

OK, I got it now.

> that if a fast image processor is saved out, it will be loaded in as a fast image processor if use_fast is None

which is not in the current main but is addressed in this PR!

Sorry (not easy to reason about all these cases and the motivation).

Collaborator Author:

> which is not in the current main but is addressed in this PR!

Yep! It was a bug that I hadn't noticed before 😬

+    if use_fast and not image_processor_class.endswith("Fast"):
+        image_processor_class += "Fast"
+    elif not use_fast and image_processor_class.endswith("Fast"):
+        image_processor_class = image_processor_class[:-4]
image_processor_class = image_processor_class_from_name(image_processor_class)

has_remote_code = image_processor_auto_map is not None
@@ -772,7 +772,7 @@ def preprocess(
ignore_index,
do_reduce_labels,
return_tensors,
-input_data_format=input_data_format,
+input_data_format=data_format,
Collaborator Author:

For this model (and the ones below with this change), the "input_data_format" is now data_format because the images were converted to data_format in the _preprocess_image step.

This wasn't previously caught because the test images and the defaults were always in channels_first format. Now we're feeding in channels_last numpy images for the tests, as this is the standard format for numpy images.

)
return encoded_inputs

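A short sketch of the fix rationale, with assumed shapes and a stand-in for the conversion that _preprocess_image performs:

```python
import numpy as np

image = np.zeros((480, 640, 3))  # original input: channels_last
data_format = "channels_first"

# Stand-in for the conversion done in the _preprocess_image step.
converted = np.moveaxis(image, -1, 0)
assert converted.shape == (3, 480, 640)

# Downstream calls receive `converted`, so they must be told
# input_data_format=data_format ("channels_first"), not the original
# input_data_format ("channels_last") of `image`.
```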
@@ -772,7 +772,7 @@ def preprocess(
ignore_index,
do_reduce_labels,
return_tensors,
-input_data_format=input_data_format,
+input_data_format=data_format,
)
return encoded_inputs

@@ -772,7 +772,7 @@ def preprocess(
ignore_index,
do_reduce_labels,
return_tensors,
-input_data_format=input_data_format,
+input_data_format=data_format,
)
return encoded_inputs

3 changes: 1 addition & 2 deletions src/transformers/models/vit/image_processing_vit_fast.py
@@ -114,7 +114,6 @@ def __init__(
self.rescale_factor = rescale_factor
self.image_mean = image_mean if image_mean is not None else IMAGENET_STANDARD_MEAN
self.image_std = image_std if image_std is not None else IMAGENET_STANDARD_STD
self._transform_settings = {}

def _build_transforms(
self,
@@ -285,5 +284,5 @@ def preprocess(
)
transformed_images = [transforms(image) for image in images]

data = {"pixel_values": torch.vstack(transformed_images)}
data = {"pixel_values": torch.stack(transformed_images, dim=0)}
return BatchFeature(data, tensor_type=return_tensors)
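A quick sketch (shapes assumed) of why torch.stack replaces torch.vstack here: for a list of (C, H, W) tensors, vstack concatenates along the leading (channel) axis instead of adding a batch axis.

```python
import torch

transformed_images = [torch.zeros(3, 224, 224) for _ in range(4)]

batched = torch.stack(transformed_images, dim=0)  # adds a batch dimension
merged = torch.vstack(transformed_images)         # concatenates along dim 0

assert batched.shape == (4, 3, 224, 224)
assert merged.shape == (12, 224, 224)  # channels of all images run together
```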
4 changes: 4 additions & 0 deletions tests/models/bridgetower/test_image_processing_bridgetower.py
@@ -17,6 +17,8 @@
import unittest
from typing import Dict, List, Optional, Union

import numpy as np

from transformers.testing_utils import require_torch, require_vision
from transformers.utils import is_vision_available

@@ -84,6 +86,8 @@ def get_expected_values(self, image_inputs, batched=False):
image = image_inputs[0]
if isinstance(image, Image.Image):
w, h = image.size
elif isinstance(image, np.ndarray):
h, w = image.shape[0], image.shape[1]
else:
h, w = image.shape[1], image.shape[2]
scale = size / min(w, h)
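The new np.ndarray branch assumes channels_last numpy test images; a small sketch of the three conventions this helper distinguishes:

```python
import numpy as np
import torch
from PIL import Image

array = np.zeros((480, 640, 3), dtype=np.uint8)  # numpy: (h, w, c)
pil = Image.fromarray(array)                     # PIL: size reports (w, h)
tensor = torch.zeros(3, 480, 640)                # torch: (c, h, w)

assert pil.size == (640, 480)
assert array.shape[:2] == (480, 640)
assert tensor.shape[1:] == (480, 640)
```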
@@ -18,6 +18,8 @@
import pathlib
import unittest

import numpy as np

from transformers.testing_utils import require_torch, require_vision, slow
from transformers.utils import is_torch_available, is_vision_available

@@ -87,6 +89,8 @@ def get_expected_values(self, image_inputs, batched=False):
image = image_inputs[0]
if isinstance(image, Image.Image):
w, h = image.size
elif isinstance(image, np.ndarray):
h, w = image.shape[0], image.shape[1]
else:
h, w = image.shape[1], image.shape[2]
if w < h:
@@ -18,6 +18,8 @@
import pathlib
import unittest

import numpy as np

from transformers.testing_utils import require_torch, require_vision, slow
from transformers.utils import is_torch_available, is_vision_available

@@ -87,6 +89,8 @@ def get_expected_values(self, image_inputs, batched=False):
image = image_inputs[0]
if isinstance(image, Image.Image):
w, h = image.size
elif isinstance(image, np.ndarray):
h, w = image.shape[0], image.shape[1]
else:
h, w = image.shape[1], image.shape[2]
if w < h:
4 changes: 4 additions & 0 deletions tests/models/detr/test_image_processing_detr.py
@@ -17,6 +17,8 @@
import pathlib
import unittest

import numpy as np

from transformers.testing_utils import require_torch, require_vision, slow
from transformers.utils import is_torch_available, is_vision_available

@@ -86,6 +88,8 @@ def get_expected_values(self, image_inputs, batched=False):
image = image_inputs[0]
if isinstance(image, Image.Image):
w, h = image.size
elif isinstance(image, np.ndarray):
h, w = image.shape[0], image.shape[1]
else:
h, w = image.shape[1], image.shape[2]
if w < h:
2 changes: 2 additions & 0 deletions tests/models/glpn/test_image_processing_glpn.py
@@ -66,6 +66,8 @@ def prepare_image_processor_dict(self):
def expected_output_image_shape(self, images):
if isinstance(images[0], Image.Image):
width, height = images[0].size
elif isinstance(images[0], np.ndarray):
height, width = images[0].shape[0], images[0].shape[1]
else:
height, width = images[0].shape[1], images[0].shape[2]

@@ -18,6 +18,8 @@
import pathlib
import unittest

import numpy as np

from transformers.testing_utils import require_torch, require_vision, slow
from transformers.utils import is_torch_available, is_vision_available

@@ -93,6 +95,8 @@ def get_expected_values(self, image_inputs, batched=False):
image = image_inputs[0]
if isinstance(image, Image.Image):
w, h = image.size
elif isinstance(image, np.ndarray):
h, w = image.shape[0], image.shape[1]
else:
h, w = image.shape[1], image.shape[2]
if w < h:
4 changes: 4 additions & 0 deletions tests/models/idefics/test_image_processing_idefics.py
@@ -16,6 +16,8 @@

import unittest

import numpy as np

from transformers.testing_utils import require_torch, require_torchvision, require_vision
from transformers.utils import is_torch_available, is_torchvision_available, is_vision_available

@@ -75,6 +77,8 @@ def get_expected_values(self, image_inputs, batched=False):
image = image_inputs[0]
if isinstance(image, Image.Image):
w, h = image.size
elif isinstance(image, np.ndarray):
h, w = image.shape[0], image.shape[1]
else:
h, w = image.shape[1], image.shape[2]
scale = size / min(w, h)
158 changes: 99 additions & 59 deletions tests/models/idefics2/test_image_processing_idefics2.py
@@ -99,6 +99,8 @@ def get_expected_values(self, image_inputs, batched=False):
image = image_inputs[0]
if isinstance(image, Image.Image):
w, h = image.size
elif isinstance(image, np.ndarray):
h, w = image.shape[0], image.shape[1]
else:
h, w = image.shape[1], image.shape[2]

@@ -176,6 +178,10 @@ def prepare_image_inputs(
if torchify:
images_list = [[torch.from_numpy(image) for image in images] for images in images_list]

if numpify:
# Numpy images are typically in channels last format
images_list = [[image.transpose(1, 2, 0) for image in images] for images in images_list]

return images_list
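A one-line sketch (dimensions assumed) of what the added transpose does to each generated array:

```python
import numpy as np

chw = np.zeros((3, 32, 32))   # generated channels_first
hwc = chw.transpose(1, 2, 0)  # reordered to numpy's usual channels_last
assert hwc.shape == (32, 32, 3)
```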


@@ -206,66 +212,100 @@ def test_image_processor_properties(self):
self.assertTrue(hasattr(image_processing, "do_image_splitting"))

def test_call_numpy(self):
-    # Initialize image_processing
-    image_processing = self.image_processing_class(**self.image_processor_dict)
-    # create random numpy tensors
-    image_inputs = self.image_processor_tester.prepare_image_inputs(equal_resolution=False, numpify=True)
-    for sample_images in image_inputs:
-        for image in sample_images:
-            self.assertIsInstance(image, np.ndarray)
-
-    # Test not batched input
-    encoded_images = image_processing(image_inputs[0], return_tensors="pt").pixel_values
-    expected_output_image_shape = self.image_processor_tester.expected_output_image_shape([image_inputs[0]])
-    self.assertEqual(tuple(encoded_images.shape), (1, *expected_output_image_shape))
-
-    # Test batched
-    encoded_images = image_processing(image_inputs, return_tensors="pt").pixel_values
-    expected_output_image_shape = self.image_processor_tester.expected_output_image_shape(image_inputs)
-    self.assertEqual(
-        tuple(encoded_images.shape), (self.image_processor_tester.batch_size, *expected_output_image_shape)
-    )
+    for image_processing_class in self.image_processor_list:
+        # Initialize image_processing
+        image_processing = self.image_processing_class(**self.image_processor_dict)
+        # create random numpy tensors
+        image_inputs = self.image_processor_tester.prepare_image_inputs(equal_resolution=False, numpify=True)
+        for sample_images in image_inputs:
+            for image in sample_images:
+                self.assertIsInstance(image, np.ndarray)
+
+        # Test not batched input
+        encoded_images = image_processing(image_inputs[0], return_tensors="pt").pixel_values
+        expected_output_image_shape = self.image_processor_tester.expected_output_image_shape([image_inputs[0]])
+        self.assertEqual(tuple(encoded_images.shape), (1, *expected_output_image_shape))
+
+        # Test batched
+        encoded_images = image_processing(image_inputs, return_tensors="pt").pixel_values
+        expected_output_image_shape = self.image_processor_tester.expected_output_image_shape(image_inputs)
+        self.assertEqual(
+            tuple(encoded_images.shape), (self.image_processor_tester.batch_size, *expected_output_image_shape)
+        )

+def test_call_numpy_4_channels(self):
+    for image_processing_class in self.image_processor_list:
+        # Initialize image_processing
+        image_processor_dict = self.image_processor_dict
+        image_processor_dict["image_mean"] = [0.5, 0.5, 0.5, 0.5]
+        image_processor_dict["image_std"] = [0.5, 0.5, 0.5, 0.5]
+        image_processing = self.image_processing_class(**image_processor_dict)
+        # create random numpy tensors
+        self.image_processor_tester.num_channels = 4
+        image_inputs = self.image_processor_tester.prepare_image_inputs(equal_resolution=False, numpify=True)
+
+        for sample_images in image_inputs:
+            for image in sample_images:
+                self.assertIsInstance(image, np.ndarray)
+
+        # Test not batched input
+        encoded_images = image_processing(
+            image_inputs[0], input_data_format="channels_last", return_tensors="pt"
+        ).pixel_values
+        expected_output_image_shape = self.image_processor_tester.expected_output_image_shape([image_inputs[0]])
+        self.assertEqual(tuple(encoded_images.shape), (1, *expected_output_image_shape))
+
+        # Test batched
+        encoded_images = image_processing(
+            image_inputs, input_data_format="channels_last", return_tensors="pt"
+        ).pixel_values
+        expected_output_image_shape = self.image_processor_tester.expected_output_image_shape(image_inputs)
+        self.assertEqual(
+            tuple(encoded_images.shape), (self.image_processor_tester.batch_size, *expected_output_image_shape)
+        )

def test_call_pil(self):
-    # Initialize image_processing
-    image_processing = self.image_processing_class(**self.image_processor_dict)
-    # create random PIL images
-    image_inputs = self.image_processor_tester.prepare_image_inputs(equal_resolution=False)
-    for images in image_inputs:
-        for image in images:
-            self.assertIsInstance(image, Image.Image)
-
-    # Test not batched input
-    encoded_images = image_processing(image_inputs[0], return_tensors="pt").pixel_values
-    expected_output_image_shape = self.image_processor_tester.expected_output_image_shape([image_inputs[0]])
-    self.assertEqual(tuple(encoded_images.shape), (1, *expected_output_image_shape))
-
-    # Test batched
-    encoded_images = image_processing(image_inputs, return_tensors="pt").pixel_values
-    expected_output_image_shape = self.image_processor_tester.expected_output_image_shape(image_inputs)
-    self.assertEqual(
-        tuple(encoded_images.shape), (self.image_processor_tester.batch_size, *expected_output_image_shape)
-    )
+    for image_processing_class in self.image_processor_list:
+        # Initialize image_processing
+        image_processing = self.image_processing_class(**self.image_processor_dict)
+        # create random PIL images
+        image_inputs = self.image_processor_tester.prepare_image_inputs(equal_resolution=False)
+        for images in image_inputs:
+            for image in images:
+                self.assertIsInstance(image, Image.Image)
+
+        # Test not batched input
+        encoded_images = image_processing(image_inputs[0], return_tensors="pt").pixel_values
+        expected_output_image_shape = self.image_processor_tester.expected_output_image_shape([image_inputs[0]])
+        self.assertEqual(tuple(encoded_images.shape), (1, *expected_output_image_shape))
+
+        # Test batched
+        encoded_images = image_processing(image_inputs, return_tensors="pt").pixel_values
+        expected_output_image_shape = self.image_processor_tester.expected_output_image_shape(image_inputs)
+        self.assertEqual(
+            tuple(encoded_images.shape), (self.image_processor_tester.batch_size, *expected_output_image_shape)
+        )

def test_call_pytorch(self):
-    # Initialize image_processing
-    image_processing = self.image_processing_class(**self.image_processor_dict)
-    # create random PyTorch tensors
-    image_inputs = self.image_processor_tester.prepare_image_inputs(equal_resolution=False, torchify=True)
-
-    for images in image_inputs:
-        for image in images:
-            self.assertIsInstance(image, torch.Tensor)
-
-    # Test not batched input
-    encoded_images = image_processing(image_inputs[0], return_tensors="pt").pixel_values
-    expected_output_image_shape = self.image_processor_tester.expected_output_image_shape([image_inputs[0]])
-    self.assertEqual(tuple(encoded_images.shape), (1, *expected_output_image_shape))
-
-    # Test batched
-    expected_output_image_shape = self.image_processor_tester.expected_output_image_shape(image_inputs)
-    encoded_images = image_processing(image_inputs, return_tensors="pt").pixel_values
-    self.assertEqual(
-        tuple(encoded_images.shape),
-        (self.image_processor_tester.batch_size, *expected_output_image_shape),
-    )
+    for image_processing_class in self.image_processor_list:
+        # Initialize image_processing
+        image_processing = self.image_processing_class(**self.image_processor_dict)
+        # create random PyTorch tensors
+        image_inputs = self.image_processor_tester.prepare_image_inputs(equal_resolution=False, torchify=True)
+
+        for images in image_inputs:
+            for image in images:
+                self.assertIsInstance(image, torch.Tensor)
+
+        # Test not batched input
+        encoded_images = image_processing(image_inputs[0], return_tensors="pt").pixel_values
+        expected_output_image_shape = self.image_processor_tester.expected_output_image_shape([image_inputs[0]])
+        self.assertEqual(tuple(encoded_images.shape), (1, *expected_output_image_shape))
+
+        # Test batched
+        expected_output_image_shape = self.image_processor_tester.expected_output_image_shape(image_inputs)
+        encoded_images = image_processing(image_inputs, return_tensors="pt").pixel_values
+        self.assertEqual(
+            tuple(encoded_images.shape),
+            (self.image_processor_tester.batch_size, *expected_output_image_shape),
+        )
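The refactor above wraps each test body in a loop so the same assertions run against every registered processor class; a hedged sketch of the pattern (constructor kwargs assumed, and the fast class requires torchvision):

```python
from transformers import ViTImageProcessor, ViTImageProcessorFast

# Both the slow and the fast classes are exercised with identical parameters.
for image_processing_class in (ViTImageProcessor, ViTImageProcessorFast):
    processor = image_processing_class(size={"height": 18, "width": 18})
    print(type(processor).__name__, processor.size)
```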
2 changes: 2 additions & 0 deletions tests/models/mask2former/test_image_processing_mask2former.py
@@ -98,6 +98,8 @@ def get_expected_values(self, image_inputs, batched=False):
image = image_inputs[0]
if isinstance(image, Image.Image):
w, h = image.size
elif isinstance(image, np.ndarray):
h, w = image.shape[0], image.shape[1]
else:
h, w = image.shape[1], image.shape[2]
if w < h: