Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🔴 🔴 🔴 Added segmentation maps support for DPT image processor #34345

Conversation

simonreise
Copy link
Contributor

@simonreise simonreise commented Oct 23, 2024

Added segmentation maps support for DPT image processor

Most of image processors for vision models that support semantic segmentation task accept images and segmentation_maps as inputs, but for some reason DPT image processor does not process segmentation maps, only images. This PR can make code that one uses for training or evaluation of semantic segmentation models more reusable, as now DPT image processor can process segmentation maps as most of other image processors do.

I also added do_reduce_labels arg because other image processors that support segmentation masks use it.

I added two new tests: one that tests segmentation_masks support and one that tests if do_reduce_labels work as expected.

Most of the code is adapted from BEIT image processor.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@amyeroberts, @qubvel

@LysandreJik
Copy link
Member

cc @molbap as well in case bandwidth permits

Copy link
Contributor

@molbap molbap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - just a small refactor of the method to be more aligned with existing models!


def test_call_segmentation_maps(self):
# Initialize image_processing
image_processing = self.image_processing_class(**self.image_processor_dict)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, image_processor would be better

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed image_processing to image_processor. Should I also rename it in the other tests?

Comment on lines 462 to 504
if segmentation_maps is not None:
segmentation_maps = [to_numpy_array(segmentation_map) for segmentation_map in segmentation_maps]

# Add channel dimension if missing - needed for certain transformations
if segmentation_maps[0].ndim == 2:
added_channel_dim = True
segmentation_maps = [segmentation_map[None, ...] for segmentation_map in segmentation_maps]
input_data_format = ChannelDimension.FIRST
else:
added_channel_dim = False
if input_data_format is None:
input_data_format = infer_channel_dimension_format(segmentation_maps[0], num_channels=1)

if do_reduce_labels:
segmentation_maps = [self.reduce_label(segmentation_map) for segmentation_map in segmentation_maps]

if do_resize:
segmentation_maps = [
self.resize(
image=segmentation_map,
size=size,
resample=resample,
keep_aspect_ratio=keep_aspect_ratio,
ensure_multiple_of=ensure_multiple_of,
input_data_format=input_data_format,
)
for segmentation_map in segmentation_maps
]

if do_pad:
segmentation_maps = [
self.pad_image(
image=segmentation_map, size_divisor=size_divisor, input_data_format=input_data_format
)
for segmentation_map in segmentation_maps
]

# Remove extra channel dimension if added for processing
if added_channel_dim:
segmentation_maps = [segmentation_map.squeeze(0) for segmentation_map in segmentation_maps]
segmentation_maps = [segmentation_map.astype(np.int64) for segmentation_map in segmentation_maps]

data["labels"] = segmentation_maps
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect - if there isn't any difference with Beit, can this be wrapped in a _preprocess_segmentation_map() method in a loop, that can be flagged as # Copied from ... the beit image processor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrapped segmentation map preprocessing code to _preprocess_segmentation_map(), and also moved image preprocessing to separate _preprocess_image() function and general preprocessing functionality to _preprocess() function.

@simonreise simonreise requested a review from molbap November 11, 2024 15:16
@simonreise
Copy link
Contributor Author

Could you please re-review the pull request? In the last commit I made all the changes you asked for: wrapped segmentation map preprocessing code to separate functions, added comments and renamed a variable in tests. Do I need to make any other changes to the code?

@molbap
Copy link
Contributor

molbap commented Nov 18, 2024

hey @simonreise , will review in a moment, we were all at a team gathering last week hence the inactivity. On my radar!

@ArthurZucker ArthurZucker requested review from yonigozlan and removed request for molbap November 19, 2024 11:42
@ArthurZucker
Copy link
Collaborator

@molbap you are fobidden to work for this week 🤣 go and rest, @yonigozlan will have a look! 🤗

@qubvel
Copy link
Member

qubvel commented Dec 17, 2024

friendly ping @yonigozlan

Copy link
Member

@yonigozlan yonigozlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!
Looks good to me, and something that would be useful to have.
Just left some comments about adding # Copied from statements where needed, and about breaking backward compatibility (mainly intended for a core maintainer)

Comment on lines +285 to +291
def reduce_label(self, label: ImageInput) -> np.ndarray:
label = to_numpy_array(label)
# Avoid using underflow conversion
label[label == 0] = 255
label = label - 1
label[label == 254] = 255
return label
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be fully copied from beit image processor, you should add a # Copied from statement above if that's the case :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +431 to +434
def __call__(self, images, segmentation_maps=None, **kwargs):
# Overrides the `__call__` method of the `Preprocessor` class such that the images and segmentation maps can both
# be passed in as positional arguments.
return super().__call__(images, segmentation_maps=segmentation_maps, **kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here for adding a # Copied from, and same for all the other methods copied from beit as well.

@filter_out_non_signature_kwargs()
def preprocess(
self,
images: ImageInput,
segmentation_maps: Optional[ImageInput] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit tricky as it could be a breaking change, if some users use do_resize etc. as args and not kwargs. However this would not be good practice, and I don't see any way of adding segmentation_maps processing without breaking BC. I'll let a core maintainer give the green light on this or not.

Comment on lines 171 to 172

def test_call_segmentation_maps(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as before about the # Copied from, if this and the following tests are indeed fully copied from beit

@yonigozlan
Copy link
Member

Thanks, but the # Copied from statement must be placed above the function definition. You can refer to other parts of the library to see how it's done.
This is not just for information purposes; it enables the make fix-copies CLI command to propagate any modifications in the original function to its copied versions.

After making the required changes, you can ensure everything is in order by running the make fixup command.

@yonigozlan
Copy link
Member

Thanks for iterating! you just have to rebase on main and check that the tests are still passing, then LGTM!

@yonigozlan
Copy link
Member

Everything looks good to me, pinging @ArthurZucker for a final review.
There's a potential breaking change @ArthurZucker, see this comment

@qubvel qubvel requested a review from ArthurZucker January 7, 2025 11:04
@simonreise
Copy link
Contributor Author

There was a merge conflict that appeared after #35439 was merged into main. So I also changed the order of do_rescale and is_scaled_image in the code

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't mind breaking this, but we need to add a 🔴 on the PR name to sort it out when releasing! 🤗

@ArthurZucker ArthurZucker changed the title Added segmentation maps support for DPT image processor 🔴 🔴 🔴 Added segmentation maps support for DPT image processor Jan 27, 2025
@ArthurZucker ArthurZucker merged commit 5450e7c into huggingface:main Jan 27, 2025
9 checks passed
@simonreise simonreise deleted the segmentation-maps-for-dpt-image-processor branch January 30, 2025 14:00
bursteratom pushed a commit to bursteratom/transformers that referenced this pull request Jan 31, 2025
…ingface#34345)

* Added `segmentation_maps` support for DPT image processor

* Added tests for dpt image processor

* Moved preprocessing into separate functions

* Added # Copied from statements

* Fixed # Copied from statements

* Added `segmentation_maps` support for DPT image processor

* Added tests for dpt image processor

* Moved preprocessing into separate functions

* Added # Copied from statements

* Fixed # Copied from statements
elvircrn pushed a commit to elvircrn/transformers that referenced this pull request Feb 13, 2025
…ingface#34345)

* Added `segmentation_maps` support for DPT image processor

* Added tests for dpt image processor

* Moved preprocessing into separate functions

* Added # Copied from statements

* Fixed # Copied from statements

* Added `segmentation_maps` support for DPT image processor

* Added tests for dpt image processor

* Moved preprocessing into separate functions

* Added # Copied from statements

* Fixed # Copied from statements
sbucaille pushed a commit to sbucaille/transformers that referenced this pull request Feb 16, 2025
…ingface#34345)

* Added `segmentation_maps` support for DPT image processor

* Added tests for dpt image processor

* Moved preprocessing into separate functions

* Added # Copied from statements

* Fixed # Copied from statements

* Added `segmentation_maps` support for DPT image processor

* Added tests for dpt image processor

* Moved preprocessing into separate functions

* Added # Copied from statements

* Fixed # Copied from statements
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants