
Prioritize using imagesize library to get image size for ImageFromFile #1259

Merged: 5 commits merged into openvinotoolkit:develop on Feb 6, 2024

@imyhxy (Contributor) commented on Feb 4, 2024

Summary

Speed up loading of datasets backed by image files.

I found that printing the YOLO dataset information for the first time was slow. After some digging, I found that Datumaro was decoding every image in the dataset just to get its size.

```python
from datumaro import Dataset

ds = Dataset.import_from("/yolo-ultralytics", "yolo")
print(ds)  # <-- waits a long time on the first call
```
```python
# Excerpt from datumaro's Image class (before this change)
from typing import Optional, Tuple

@property
def size(self) -> Optional[Tuple[int, int]]:
    """Returns (H, W)"""

    if self._size is None:
        try:
            data = self.data  # <-- decodes the whole image into memory
        except _image_loading_errors:  # datumaro-internal tuple of exception types
            return None
        if data is not None:
            self._size = tuple(map(int, data.shape[:2]))
    return self._size
```

This makes interactive work with datasets stored on an HDD slow. So I overrode the size property in the ImageFromFile class so that it first tries to get the image size with PIL, which only needs to parse the file header instead of decoding the whole image. PIL is about 8 times faster than OpenCV at getting the image size.
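For illustration, here is a minimal sketch of reading the size through PIL without decoding pixels, assuming Pillow is installed. The function name `size_from_header` is hypothetical and this is not the code that was merged; it only relies on `PIL.Image.open` being lazy (it parses the header and defers pixel decoding until the data is accessed) and returns (H, W) to match the convention of `Image.size` in datumaro.

```python
from typing import Optional, Tuple

from PIL import Image


def size_from_header(path: str) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: return (H, W) read via PIL, or None if the file can't be parsed."""
    try:
        # Image.open is lazy: it reads the header but does not decode the pixel data
        with Image.open(path) as img:
            width, height = img.size  # PIL reports (W, H)
    except (OSError, ValueError):  # OSError also covers PIL.UnidentifiedImageError
        return None
    return height, width  # swap to datumaro's (H, W) order
```

Returning None instead of raising mirrors the `except _image_loading_errors: return None` branch of the original property, so a caller can fall back to a full decode.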

All dataset classes that use the size property of ImageFromFile can benefit from this modification.
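The override pattern can be sketched with the standard library alone. In the sketch below, `LazyImage` and `LazyImageFromFile` are hypothetical stand-ins for datumaro's `Image` and `ImageFromFile`, and the fast path is a hand-written PNG header parser (it reads width and height from the IHDR chunk in the first 24 bytes) standing in for PIL or imagesize. Only the file-backed subclass knows about headers, and it falls back to the parent's full decode when the header cannot be parsed.

```python
import struct
from typing import Callable, Optional, Tuple

Size = Tuple[int, int]  # (H, W), the order used by datumaro's Image.size
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size_from_header(path: str) -> Optional[Size]:
    """Read (H, W) from a PNG's IHDR chunk; return None for non-PNG or unreadable files."""
    try:
        with open(path, "rb") as f:
            # 8-byte signature + 4-byte chunk length + b"IHDR" + 4-byte width + 4-byte height
            head = f.read(24)
    except OSError:
        return None
    if len(head) < 24 or head[:8] != PNG_SIGNATURE or head[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", head[16:24])  # big-endian uint32 values
    return height, width


class LazyImage:
    """Hypothetical stand-in for datumaro's Image: size comes from a full decode."""

    def __init__(self, decode: Callable[[], Optional[Size]]):
        self._decode = decode  # slow path: decodes all pixels, returns (H, W)
        self._size: Optional[Size] = None

    @property
    def size(self) -> Optional[Size]:
        if self._size is None:
            self._size = self._decode()
        return self._size


class LazyImageFromFile(LazyImage):
    """Hypothetical stand-in for ImageFromFile: tries the header before decoding."""

    def __init__(self, path: str, decode: Callable[[], Optional[Size]]):
        super().__init__(decode)
        self.path = path

    @property
    def size(self) -> Optional[Size]:
        if self._size is None:
            self._size = png_size_from_header(self.path)  # fast path: reads 24 bytes
        return super().size  # decodes only if the header could not be parsed
```

Because the check lives in the subclass's `size` property, any code that only reads `.size` picks up the fast path without changes, and a successful lookup is cached in `_size`, so the header is read at most once per image.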

How to test

Checklist

  • I have added unit tests to cover my changes.
  • I have added integration tests to cover my changes.
  • I have added the description of my changes into CHANGELOG.
  • I have updated the documentation accordingly.

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below).
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT

@imyhxy requested reviews from a team as code owners on February 4, 2024 08:04
@imyhxy requested a review from wonjuleee and removed the request for a team on February 4, 2024 08:04
@wonjuleee requested a review from vinnamkim on February 5, 2024 00:36
imyhxy and others added 2 commits on February 5, 2024 17:41
Co-authored-by: Vinnam Kim <vinnam.kim@gmail.com>
@imyhxy (Contributor, Author) commented on Feb 5, 2024

@vinnamkim It would be great if adding a new dependency is acceptable. I considered several approaches here and settled on PIL for compatibility, but yes, imagesize is a little faster than PIL.

imagesize vs cv2: 0.3s vs 8.1s on my benchmark dataset.
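Since the final title prioritizes imagesize, one reasonable shape for the lookup is an ordered chain of readers that stops at the first valid (H, W). This is a sketch, not the merged implementation: `imagesize_reader` and `size_with_fallbacks` are hypothetical names, and the only third-party call assumed is `imagesize.get(path)`, which returns `(width, height)` and reports `(-1, -1)` for formats it does not recognise.

```python
from typing import Callable, Optional, Sequence, Tuple

Size = Tuple[int, int]  # (H, W)
SizeReader = Callable[[str], Optional[Size]]


def imagesize_reader(path: str) -> Optional[Size]:
    """Hypothetical fast path via the `imagesize` package, which parses only the header."""
    import imagesize  # optional dependency, imported lazily so a missing package is just skipped

    width, height = imagesize.get(path)  # (-1, -1) when the format is not recognised
    if width > 0 and height > 0:
        return height, width
    return None


def size_with_fallbacks(path: str, readers: Sequence[SizeReader]) -> Optional[Size]:
    """Return the first valid (H, W) produced by `readers`, tried in priority order."""
    for reader in readers:
        try:
            size = reader(path)
        except Exception:  # missing package, corrupt file, unsupported format, ...
            continue
        if size is not None and size[0] > 0 and size[1] > 0:
            return size
    return None
```

In practice the last entry would be a full decode (for example with cv2 or PIL), so the expensive path only runs for files the header parsers cannot handle.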

Review thread on requirements.txt (outdated, resolved)

codecov bot commented Feb 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (784e039) 80.54% compared to head (38caf4d) 80.51%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1259      +/-   ##
===========================================
- Coverage    80.54%   80.51%   -0.04%     
===========================================
  Files          271      270       -1     
  Lines        30438    30426      -12     
  Branches      5930     5931       +1     
===========================================
- Hits         24517    24498      -19     
- Misses        4532     4535       +3     
- Partials      1389     1393       +4     
Flag                        Coverage Δ
ubuntu-20.04_Python-3.8     ?
windows-2022_Python-3.8     80.51% <100.00%> (+<0.01%) ⬆️

Flags with carried-forward coverage won't be shown.


@vinnamkim (Contributor) left a comment

LGTM

@vinnamkim merged commit 76fc941 into openvinotoolkit:develop on Feb 6, 2024 (5 checks passed)
@vinnamkim changed the title from "Prioritize using PIL to get image size" to "Prioritize using imagesize library to get image size for ImageFromFile" on Feb 6, 2024
@imyhxy deleted the fast-size branch on March 28, 2024 06:51