Prioritize using imagesize library to get image size for ImageFromFile #1259

imyhxy · 2024-02-04T08:04:30Z

Summary

Accelerate loading of image file-based datasets.

I found that printing out the YOLO dataset information for the first time was slow. After some digging I found that Datumaro was decoding every image in the dataset just to get its size.

from datumaro.components.dataset import Dataset

ds = Dataset.import_from("/yolo-ultralytics", "yolo")
print(ds)  # <-- takes a long time

    # from class Image
    @property
    def size(self) -> Optional[Tuple[int, int]]:
        """Returns (H, W)"""

        if self._size is None:
            try:
                data = self.data  # <-- load the whole media into memory
            except _image_loading_errors:
                return None
            if data is not None:
                self._size = tuple(map(int, data.shape[:2]))
        return self._size

Interacting with datasets stored on an HDD is slow as a result. So I added an overriding size property to the ImageFromFile class, which first tries to get the image size using PIL. PIL is about 8 times faster than OpenCV at getting the image size.
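To make the idea concrete, here is a minimal sketch of the pattern: try a cheap header-only read first, and fall back to a full decode only if that fails. It is not the code from this PR. `FileImageSketch`, `read_png_size`, and the injected `decode` callable are hypothetical names, and a stdlib-only PNG header parser stands in for PIL so that the example runs without extra dependencies.

```python
import struct
from typing import Callable, Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_size(path: str) -> Optional[Tuple[int, int]]:
    """Return (H, W) from the PNG IHDR chunk, or None if this is not a PNG."""
    with open(path, "rb") as f:
        # signature (8) + chunk length (4) + chunk type (4) + width (4) + height (4)
        header = f.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", header[16:24])
    return height, width


class FileImageSketch:
    """Hypothetical stand-in for ImageFromFile; not Datumaro's class."""

    def __init__(self, path: str, decode: Callable[[str], Tuple[int, int]]):
        self._path = path
        self._decode = decode  # slow path: fully decodes the file, returns (H, W)
        self._size: Optional[Tuple[int, int]] = None

    @property
    def size(self) -> Optional[Tuple[int, int]]:
        """Returns (H, W), trying the header-only reader first."""
        if self._size is None:
            try:
                self._size = read_png_size(self._path)  # fast path
            except OSError:
                self._size = None
            if self._size is None:
                try:
                    self._size = self._decode(self._path)  # fallback
                except (OSError, ValueError):
                    return None
        return self._size
```

The result is cached in `_size`, so the file is touched at most once per image regardless of which path succeeds.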

All dataset classes that use the size property of ImageFromFile can benefit from this modification.

How to test

Checklist

I have added unit tests to cover my changes.
I have added integration tests to cover my changes.
I have added the description of my changes into CHANGELOG.
I have updated the documentation accordingly.

License

I submit my code changes under the same MIT License that covers the project.
Feel free to contact the maintainers if that's a concern.
I have updated the license header for each file (see an example below).

# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT

imyhxy · 2024-02-05T09:58:55Z

@vinnamkim
It would be great if adding new dependencies was acceptable. I actually considered several approaches here and settled on PIL for compatibility. But yes, imagesize is a little faster.

imagesize vs cv2: 0.3s vs 8.1s on my benchmark dataset.
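For anyone who wants to run a comparison like this on their own data, the following is a rough sketch of a timing harness. It is not the script used for the numbers above. `time_size_reader` is a hypothetical helper that accepts any callable mapping a path to a size.

```python
import time
from typing import Callable, Iterable, Optional, Tuple

Size = Optional[Tuple[int, int]]


def time_size_reader(reader: Callable[[str], Size], paths: Iterable[str]) -> Tuple[float, int]:
    """Return (elapsed seconds, number of files whose size was read)."""
    start = time.perf_counter()
    ok = 0
    for path in paths:
        try:
            if reader(path) is not None:
                ok += 1
        except (OSError, ValueError):
            pass  # unreadable file: count it as a miss and keep timing
    return time.perf_counter() - start, ok
```

One could pass `imagesize.get` (which returns `(width, height)`) as one reader and a wrapper around `cv2.imread(path).shape[:2]` as the other. The cv2 wrapper would need to handle `cv2.imread` returning `None` for unreadable files. Results also depend on whether the files are already in the OS page cache, so a second run on the same files can look much faster than the first.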

codecov · 2024-02-05T14:05:10Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (784e039) 80.54% compared to head (38caf4d) 80.51%.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #1259      +/-   ##
===========================================
- Coverage    80.54%   80.51%   -0.04%     
===========================================
  Files          271      270       -1     
  Lines        30438    30426      -12     
  Branches      5930     5931       +1     
===========================================
- Hits         24517    24498      -19     
- Misses        4532     4535       +3     
- Partials      1389     1393       +4

Flag                      Coverage Δ
ubuntu-20.04_Python-3.8   ?
windows-2022_Python-3.8   80.51% <100.00%> (+<0.01%) ⬆️

vinnamkim

LGTM

Use PIL to get size

f26ddc8

imyhxy requested review from a team as code owners February 4, 2024 08:04

imyhxy requested review from wonjuleee and removed request for a team February 4, 2024 08:04

wonjuleee requested a review from vinnamkim February 5, 2024 00:36

vinnamkim reviewed Feb 5, 2024

src/datumaro/components/media.py (outdated, resolved)

imyhxy and others added 2 commits February 5, 2024 17:41

Update src/datumaro/components/media.py

bbb941a

Co-authored-by: Vinnam Kim <vinnam.kim@gmail.com>

Add imagesize library

b5a3115

vinnamkim reviewed Feb 5, 2024

requirements.txt (outdated, resolved)

imyhxy added 2 commits February 5, 2024 19:48

Update requirements-core.txt

6758635

Fix order

38caf4d

vinnamkim approved these changes Feb 6, 2024

vinnamkim merged commit 76fc941 into openvinotoolkit:develop Feb 6, 2024
5 checks passed

vinnamkim changed the title ~~Prioritize using PIL to get image size~~ Prioritize using imagesize library to get image size for ImageFromFile Feb 6, 2024

imyhxy deleted the fast-size branch March 28, 2024 06:51
