bugfix: CLI hangs with super big dataset #255

Merged 11 commits on May 15, 2024

@NickHerrig (Collaborator) commented May 14, 2024

Description

Fixes roboflow-bugtracker#908

Two dictionaries were added so that images and annotations are looked up by key, rather than by iterating through a list to find matches.
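
As a rough illustration of the idea (not the code in this PR), the sketch below contrasts a linear scan with a dictionary lookup. The record shape, the `file_name` key, and the helper names `find_by_scan` and `build_index` are assumptions made for the example.

```python
from typing import Optional

# Hypothetical image records; the real parser's data layout may differ.
images = [{"id": i, "file_name": f"img_{i}.jpg"} for i in range(5)]


def find_by_scan(records: list, file_name: str) -> Optional[dict]:
    """O(n) per lookup: walk the list until a record matches."""
    for record in records:
        if record["file_name"] == file_name:
            return record
    return None


def build_index(records: list) -> dict:
    """Build the index once in O(n); each later lookup is O(1) on average."""
    return {record["file_name"]: record for record in records}


index = build_index(images)

# Both approaches return the same record; only the cost per lookup differs.
assert find_by_scan(images, "img_3.jpg") is index.get("img_3.jpg")
```

With n images and m annotations, matching by repeated scans costs on the order of n × m comparisons, while building an index first costs roughly n + m operations, which is likely why the difference only shows up on very large datasets.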

In addition, a tqdm progress bar was added to the image loop to give users feedback while their annotation/image folder is being parsed.
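
A minimal sketch of wrapping a loop in a tqdm progress bar is shown here for context. The function `parse_images` and its per-image work are hypothetical stand-ins, not the parser's actual code. The fallback branch only exists so the example runs where tqdm is not installed.

```python
try:
    from tqdm import tqdm  # third-party package: pip install tqdm
except ImportError:
    # Fallback so the sketch still runs without tqdm: no bar, same iteration.
    def tqdm(iterable, **kwargs):
        return iterable


def parse_images(image_paths):
    """Hypothetical stand-in for the per-image work done while parsing."""
    parsed = []
    # tqdm wraps any iterable and prints progress to stderr as it is consumed.
    for path in tqdm(image_paths, desc="Parsing images", unit="img"):
        parsed.append({"file": path})
    return parsed


result = parse_images([f"img_{i}.jpg" for i in range(1000)])
```

Because tqdm only wraps the iterable, the loop body stays unchanged, which keeps this kind of feedback cheap to add to an existing loop.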

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How has this change been tested, please provide a testcase or example of how you tested the change?

All existing tests under tests/util/test_folderparser.py pass after this work.
An initial stub test (below) was implemented to test the speed improvements on the COCONut dataset.

If you have the COCONut dataset, it can be mounted into the dev container with the following docker-compose.yml:

version: '3'
services:
  devcontainer-roboflow-python:
    build:
      context: ..
      dockerfile: Dockerfile.dev
    image: devcontainer-roboflow-python
    volumes:
      - ..:/roboflow-python
      - {path-to-coconut-dataset}:/coconut
    command: sleep infinity

def test_parse_coconut(self):
    folder = "/coconut/images/COCONut-S/"
    parsed = folderparser.parsefolder(folder)

Before this PR:
[image: slow_upload]

After this PR:
[image: optimized_upload]

@NickHerrig NickHerrig marked this pull request as ready for review May 15, 2024 03:36
@NickHerrig NickHerrig changed the title from "Initial commit image/annotation dictoinaries" to "Initial commit image/annotation dictionaries" May 15, 2024
@NickHerrig (Collaborator, Author) commented

@tonylampada ready for review.
Let me know if you have any ideas for improving this.

@tonylampada tonylampada changed the title from "Initial commit image/annotation dictionaries" to "bugfix: CLI hangs with super big dataset" May 15, 2024
@tonylampada (Collaborator) commented

@NickHerrig this is looking great!
I'm making a couple more commits on top of it, OK?

@tonylampada tonylampada left a comment

good job!
thanks!!

@tonylampada tonylampada merged commit 9c716fb into main May 15, 2024
10 checks passed