[BUG] Filtering class labels on COCO zoo dataset does NOT return samples matching the provided map #4795

Closed
1 of 3 tasks
mythrandire opened this issue Sep 12, 2024 · 3 comments · Fixed by #4884
Labels
bug Bug fixes

Comments

@mythrandire
Member

mythrandire commented Sep 12, 2024

Describe the problem

When loading a COCO dataset via fiftyone.zoo.load_zoo_dataset, passing a string or list of strings as classes should load only samples that contain at least one instance of the specified classes. Instead, the samples that are loaded do not match the requested classes.

Expected Behavior

foz.load_zoo_dataset(
    ...
    classes=["cat"],
    ...
)

should return samples that all contain at least one instance of the class label "cat".

Observed Behavior

The call returns samples that all contain at least one instance of some common class label, but that label is not "cat". This behavior can be replicated regardless of the class label or list of class labels requested. For instance, specifying "chair" returns N samples, of which maybe one or two contain the label "chair", while all N happen to share the common label "broccoli". (Pictured below: the selected images are the only samples that contain "chair".)

[Image: where_chair]

Code to reproduce issue

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("coco-2017", label_types="segmentations", classes=["chair"], max_samples=30, dataset_name="onetest")

session = fo.launch_app(dataset)
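
One way to put numbers on the observation, rather than eyeballing it in the App, is sketched below. It is a minimal, library-independent helper: it assumes the label strings of each loaded sample have already been collected into plain Python lists (how to extract them depends on the label field name and is left out). The function name summarize_labels and the sample data are hypothetical.

```python
from collections import Counter


def summarize_labels(per_sample_labels, requested):
    """Count samples containing a requested class and find labels shared by all samples."""
    requested = set(requested)
    # A sample matches if it has at least one label in the requested set
    matching = sum(1 for labels in per_sample_labels if requested & set(labels))
    # Count each label at most once per sample
    presence = Counter(
        label for labels in per_sample_labels for label in set(labels)
    )
    total = len(per_sample_labels)
    common = sorted(label for label, count in presence.items() if count == total)
    return {"total": total, "matching": matching, "common_to_all": common}


# Hypothetical label lists standing in for three loaded samples
samples = [
    ["broccoli", "chair"],
    ["broccoli", "bowl"],
    ["broccoli", "person", "bowl"],
]

summary = summarize_labels(samples, ["chair"])
print(summary)
```

If the filter worked as expected, "matching" would equal "total"; in the behavior described above it is far smaller, while "common_to_all" contains an unrelated label.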

System information

  • OS Platform and Distribution: macOS 14.6.1 (23G93), kernel Darwin 23.6.0
  • Python version (python --version): Python 3.11.9
  • FiftyOne version (fiftyone --version): Originally tested on FOT tag v2.1.0.dev28 (internal), replicated on open source FiftyOne tag 0.25.1
  • FiftyOne installed from (pip or source): source developer install i.e. zsh install.bash -d

Other info/logs

Source code analysis

As far as I can tell, the mapping function used to filter requested class labels does not exhibit any obvious bug.

def _to_classes(classes_map):
    return [classes_map[i] for i in sorted(classes_map.keys())]

This appears to be the underlying mapping applied when load_zoo_dataset imports the data via the COCODatasetImporter class.
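
For reference, here is what the function quoted above returns on a small input. The map is hypothetical and chosen only for illustration. The output shows that the function yields names ordered by key, so a name's position in the list need not equal its key when the keys have gaps. This is not meant to establish the root cause, only to make the function's behavior concrete.

```python
def _to_classes(classes_map):
    # Copied from the snippet above: names ordered by ascending key
    return [classes_map[i] for i in sorted(classes_map.keys())]


# Hypothetical ID -> name map with gaps between the IDs
classes_map = {3: "car", 1: "person", 62: "chair"}

classes = _to_classes(classes_map)
print(classes)

# List position of "chair" versus its key in the map
position_of_chair = classes.index("chair")
print(position_of_chair, "vs key", 62)
```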

Additional notes

Specifying the only_matching parameter as True does nothing to rectify this.

Willingness to contribute

The FiftyOne Community encourages bug fix contributions. Would you or another
member of your organization be willing to contribute a fix for this bug to the
FiftyOne codebase?

  • Yes. I can contribute a fix for this bug independently
  • Yes. I would be willing to contribute a fix for this bug with guidance
    from the FiftyOne community
  • No. I cannot contribute a bug fix at this time
@mythrandire mythrandire added the bug Bug fixes label Sep 12, 2024
@mythrandire
Member Author

mythrandire commented Sep 12, 2024

Further context from @benjaminpkane: this issue links #4615 and #4570.

Edit: I'm going to use this thread to add context I believe is relevant, also as an exercise in figuring out the root cause, by reviewing and examining the following:

  • COCO dataset to FiftyOne dataset conversion (per-sample .json label files --> unified labels.json, i.e. the direction indicated by @miller-kevin944)
  • Dig through fiftyone.utils.coco

@mythrandire
Member Author

mythrandire commented Sep 12, 2024

Standard COCO label format

(Added for context)

{
    "info": info,
    "licenses": [license], 
    "images": [image],  // list of all images in the dataset
    "annotations": [annotation],  // list of all annotations in the dataset
    "categories": [category]  // list of all categories
}

where:

info = {
    "year": int, 
    "version": str, 
    "description": str, 
    "contributor": str, 
    "url": str, 
    "date_created": datetime,
}

license = {
    "id": int, 
    "name": str, 
    "url": str,
}

image = {
    "id": int, 
    "width": int, 
    "height": int, 
    "file_name": str, 
    "license": int,  // the id of the license
    "date_captured": datetime,
}

category = {
    "id": int, 
    "name": str, 
    "supercategory": str,
}

annotation = {
    "id": int, 
    "image_id": int,  // the id of the image that the annotation belongs to
    "category_id": int,  // the id of the category that the annotation belongs to
    "segmentation": RLE or [polygon], 
    "area": float, 
    "bbox": [x,y,width,height], 
    "iscrowd": int,  // 0 or 1,
}
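
Given this format, the expected filtering can be sketched directly on the labels dictionary: resolve the requested class names to category IDs, then collect the image IDs of annotations with those category IDs. This is a minimal sketch of the expected behavior, not FiftyOne's implementation; the function name image_ids_with_classes and the toy data are hypothetical.

```python
def image_ids_with_classes(coco, class_names):
    """Return sorted IDs of images with at least one annotation of a requested class."""
    wanted = set(class_names)
    # Resolve class names to category IDs via the "categories" list
    category_ids = {c["id"] for c in coco["categories"] if c["name"] in wanted}
    # Keep the images referenced by annotations of those categories
    return sorted(
        {a["image_id"] for a in coco["annotations"] if a["category_id"] in category_ids}
    )


# Toy labels dictionary with only the keys used here (values are illustrative)
coco = {
    "images": [
        {"id": 1, "file_name": "a.jpg"},
        {"id": 2, "file_name": "b.jpg"},
        {"id": 3, "file_name": "c.jpg"},
    ],
    "categories": [
        {"id": 56, "name": "broccoli"},
        {"id": 62, "name": "chair"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 56},
        {"id": 11, "image_id": 2, "category_id": 62},
        {"id": 12, "image_id": 2, "category_id": 56},
        {"id": 13, "image_id": 3, "category_id": 56},
    ],
}

chair_images = image_ids_with_classes(coco, ["chair"])
print(chair_images)
```

Comparing such a list against the samples actually loaded would show which of them were expected to match.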

@brimoor
Contributor

brimoor commented Oct 3, 2024

Fixed by #4884 and will be available in the next fiftyone>1.0 release.

@brimoor brimoor closed this as completed Oct 7, 2024