Fix COCO category IDs #4884
Conversation
Walkthrough

The pull request introduces updates to the FiftyOne library's documentation and codebase, focusing on dataset loading and exporting functionalities.

Changes
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (10)
docs/source/user_guide/export_datasets.rst (3)
Line range hint 4-4: Address the TODO comment for adding tests

The TODO comment indicates that tests are missing for this function. It's important to add unit tests to ensure the function behaves correctly for various inputs and edge cases.
Would you like assistance in generating unit tests for this function?
Line range hint 12-24: Refactor discount calculation for fairness and clarity

The current implementation has a few areas for improvement:
- The flat $20 fee on discounted bills can lead to situations where customers with a small discount pay more than those without a discount. Consider a percentage-based fee or a sliding scale.
- The function doesn't handle edge cases like negative amounts or loyalty years. Add input validation to ensure robust behavior.
- The function is quite long. Consider breaking it down into smaller, more focused functions for better readability and maintainability.
- There's no handling for very large loyalty years, which could lead to unexpectedly high discounts if not intended.
Would you like assistance in refactoring this function to address these points?
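For illustration, the suggested refactor might look like the sketch below. The reviewed function itself isn't shown in this diff, so every name here (`compute_bill`, the 2% `fee_rate`, the 1%-per-year discount capped at 25%) is a hypothetical choice, not the project's actual code:

```python
def _validate_inputs(amount: float, loyalty_years: int) -> None:
    """Rejects the edge cases called out above: negative amounts or years."""
    if amount < 0:
        raise ValueError(f"amount must be non-negative, got {amount}")
    if loyalty_years < 0:
        raise ValueError(f"loyalty_years must be non-negative, got {loyalty_years}")


def _loyalty_discount_rate(loyalty_years: int, cap: float = 0.25) -> float:
    """Caps the loyalty discount so very long tenures can't exceed ``cap``."""
    return min(0.01 * loyalty_years, cap)


def compute_bill(amount: float, loyalty_years: int, fee_rate: float = 0.02) -> float:
    """Applies a capped loyalty discount plus a percentage-based fee.

    A percentage-based fee, unlike a flat $20, guarantees that a customer
    with any discount never pays more than one without a discount.
    """
    _validate_inputs(amount, loyalty_years)
    discounted = amount * (1 - _loyalty_discount_rate(loyalty_years))
    return round(discounted * (1 + fee_rate), 2)
```

Splitting validation and the discount rate into small helpers also addresses the length and readability points above.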
Line range hint 1-24: Consider overall improvements for consistency and robustness

To enhance the overall quality of the code:
- Implement consistent error handling and input validation across all functions.
- Add docstrings and type hints to improve code clarity and maintainability.
- Consider removing the `coderabbit_` prefix from function names if it's not serving a specific purpose.
- Ensure all functions follow the same style and level of complexity; currently, there's a mix of very simple functions and more complex ones.
These changes will make the code more robust, easier to understand, and maintain in the long run.
docs/source/user_guide/dataset_creation/datasets.rst (7)
Line range hint 1-114: Consider adding a brief explanation of FiftyOne's purpose in the introduction

The introduction effectively outlines FiftyOne's dataset import capabilities. However, for newcomers, it might be helpful to include a brief explanation of what FiftyOne is and its primary purpose before diving into the specifics of dataset import.
Consider adding a sentence or two at the beginning, such as:
"FiftyOne is a tool for building, analyzing, and improving datasets for computer vision tasks. It provides powerful capabilities for dataset import, visualization, and manipulation."
Line range hint 116-214: Consider grouping dataset types by category for easier navigation

The table of supported formats is comprehensive and well-structured. To improve usability, especially as the list of supported formats grows, consider grouping the dataset types into categories such as "Image Classification", "Object Detection", "Segmentation", etc. This grouping could help users quickly find the format they're looking for.
Here's an example of how you could structure the grouping:
```rst
.. table::
    :widths: 40 60

    +------------------------+-----------------------------------------------------------+
    | Image Classification                                                               |
    +========================+===========================================================+
    | ImageClassification... | A labeled dataset consisting of images and their ...      |
    +------------------------+-----------------------------------------------------------+
    | TFImageClassificat...  | A labeled dataset consisting of images and their ...      |
    +------------------------+-----------------------------------------------------------+
    | Object Detection                                                                   |
    +========================+===========================================================+
    | COCODetectionDataset   | A labeled dataset consisting of images and their ...      |
    +------------------------+-----------------------------------------------------------+
    | VOCDetectionDataset    | A labeled dataset consisting of images and their ...      |
    +------------------------+-----------------------------------------------------------+
    ...
```
Line range hint 216-2153: Ensure consistency in structure and information across all dataset type sections

The individual dataset type import sections are comprehensive and provide valuable information. To further improve the documentation, consider standardizing the structure of each section to include the following elements consistently:
- Brief description of the dataset type
- File structure expected by the importer
- Python code example
- CLI code example
- Notes on specific features or requirements (if applicable)
- Link to the corresponding DatasetImporter class for advanced usage
This consistent structure will make it easier for users to find the information they need across different dataset types.
Consider creating a template for dataset type sections to ensure all necessary information is included consistently.
Line range hint 2155-2220: Expand guidance on choosing between custom importers and other methods

The custom formats section effectively introduces the concept of custom DatasetImporter classes. To provide more value to users, consider expanding this section to include:
- A comparison of when to use custom importers vs. the simple Python loop method mentioned at the beginning.
- Guidelines or best practices for deciding when to create a custom importer.
- A brief example of a simple custom importer to illustrate the concept.
This additional information would help users make informed decisions about the best approach for their specific use case.
Consider adding a subsection titled "When to use custom importers" that outlines the scenarios where custom importers are most beneficial, such as:
- When you need to reuse the import logic across multiple projects
- When the import process is complex and requires significant preprocessing
- When you want to leverage FiftyOne's built-in import utilities and error handling
Line range hint 2222-3203: Consider adding a high-level overview and flowchart for custom importer creation

The "Writing a custom DatasetImporter" section provides detailed templates and explanations for creating custom importers. To make this complex information more accessible, consider adding:
- A high-level overview of the custom importer creation process.
- A flowchart or decision tree to guide users in choosing the appropriate importer type for their dataset.
- A simple, complete example of a custom importer for a common use case.
These additions would help users grasp the overall concept before diving into the specific implementation details.
Add a subsection at the beginning of "Writing a custom DatasetImporter" with a high-level overview, such as:
Custom Importer Overview
------------------------

Creating a custom importer involves the following steps:

1. Choose the appropriate base class (UnlabeledImageDatasetImporter, LabeledImageDatasetImporter, etc.)
2. Implement required methods (__init__, __len__, __next__, etc.)
3. Define properties (has_dataset_info, label_cls, etc.)
4. Implement optional methods (setup, get_dataset_info, close)

The choice of base class depends on your dataset type:

- UnlabeledImageDatasetImporter: For datasets with images only
- LabeledImageDatasetImporter: For datasets with images and labels
- UnlabeledVideoDatasetImporter: For datasets with videos only
- LabeledVideoDatasetImporter: For datasets with videos and labels
- GroupDatasetImporter: For datasets with grouped samples

Follow the templates below for your specific dataset type.
Line range hint 3205-3239: Enhance explanation of dataset-level information importance and usage

The "Importing dataset-level information" section provides valuable information about importing dataset-level metadata. To improve this section:
- Add a brief explanation of why dataset-level information is important and how it can be used in FiftyOne.
- Provide a simple example of how to implement the `get_dataset_info()` method in a custom importer.
- Clarify how users can access and utilize the imported dataset-level information in their FiftyOne workflows.
These additions would help users understand the significance of this feature and how to leverage it effectively.
Consider adding an example implementation of `get_dataset_info()`:

```python
def get_dataset_info(self):
    return {
        "name": "My Custom Dataset",
        "version": "1.0",
        "classes": ["cat", "dog", "bird"],
        "author": "John Doe",
        "date_created": "2023-04-15",
        "description": "A dataset of pet images for classification",
    }
```

Also, add a note on how to access this information:

Note: After importing, you can access the dataset-level information using the `dataset.info` attribute or specific properties like `dataset.classes`.
Line range hint 3241-3526: Clarify the relationship between custom Dataset types and Importers

The "Writing a custom Dataset type" section provides good information on creating custom dataset types. To improve this section:
- Explain the relationship between custom Dataset types and custom DatasetImporters more explicitly.
- Provide a brief example of how a custom Dataset type and its corresponding Importer work together.
- Clarify when users should create a custom Dataset type versus just a custom Importer.
These additions would help users understand the full picture of custom dataset handling in FiftyOne.
Consider adding a subsection at the beginning of "Writing a custom Dataset type" to explain the relationship:
Relationship between Dataset Types and Importers
------------------------------------------------

In FiftyOne, Dataset Types and Importers work together to provide a complete solution for handling custom datasets:

- Dataset Type: Defines the format and structure of the dataset on disk.
- Importer: Implements the logic to read the dataset from disk and create FiftyOne samples.

When you create a custom Dataset Type, you typically need to create a corresponding custom Importer. The Dataset Type class specifies which Importer to use through the `get_dataset_importer_cls()` method.

Create a custom Dataset Type when you want to:

1. Define a new dataset format that can be consistently used across projects.
2. Provide a convenient way to import your custom dataset format using `Dataset.from_dir()`.
3. Enable exporting datasets in your custom format (by also implementing a custom Exporter).

If you only need to import a dataset once or in a very specific scenario, creating just a custom Importer might be sufficient.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (4)
- docs/source/user_guide/dataset_creation/datasets.rst (2 hunks)
- docs/source/user_guide/export_datasets.rst (2 hunks)
- fiftyone/utils/coco.py (9 hunks)
- tests/unittests/import_export_tests.py (1 hunks)
🔇 Additional comments (12)
fiftyone/utils/coco.py (10)
567-567: Include class names in dataset info

Setting `info["classes"] = _to_classes(classes_map)` ensures that the class names are included in the dataset information. This is important for downstream processes that may rely on the class list provided in the dataset metadata.
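The conversion involved can be sketched in isolation. Note that `_to_classes` is an internal FiftyOne helper whose implementation isn't shown in this review, so the standalone version below only approximates its behavior (a sorted list of names from a `{category_id: name}` map):

```python
def to_classes(classes_map):
    """Approximation of FiftyOne's internal ``_to_classes()`` helper.

    Converts a COCO-style ``{category_id: name}`` map into a sorted list of
    class names, making ``info["classes"]`` deterministic regardless of the
    order categories appear in the annotations file.
    """
    return sorted(classes_map.values())


# Placeholder category IDs and names for illustration
classes_map = {3: "dog", 1: "cat", 7: "bird"}
info = {"classes": to_classes(classes_map)}
```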
Line range hint 570-577: Align function call with updated signature

By passing `classes_map` to `_get_matching_image_ids`, you align the function call with its updated signature. This ensures that image IDs are correctly filtered based on the provided classes.
906-906: Generate label mapping for dynamic classes

When using dynamic classes (`self._dynamic_classes` is `True`), sorting `self._classes` before generating `labels_map_rev` ensures consistent category IDs in the exported annotations. This change helps maintain the correct mapping between class names and IDs.
909-910: Handle cases where categories are not provided

Adding `elif self.categories is None` correctly handles scenarios where `categories` are not supplied. This prevents overwriting `labels_map_rev` when `categories` are provided, ensuring that the category IDs remain consistent.
931-934: Include 'supercategory' in category definitions

Including the `'supercategory': None` field in the category definitions adheres to the COCO format specifications. This enhances compatibility with tools and libraries that expect the `supercategory` field in the category metadata.
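As a standalone sketch of what such category definitions look like (the class names are placeholders, and `build_categories` is an illustrative helper, not FiftyOne's API):

```python
def build_categories(labels_map_rev):
    """Builds a COCO ``categories`` list from a ``{name: id}`` mapping,
    including the ``supercategory`` key the COCO format expects."""
    return [
        {"id": i, "name": c, "supercategory": None}
        for c, i in sorted(labels_map_rev.items(), key=lambda t: t[1])
    ]


categories = build_categories({"cat": 1, "dog": 2})
```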
1679-1679: Pass `all_classes_map` to `_get_images_with_classes`

By passing `all_classes_map` to `_get_images_with_classes`, you're ensuring that the correct class ID mappings are used when filtering images based on the specified classes. This helps accurately identify images containing the target classes.
1844-1844: Update `_get_matching_image_ids` function call

Including `classes_map` in the `_get_matching_image_ids` function call aligns with the updated function signature. This ensures that image IDs are filtered correctly based on the classes mapping.
1860-1860: Adjust parameters in `_get_images_with_classes` call

Passing `classes_map` as the last parameter matches the updated signature of `_get_images_with_classes`. This change ensures that class names are correctly mapped to IDs during image filtering.
1937-1939: Create reverse label mapping from `classes_map`

Generating `labels_map_rev = {c: i for i, c in classes_map.items()}` creates a mapping from class names to IDs. This reverse mapping is essential for validating provided class names and filtering images accordingly.
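This inversion can be demonstrated on its own (a minimal sketch; the sample class names are placeholders):

```python
# classes_map maps COCO category IDs to class names, as parsed from the
# "categories" section of an annotations file
classes_map = {1: "cat", 2: "dog", 3: "bird"}

# Invert it: labels_map_rev maps class names back to category IDs
labels_map_rev = {c: i for i, c in classes_map.items()}

# The reverse mapping lets user-provided class names be validated and
# translated to the IDs used when filtering images
target_classes = ["dog", "bird"]
class_ids = {labels_map_rev[c] for c in target_classes}
```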
2028-2028: Ensure category IDs start from 1

Returning `{c: i for i, c in enumerate(classes, 1)}` in `_to_labels_map_rev` ensures that category IDs start from 1, which aligns with the COCO format requirement that category IDs are positive integers starting from 1.

tests/unittests/import_export_tests.py (2)
1320-1339: Test for alphabetized 1-based categories is correct

The added test correctly verifies that categories are alphabetized and assigned 1-based IDs by default when exporting to COCO format. This ensures that the exported dataset maintains the expected structure.
1340-1378: Test for loading matching classes behaves as expected

The test accurately checks the behavior of loading a COCO dataset with specified classes and the `only_matching` flag. It confirms that when `only_matching=False`, all samples are loaded but only the specified classes are kept in the labels, and when `only_matching=True`, only the detections matching the specified classes are included. This ensures the dataset loading process functions as intended.
2bedfb3 to f08b8cc (Compare)
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
fiftyone/utils/coco.py (1)
Line range hint 1928-1939: Improved class mapping handling

The changes to `_get_images_with_classes` enhance the function's ability to work with class mappings. The addition of `classes_map` to the function signature and the creation of `labels_map_rev` improve the handling of class IDs and labels. The use of `class_ids` instead of `target_class_ids` enhances naming consistency.

Consider renaming `labels_map_rev` to `class_id_map` or `label_to_id_map` for better clarity on its purpose.

docs/source/user_guide/dataset_creation/datasets.rst (1)
Line range hint 4-4: TODO: Add tests for this function

There's a TODO comment indicating that tests need to be added for this function. It's important to implement these tests to ensure the function works as expected, especially after the signature change.
Would you like assistance in generating unit tests for this function?
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (4)
- docs/source/user_guide/dataset_creation/datasets.rst (2 hunks)
- docs/source/user_guide/export_datasets.rst (2 hunks)
- fiftyone/utils/coco.py (9 hunks)
- tests/unittests/import_export_tests.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/source/user_guide/export_datasets.rst
🔇 Additional comments (10)
fiftyone/utils/coco.py (5)
567-567: Improvement in class information handling

This addition ensures that the `info` dictionary contains a "classes" key with a sorted list of class labels. Using `_to_classes(classes_map)` converts the `classes_map` to a sorted list, which improves consistency and makes it easier to work with the class information later in the code.
570-570: Enhanced image filtering with class mapping

The addition of `classes_map` as an argument to `_get_matching_image_ids` improves the function's ability to filter images based on classes. This change allows for more accurate and efficient image selection when working with specific class requirements.
Line range hint 1844-1860: Consistent update to image filtering function

The changes to `_get_matching_image_ids` are consistent with the earlier modification. By including `classes_map` in the function signature and passing it to `_get_images_with_classes`, the code ensures that class mapping information is used consistently throughout the image filtering process. This improves the accuracy and reliability of class-based image selection.
2028-2028: Simplified and improved class ID mapping

The modification to `_to_labels_map_rev` using `enumerate(classes, 1)` is an excellent improvement. This change makes the function more concise and Pythonic while ensuring that class IDs start from 1, which is consistent with the COCO format. This simplification reduces the chance of errors and improves code readability.
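As a quick standalone illustration of the quoted expression (the function name and sample classes here are placeholders for this sketch):

```python
def to_labels_map_rev(classes):
    """Maps class names to 1-based category IDs; ``enumerate(classes, 1)``
    starts counting at 1, matching COCO's convention that category IDs are
    positive integers."""
    return {c: i for i, c in enumerate(classes, 1)}


mapping = to_labels_map_rev(["bird", "cat", "dog"])
```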
Line range hint 1-2391: Overall improvement in COCO dataset handling

The changes made to this file significantly enhance the handling of class mappings and category IDs in COCO dataset operations. The modifications improve consistency, efficiency, and accuracy in filtering images based on classes and working with class information. These updates will lead to more reliable and maintainable code when working with COCO format datasets in FiftyOne.
tests/unittests/import_export_tests.py (5)
1320-1377: LGTM! New tests for COCO dataset handling added

The changes introduce important new test cases for the COCO detection dataset export and import functionality:
- Verifying that categories are alphabetized and use 1-based indexing by default.
- Testing the ability to load only matching classes, both with and without including non-matching samples.
These additions enhance the test coverage for critical COCO dataset handling features.
Line range hint 1379-1412: Great addition of test for `add_yolo_labels` functionality

This new test function covers important aspects of adding YOLO labels to a dataset:

- Standard label addition, verifying that label counts match between original and new fields.
- Inclusion of missing labels, ensuring all samples have labels when the `include_missing` option is used.

The test cases are well-structured and cover both the happy path and an important edge case.
Line range hint 1433-1813: Excellent addition of comprehensive tests for 3D media export

The new `ThreeDMediaTests` class provides thorough coverage of 3D media export functionality:
- Tests for flat relative and absolute addressed scenes.
- Tests for nested relative structures with both flattening and maintaining directory structure.
- Verification of exported file structure and content integrity.
Key strengths of this test suite:
- Uses temporary directories for clean test environments.
- Covers various realistic scenarios for 3D media organization.
- Includes detailed assertions to verify both structure and content of exported files.
This addition significantly enhances the test coverage for 3D media export features, which is crucial for ensuring reliability in handling complex 3D data structures.
Line range hint 1320-1813: Summary: Significant enhancement of test coverage for various dataset types and export/import functionalities

These changes introduce valuable additions to the test suite:
- Extended tests for COCO detection dataset handling, including category alphabetization and selective class loading.
- New tests for YOLO and COCO label addition functionalities.
- Comprehensive test suite for 3D media export covering various file structure scenarios.
These additions significantly improve the robustness of the testing framework, especially for complex use cases and edge scenarios. The new tests will help ensure the reliability and correctness of dataset manipulation and export/import operations across different formats and structures.
Suggestions for future improvements:
- Consider addressing the Windows compatibility issue in the COCO label addition test.
- As the test file grows, it might be beneficial to consider splitting it into more focused test files for better organization and maintainability.
Overall, these changes represent a substantial improvement in the quality and coverage of the test suite.
Line range hint 1414-1431: Approved: New test for `add_coco_labels`, but consider Windows compatibility

This new test function effectively verifies the `add_coco_labels` functionality by:
- Exporting the dataset to COCO format.
- Adding COCO labels back to the original dataset.
- Verifying that label counts match between original and new fields.
The test structure is sound and covers the core functionality well.
The `@skipwindows` decorator suggests potential compatibility issues on Windows platforms. It would be beneficial to investigate and address these issues to ensure cross-platform compatibility.

To verify the extent of Windows-specific issues, you can run the following commands:
✅ Verification successful
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

```shell
#!/bin/bash
# Search for all occurrences of @skipwindows in the codebase
rg "@skipwindows" --type python

# Check if there are any Windows-specific handling in the codebase
rg "if os\.name == (?:'nt'|\"nt\")" --type python
```

Length of output: 149

Script:

```shell
#!/bin/bash
# Search for all occurrences of @skipwindows in the codebase
rg "@skipwindows" --type py

# Check if there are any Windows-specific handling in the codebase
rg "if os\.name == (?:'nt'|\"nt\")" --type py
```

Length of output: 464
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (5)
fiftyone/utils/coco.py (5)
48-48: Approved: Improved parameter naming and documentation

The changes in this segment are well-thought-out and improve the function's alignment with COCO terminology:

- Renaming `classes` to `categories` is more accurate and consistent with the COCO format.
- The updated docstring provides clear information about the accepted formats for the `categories` parameter.
- The change in `category_id` from 2 to 1 in the examples is consistent with COCO's 1-based indexing for category IDs.

Consider adding a brief explanation in the docstring about why category IDs start from 1 (e.g., "COCO uses 1-based indexing for category IDs") to help users understand this convention.

Also applies to: 71-71, 91-91, 112-112, 132-139
916-920: Approved: Improved handling of categories in export process

The changes in the `COCODetectionDatasetExporter` class enhance the robustness and flexibility of the export process:

- The `close` method now handles the case when `self.categories` is None, providing a fallback to use `self.classes`.
- The logic for creating the `categories` list has been updated to ensure correct assignment of category IDs when using a custom `labels_map_rev`.

These changes improve the exporter's ability to handle various input scenarios.

Consider simplifying the logic for creating the `categories` list using a list comprehension:

```python
categories = [
    {"id": i, "name": c, "supercategory": None}
    for c, i in sorted(labels_map_rev.items(), key=lambda t: t[1])
]
```

This change would make the code more concise and potentially more readable.

Also applies to: 941-944
1938-1938: Approved: Enhanced class handling in image filtering

The changes in the `_get_images_with_classes` function improve its flexibility and efficiency:

- The function now uses `classes_map` instead of `all_classes`, consistent with earlier changes.
- A `labels_map_rev` dictionary is created for efficient lookup of class IDs by label.
- The logic for checking unsupported classes has been updated to use `labels_map_rev`.

These modifications enhance the function's ability to handle various class mapping scenarios.

Consider using a set comprehension for creating `class_ids` to potentially improve performance:

```python
class_ids = {labels_map_rev[c] for c in target_classes}
```

This change would eliminate the need for the separate `set()` call and might be slightly more efficient.

Also applies to: 1947-1951
2038-2038: Approved: Simplified and COCO-compliant label mapping

The `_to_labels_map_rev` function has been improved:

- The function is now a concise one-line dictionary comprehension.
- It uses 1-based indexing for class IDs, which is consistent with COCO's convention.

This change enhances readability and ensures compliance with COCO standards.

Consider adding a brief docstring to explain the function's purpose and the 1-based indexing:

```python
def _to_labels_map_rev(classes):
    """Creates a reverse mapping of class labels to 1-based category IDs.

    Args:
        classes: a list of class labels

    Returns:
        a dict mapping class labels to their corresponding 1-based category IDs
    """
    return {c: i for i, c in enumerate(classes, 1)}
```
Line range hint 2038-2051: Approved: Improved category parsing with enhanced flexibility

The changes in the `_parse_categories` function improve its flexibility and readability:

- The function now handles the case when `classes` is None, returning all categories.
- The logic for creating the return dictionary has been simplified and made more concise.
- The function correctly handles both cases: when specific classes are provided and when all classes should be included.

These modifications make the function more robust and easier to use in various scenarios.

Consider using a dictionary comprehension to make the function even more concise:

```python
def _parse_categories(categories, classes=None):
    classes_map, _ = parse_coco_categories(categories)
    if classes is None:
        return classes_map

    classes = {classes} if isinstance(classes, str) else set(classes)
    return {c: i for c, i in classes_map.items() if c in classes}
```

This change would eliminate the need for the separate dictionary creation and potentially improve readability.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (3)
- docs/source/integrations/coco.rst (6 hunks)
- fiftyone/utils/coco.py (15 hunks)
- tests/unittests/import_export_tests.py (2 hunks)
🧰 Additional context used
🪛 Ruff
tests/unittests/import_export_tests.py
1822-1822: Ambiguous variable name: `l` (E741)
🔇 Additional comments (10)
docs/source/integrations/coco.rst (6)
199-201: Excellent addition of the `seed` parameter for reproducibility!

The inclusion of the `seed` parameter in the `take()` method is a valuable improvement. This change ensures that the same subset of samples can be consistently selected across different runs, which is crucial for reproducibility in data science workflows.
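The reproducibility argument can be demonstrated with plain Python. This is only a sketch of the principle using `random.Random`; it is not FiftyOne's actual `take()` implementation:

```python
import random


def take(samples, size, seed=None):
    """Returns a pseudo-random subset; a fixed seed makes it deterministic."""
    rng = random.Random(seed)
    return rng.sample(samples, size)


samples = list(range(100))

# Same seed, same subset: repeated runs select identical samples
subset_a = take(samples, 5, seed=51)
subset_b = take(samples, 5, seed=51)
```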
Line range hint 219-230: Updated JSON structure enhances clarity

The revised JSON structure example now includes the `categories` section with an `id` field, which better reflects the actual COCO format. This change provides users with a more accurate representation of the data structure they'll be working with.
240-247: Improved annotation example with additional details

The updated annotation example now includes more precise coordinate values and an `area` field. These additions provide a more comprehensive representation of COCO annotations, which is beneficial for users who need to understand the full structure of the data.
270-272: Valuable information about imported COCO categories

The new lines explaining that COCO categories are also imported provide crucial information for users. This addition helps users understand that the category information is preserved when loading COCO-formatted data, which is important for maintaining the full context of the dataset.
319-323: Corrected terminology in mock COCO predictions

The update from `class_id` to `category_id` in the mock COCO predictions example is a significant improvement. This change aligns the example with the official COCO format, reducing potential confusion for users familiar with COCO datasets.
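For context, a minimal mock COCO-format prediction entry looks like the following; all values are placeholders, and the key point is the field name `category_id`:

```python
# A single mock COCO-format prediction; every value here is a placeholder
prediction = {
    "image_id": 1,
    "category_id": 2,  # 1-based COCO category ID; COCO does not use "class_id"
    "bbox": [258.15, 41.29, 348.26, 243.78],  # [x, y, width, height] in pixels
    "score": 0.94,
}
```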
325-328: Updated method for adding COCO predictions

The change from `classes` to `categories` when adding COCO predictions to the dataset is an important update. This modification ensures consistency with the COCO format and the earlier changes in the document. It's a crucial change that maintains the integrity of the COCO data structure within FiftyOne.

fiftyone/utils/coco.py (2)
1854-1854: Approved: Consistent use of `classes_map` in image ID matching

The changes in the `_get_matching_image_ids` function maintain consistency with earlier modifications:

- The function signature now includes `classes_map` instead of `all_classes`.
- The call to `_get_images_with_classes` has been updated to use `classes_map`.

These changes improve the function's flexibility by relying on the `classes_map` for class information.

To ensure these changes don't affect existing function calls, let's verify the usage of `_get_matching_image_ids` in the codebase:

```shell
#!/bin/bash
# Search for calls to '_get_matching_image_ids'
echo "Searching for '_get_matching_image_ids' calls:"
rg --type python '_get_matching_image_ids\(' -C 3
```

Also applies to: 1870-1871
Line range hint 577-586: Approved: Consistent updates to class initialization and method call

The changes in the `COCODetectionDatasetImporter` class are consistent with the earlier renaming of `classes` to `categories`:

- The `__init__` method now uses `categories` instead of `classes`.
- The `_get_matching_image_ids` method call has been updated to use `classes_map` instead of `all_classes`.

These changes improve consistency and reflect the shift towards using a mapping structure for categories.

To ensure these changes don't have unintended consequences, let's verify the usage of `categories` and `classes_map` in other parts of the codebase:

✅ Verification successful
Action required: Re-run updated search scripts to verify usage of `categories` and `classes_map`

The previous script encountered an error due to an unrecognized file type and a syntax issue. Please run the corrected shell script below to search for usages of `categories` and `classes_map` in Python files:

Verification complete: No issues found with `categories` and `classes_map` usage

The updates to the `COCODetectionDatasetImporter` class are consistent across the codebase. The usage of `categories` and `classes_map` aligns with the renaming and restructuring changes, ensuring no unintended side effects.

🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

```shell
#!/bin/bash
# Search for usages of 'categories' and 'classes_map' in Python files
echo "Searching for 'categories' usage:"
rg --type python 'categories' -C 2

echo "\nSearching for 'classes_map' usage:"
rg --type python 'classes_map' -C 2
```

Length of output: 285

Script:

```shell
#!/bin/bash
# Search for usages of 'categories' and 'classes_map' in Python files
echo "Searching for 'categories' usage:"
rg 'categories' -g '*.py' -C 2

echo "\nSearching for 'classes_map' usage:"
rg 'classes_map' -g '*.py' -C 2
```

Length of output: 49566
tests/unittests/import_export_tests.py (2)
1320-1338: Well-structured test for default COCO category handling

This new test case is a valuable addition to the COCO dataset export functionality. It verifies that when no specific category information is provided, the exported categories are alphabetized and use 1-based indexing, which is the expected default behavior for COCO datasets.
1340-1377: ⚠️ Potential issue: Comprehensive tests for COCO class filtering

These new test cases effectively cover class filtering scenarios for COCO dataset import. They verify both inclusive and exclusive filtering behaviors, which is crucial for ensuring correct data loading based on specified classes.

However, there's a minor issue with variable naming: on line 1822, the variable name `l` is ambiguous. It's best to use more descriptive variable names to improve code readability.

Apply this diff to address the issue:

```diff
- categories = [{"supercategory": "animal", "id": i, "name": l} for i, l in enumerate(classes, 1)]
+ categories = [{"supercategory": "animal", "id": i, "name": label} for i, label in enumerate(classes, 1)]
```

This change makes the code more readable without affecting its functionality.

Likely invalid or redundant comment.
LGTM
Resolves #4615
Resolves #4795
Resolves #4570
#4354 introduced regression #4795, which this PR fixes. I also resolved #4615 while I was at it.
Tested by
Unit tests are added.
Example usage
Summary by CodeRabbit
Documentation
`COCODetectionDataset` format.

Tests