
test tests #5426

Closed
wants to merge 11 commits into from

Conversation

@kaixi-wang (Contributor) commented Jan 23, 2025

What changes are proposed in this pull request?

(Please fill in changes proposed in this fix)

How is this patch tested? If it is not, please explain why.

(Details)

Release Notes

Is this a user-facing change that should be mentioned in the release notes?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release
    notes for FiftyOne users.

(Details in 1-2 sentences. You can just refer to another PR with a description
if this PR is part of a larger change.)

What areas of FiftyOne does this PR affect?

  • App: FiftyOne application changes
  • Build: Build and test infrastructure changes
  • Core: Core fiftyone Python library changes
  • Documentation: FiftyOne documentation changes
  • Other

Summary by CodeRabbit

  • Tests
    • Enhanced unit tests for dataset collection addition by ensuring unique dataset names.
    • Updated dataset instantiation to include unique names using uuid.
    • Improved assertions for clearer and more consistent field value retrieval in tests.
  • Bug Fixes
    • Added error handling in the sample insertion process to ensure all samples are accounted for.
  • Chores
    • Introduced logging for better tracking of potential issues during sample synchronization.

coderabbitai bot (Contributor) commented Jan 23, 2025

Walkthrough

The pull request introduces modifications to the test_add_collection_new_ids method in the DatasetTests class and updates the VideoTests class within the unit test files. Key changes include initializing Dataset instances with unique names generated using uuid.uuid4(), adding a save operation immediately after creation, updating the dataset cloning process to also use unique names, and modifying assertions to utilize the values() method for retrieving field contents. Additionally, the add_samples method in the Dataset class now includes error handling for sample insertion, and logging has been added to the _sync_docs_for_sample method in the FrameSingleton class.
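As a rough sketch of the naming pattern described in the walkthrough (the name prefixes here are illustrative, not the exact test code):

import uuid

import fiftyone as fo

# Give each test dataset a name that cannot collide with datasets
# left behind by previous or concurrent test runs
dataset = fo.Dataset(name=f"test-{uuid.uuid4()}")
dataset.save()  # persist immediately after creation

# Clones receive unique names for the same reason
dataset2 = dataset.clone(name=f"clone-{uuid.uuid4()}")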

Changes

  • tests/unittests/dataset_tests.py: Updated Dataset initialization to use uuid.uuid4() for unique names, added a save operation after creation, modified dataset cloning to include uuid.uuid4() in the name, and added print statements for sample outputs.
  • tests/unittests/video_tests.py: Updated Dataset initialization to use uuid.uuid4() for unique names in multiple places, ensured cloned datasets also have unique names, and updated assertions for clarity.
  • fiftyone/core/dataset.py: Enhanced the add_samples method with error handling for sample insertion, ensuring correct _id assignment from inserted IDs.
  • fiftyone/core/singletons.py: Added a logging import and a warning message in _sync_docs_for_sample for missing collection names.

Possibly related PRs

  • fix tagger e2e tests #4783: This PR modifies the instantiation of fo.Dataset() to include a unique name generated by uuid.uuid4(), similar to the changes in the main PR where unique names are assigned to datasets in the test_add_collection_new_ids method.
  • Fix COCO category IDs #4884: This PR includes changes to the handling of category IDs in COCO datasets, which may relate to dataset management and testing, although it does not directly modify the same functions as the main PR.
  • Execution store interface cleanup #5086: This PR enhances the SampleCollection class with new methods for managing execution stores, which could relate to dataset operations, but does not directly overlap with the changes made in the main PR.

Suggested reviewers

  • brimoor
  • ritch

Poem

🐰 A Test of Rabbit Might
Assertions dance, a coding delight,
values() method shines so bright.
From first to last, our data's clear,
With new IDs, no need to fear.
A test that hops with pure insight! 🧪



coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/unittests/dataset_tests.py (1)

10-10: Remove unused import

The uuid module is imported but never used in the code.

-import uuid
🧰 Tools
🪛 Ruff (0.8.2)

10-10: uuid imported but unused

Remove unused import: uuid

(F401)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e606113 and 2d71e70.

📒 Files selected for processing (1)
  • tests/unittests/dataset_tests.py (4 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/unittests/dataset_tests.py

10-10: uuid imported but unused

Remove unused import: uuid

(F401)

🔇 Additional comments (3)
tests/unittests/dataset_tests.py (3)

2573-2575: Ensure unique dataset names using datetime

Good practice to use datetime for generating unique dataset names and explicitly saving the dataset after creation.


2578-2578: Maintain consistency in dataset naming

Good practice to use the same datetime-based naming pattern for the cloned dataset.


2604-2605: Improve test assertions using values() method

Good refactor to use the values() method for retrieving field contents consistently.
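For context, values() retrieves a field's contents for all samples in a single aggregation, which is what makes the refactored assertions compact. A minimal sketch with an illustrative field name:

import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(filepath="/tmp/image1.jpg", foo="bar"),
        fo.Sample(filepath="/tmp/image2.jpg", foo="baz"),
    ]
)

# One call returns the field values for every sample, in order
assert dataset.values("foo") == ["bar", "baz"]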

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
fiftyone/core/singletons.py (1)

339-342: Enhance the warning message for better debugging.

The warning message is incomplete and could provide more context. Consider using f-strings for better readability.

-            logging.warning(
-                "Cannot sync frames to corresponding sample collection `%s` ",
-                collection_name,
-            )
+            logging.warning(
+                f"Cannot sync frames to corresponding sample collection `{collection_name}` "
+                f"as it was not found in {cls.__name__}._instances"
+            )
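For reference, the expanded message also works in lazy %-style form, which defers string interpolation until the record is actually emitted (a standalone sketch; the wrapper function is illustrative):

import logging

logger = logging.getLogger(__name__)

def _warn_missing_collection(cls, collection_name):
    # %-style arguments are only interpolated if the record is emitted,
    # which linters such as pylint prefer over f-strings in logging calls
    logger.warning(
        "Cannot sync frames to corresponding sample collection `%s` "
        "as it was not found in %s._instances",
        collection_name,
        cls.__name__,
    )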
fiftyone/core/dataset.py (2)

3377-3383: Improve error handling for MongoDB insert operation.

The error handling for insert_many has been enhanced to properly capture and handle MongoDB bulk write errors.

Consider adding a more descriptive error message that includes the number of samples that failed to insert:

-            msg = bwe.details["writeErrors"][0]["errmsg"]
-            raise ValueError(msg) from bwe
+            msg = bwe.details["writeErrors"][0]["errmsg"]
+            raise ValueError(
+                f"Failed to insert samples. First error: {msg}. "
+                f"Number of errors: {len(bwe.details['writeErrors'])}"
+            ) from bwe

3388-3390: Add validation to ensure all samples were inserted.

The code now validates that all samples were successfully inserted by comparing the number of inserted IDs to the number of samples.

Consider providing more context in the error message:

-            raise BulkWriteError("Failed to insert all samples")
+            raise BulkWriteError(
+                f"Failed to insert all samples: expected {len(samples)} "
+                f"but inserted {len(new_ids)}"
+            )
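Taken together, the two suggestions amount to the following pattern (a self-contained sketch against a plain pymongo collection; the helper name and messages are illustrative, not FiftyOne's actual internals):

from pymongo.errors import BulkWriteError

def insert_all_or_raise(collection, docs):
    """Inserts ``docs`` and returns the IDs that MongoDB assigned,
    raising if any document failed to insert."""
    try:
        result = collection.insert_many(docs)
    except BulkWriteError as bwe:
        errors = bwe.details["writeErrors"]
        raise ValueError(
            "Failed to insert samples. First error: %s. Number of errors: %d"
            % (errors[0]["errmsg"], len(errors))
        ) from bwe

    new_ids = result.inserted_ids
    if len(new_ids) != len(docs):
        # pymongo's BulkWriteError is constructed from a results dict, so
        # a plain RuntimeError is clearer for a simple count mismatch
        raise RuntimeError(
            "Failed to insert all samples: expected %d but inserted %d"
            % (len(docs), len(new_ids))
        )

    return new_ids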
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 89dbb2c and ee3f15f.

📒 Files selected for processing (3)
  • fiftyone/core/dataset.py (1 hunks)
  • fiftyone/core/singletons.py (2 hunks)
  • tests/unittests/dataset_tests.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unittests/dataset_tests.py
🔇 Additional comments (2)
fiftyone/core/singletons.py (1)

8-8: LGTM!

The logging import is appropriately placed with other standard library imports.

fiftyone/core/dataset.py (1)

3391-3396: LGTM: Sample document creation and video frame handling.

The code correctly:

  • Assigns new IDs to the sample documents
  • Creates backing documents for samples
  • Handles video frame saving for video samples

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
tests/unittests/dataset_tests.py (3)

2573-2577: Consider removing or converting print statements to a logger.
Printing datasets and samples is helpful for debugging, but using test assertions or logging might be more consistent with typical testing practices.


2586-2588: Prefer structured assertions over printing.
Relying on print statements alone to verify correctness might hide issues; assertions provide clearer test outcomes.


2608-2610: Improve test clarity by using assertions.
Instead of manually inspecting prints, consider verifying that dataset2 contains the expected samples, with or without fields, to ensure correctness.
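A hedged example of the direction the reviewer is suggesting, assuming the surrounding DatasetTests method and the dataset1/dataset2 variables from the test:

# Rather than printing samples for manual inspection:
#     print([s for s in dataset2.iter_samples()])
# assert the expected state directly:
self.assertEqual(len(dataset2), len(dataset1))
self.assertListEqual(
    dataset2.values("filepath"), dataset1.values("filepath")
)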

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee3f15f and 793ea24.

📒 Files selected for processing (1)
  • tests/unittests/dataset_tests.py (3 hunks)
🔇 Additional comments (4)
tests/unittests/dataset_tests.py (4)

2580-2582: Unique dataset naming approach is acceptable.
Generating random names via uuid.uuid4() can minimize collisions and works well for ephemeral tests.


2602-2604: Fine usage of ephemeral dataset clones.
Cloning with unique UUID names avoids conflicts when running tests in parallel.


2605-2606: Looks good.
Capturing created_at and last_modified_at for subsequent comparisons is a standard approach in these tests.


2607-2607: Verify consistency of new IDs.
When adding a collection with new_ids=True, ensure any references are handled properly so downstream steps remain consistent.
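For reference, the behavior under test can be sketched as follows (dataset contents are illustrative; per the test, add_collection with new_ids=True copies the samples under freshly generated IDs):

import fiftyone as fo

src = fo.Dataset()
src.add_samples([fo.Sample(filepath="/tmp/image1.jpg")])

dst = fo.Dataset()
dst.add_collection(src, new_ids=True)

# Same samples, but the copies do not share IDs with the originals
assert len(dst) == len(src)
assert dst.first().id != src.first().id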

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/unittests/dataset_tests.py (1)

2576-2577: Consider removing debug print statements.

While print statements are helpful during development, they should be removed or replaced with proper logging for production code.

-        print("dataset1")
-        print([s for s in dataset1.iter_samples()])
-        print("exact clone") 
-        print([s for s in dataset.iter_samples()])
-        print("merged dataset1")
-        print([s for s in dataset.iter_samples()])
-        print("dataset2 exact clone")
-        print([s for s in dataset1.iter_samples()])
-        print("merge view (ds with no foo)")
-        print([s for s in dataset2.iter_samples()])

Also applies to: 2581-2582, 2586-2587, 2603-2604, 2608-2609

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 793ea24 and 6853b32.

📒 Files selected for processing (1)
  • tests/unittests/dataset_tests.py (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test / test-app
  • GitHub Check: lint / eslint
  • GitHub Check: e2e / test-e2e
  • GitHub Check: build / build
  • GitHub Check: build / changes
  • GitHub Check: build
🔇 Additional comments (4)
tests/unittests/dataset_tests.py (4)

10-10: Import uuid module for generating unique identifiers.

The addition of the uuid module import is appropriate for generating unique dataset names to prevent collisions in tests.


2573-2575: Use uuid for unique dataset names and ensure persistence.

Good practice to:

  1. Use uuid.uuid4() for guaranteed unique dataset names
  2. Call save() immediately after creation to persist the dataset

2580-2580: Use uuid for clone names to prevent collisions.

Good practice to use uuid.uuid4() when cloning datasets to ensure unique names.

Also applies to: 2602-2602


2605-2617: Test logic for dataset cloning and merging.

The test logic appropriately verifies the following (see the sketch after this list):

  1. Dataset creation timestamps
  2. Last modified timestamps
  3. Field values and counts
  4. Dataset uniqueness
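
A sketch of timestamp-based assertions along these lines, assuming FiftyOne's built-in read-only created_at and last_modified_at sample fields (the foo field is illustrative):

import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples([fo.Sample(filepath="/tmp/image1.jpg")])

created = dataset.values("created_at")
modified = dataset.values("last_modified_at")

# Editing a sample should advance last_modified_at but not created_at
sample = dataset.first()
sample["foo"] = "bar"
sample.save()

assert dataset.values("created_at") == created
assert all(
    m2 >= m1
    for m1, m2 in zip(modified, dataset.values("last_modified_at"))
)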

Base automatically changed from release/v1.3.0 to main January 24, 2025 20:00
@kaixi-wang kaixi-wang closed this Feb 20, 2025
@kaixi-wang kaixi-wang deleted the kacey/test-add-collection-test branch February 20, 2025 05:15