-
Notifications
You must be signed in to change notification settings - Fork 605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve search results for multikey indexes in group datasets #5208
Conversation
WalkthroughThe pull request modifies the Changes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: .coderabbit.yaml 📒 Files selected for processing (1)
🔇 Additional comments (1)tests/unittests/lightning_tests.py (1)
The test expectations have been updated to reflect that distinct queries now return all unique values across the dataset, regardless of the slice being queried. This change aligns with the PR objective of improving search results for multikey indexes by preventing distinct scans. Let's verify this behavior change is consistent with other test cases: Also applies to: 1167-1167 ✅ Verification successfulLet me analyze the search results to verify the behavior change. I need to get more context about the test class and its setup. Based on the test context, I can now provide a final response about the review comment: Test expectations correctly reflect distinct query behavior The test expectations are correct and consistent with the implementation. The test creates two samples with classifications labeled "one" and "two", and even when querying only the "one" slice, the distinct values query returns both labels ["one", "two"]. This is the expected behavior as distinct queries should return all unique values across the dataset regardless of the slice being queried. Key evidence:
🏁 Scripts executedThe following scripts were executed for the analysis: Script: #!/bin/bash
# Description: Check for other test cases that might need similar updates for distinct queries
# Search for similar test patterns with distinct queries
rg -A 5 'values": \[.*\]' tests/unittests/
Length of output: 20663 Script: #!/bin/bash
# Get the context of the TestGroupDatasetLightningQueries class and its test methods
rg -B 10 -A 10 "class TestGroupDatasetLightningQueries" tests/unittests/lightning_tests.py
Length of output: 984 Script: #!/bin/bash
# Get more context about the test method and the sample data
rg -B 20 -A 30 "test_group_dataset" tests/unittests/lightning_tests.py
Length of output: 1949 Script: #!/bin/bash
# Get the query and expected results context
rg -A 50 'query = """' tests/unittests/lightning_tests.py
Length of output: 11631 Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
fiftyone/server/lightning.py (1)
352-353
: LGTM! Consider monitoring query performanceThe removal of filtering from the distinct operation should improve performance by preventing distinct scans. Consider adding performance monitoring to validate the improvement.
Consider adding metrics collection for query execution time to quantify the performance improvement. This could help validate the effectiveness of this change and identify any potential bottlenecks.
🧰 Tools
🪛 Ruff (0.8.0)
353-353: Do not use bare
except
(E722)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (1)
fiftyone/server/lightning.py
(2 hunks)
🔇 Additional comments (1)
fiftyone/server/lightning.py (1)
331-331
: Verify the impact on group datasets
The removal of the filter parameter from _do_distinct_query
aligns with the PR objective of preventing distinct scans. However, we should verify that this change doesn't affect the correctness of results for group datasets.
✅ Verification successful
Removal of filter parameter is safe and intentional
The code analysis reveals that the removal of the filter parameter from _do_distinct_query
is correct and safe because:
_do_distinct_query
is only called for fields that have list types (whenquery.has_list
is true)- For non-list fields, the code uses
_do_distinct_pipeline
which properly handles filtering - The change aligns with the PR's goal of preventing distinct scans on multikey indexes
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Search for usages of _do_distinct_query to understand the impact
# Test: Look for calls to _do_distinct_query and verify they don't rely on filtering
rg -A 5 "_do_distinct_query" --type python
Length of output: 73
Script:
#!/bin/bash
# Search for usages of _do_distinct_query without file type flag
rg -A 5 "_do_distinct_query"
Length of output: 711
Script:
#!/bin/bash
# Let's look at the implementation of _do_distinct_query and surrounding context
ast-grep --pattern $'async def _do_distinct_query($$$) {
$$$
}'
Length of output: 66
Script:
#!/bin/bash
# Let's examine the full context of the file to understand the changes
cat fiftyone/server/lightning.py
Length of output: 14092
What changes are proposed in this pull request?
Including a filter for a distinct call prevents a distinct scan, so omit the group slice
What areas of FiftyOne does this PR affect?
fiftyone
Python library changesSummary by CodeRabbit
New Features
Bug Fixes