fix: limit/offset pushdown was not working correctly in the presence of deletions #2895
Does this work right if the range straddles two fragments? Do we care that it does?
```diff
@@ -83,6 +83,17 @@ impl DeletionVector {
}
}

/// Create an iterator that iterates over the values in the deletion vector in sorted order.
pub fn to_sorted_iter<'a>(&'a self) -> Box<dyn Iterator<Item = u32> + Send + 'a> {
```
Oh nice, I wanted this myself.
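For readers unfamiliar with the new helper, here is a minimal sketch of what a sorted iterator over a deletion vector can look like. It uses a hypothetical, simplified `DeletionVector` enum (the `NoDeletions`, `Set`, and `Ordered` variants are assumptions for illustration, not Lance's actual definition). The point it shows: an unordered hash set has to be copied and sorted before iterating, while an already-ordered set can be iterated directly.

```rust
use std::collections::{BTreeSet, HashSet};

/// Hypothetical, simplified deletion vector (not Lance's real type): either
/// no deletions, an unordered hash set, or an already-ordered set of
/// deleted row offsets within a fragment.
pub enum DeletionVector {
    NoDeletions,
    Set(HashSet<u32>),
    Ordered(BTreeSet<u32>),
}

impl DeletionVector {
    /// Iterate over deleted row offsets in ascending order.
    pub fn to_sorted_iter<'a>(&'a self) -> Box<dyn Iterator<Item = u32> + Send + 'a> {
        match self {
            // Nothing deleted: an empty iterator.
            DeletionVector::NoDeletions => Box::new(std::iter::empty()),
            // A HashSet has no defined order, so copy the values and sort them.
            DeletionVector::Set(set) => {
                let mut values: Vec<u32> = set.iter().copied().collect();
                values.sort_unstable();
                Box::new(values.into_iter())
            }
            // A BTreeSet already iterates in ascending order.
            DeletionVector::Ordered(set) => Box::new(set.iter().copied()),
        }
    }
}
```

Boxing the iterator lets every variant return a different concrete iterator type behind one signature, at the cost of a heap allocation and dynamic dispatch per call.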
…eted rows when doing a take_range operation
The logic for that is in the scan node. We take the limit/offset for the dataset and chop it up into limit/offsets for each of the fragments. That logic was actually ok. We were accounting for deleted rows so if the range was something like
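A minimal sketch of the per-fragment split described above, assuming each fragment's live (non-deleted) row count is known up front. `FragmentSlice` and `split_offset_limit` are hypothetical names, not the scan node's actual code. It also shows what happens when the requested window straddles two or more fragments: each fragment contributes its own slice, and only the first slice can have a non-zero offset.

```rust
/// Hypothetical per-fragment slice, expressed in live (non-deleted) rows.
#[derive(Debug, PartialEq)]
pub struct FragmentSlice {
    pub fragment: usize,
    pub offset: u64,
    pub limit: u64,
}

/// Split a dataset-level `offset`/`limit` into per-fragment slices.
/// `live_rows[i]` is assumed to be the number of non-deleted rows in fragment `i`.
pub fn split_offset_limit(live_rows: &[u64], offset: u64, limit: u64) -> Vec<FragmentSlice> {
    let mut slices = Vec::new();
    let mut to_skip = offset;
    let mut to_take = limit;
    for (fragment, &rows) in live_rows.iter().enumerate() {
        if to_take == 0 {
            break;
        }
        if to_skip >= rows {
            // The whole fragment lies before the requested window.
            to_skip -= rows;
            continue;
        }
        // Take what is left in this fragment, capped by the remaining limit.
        let take = (rows - to_skip).min(to_take);
        slices.push(FragmentSlice { fragment, offset: to_skip, limit: take });
        to_take -= take;
        // Later fragments are read from their first live row.
        to_skip = 0;
    }
    slices
}
```

For example, with live row counts `[10, 5, 20]`, an offset of 8 and a limit of 10 split into `(fragment 0, offset 8, limit 2)`, `(fragment 1, offset 0, limit 5)`, and `(fragment 2, offset 0, limit 3)`.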
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main    #2895   +/-   ##
=======================================
  Coverage   77.90%   77.90%
=======================================
  Files         231      231
  Lines       70528    70613     +85
  Branches    70528    70613     +85
=======================================
+ Hits        54946    55014     +68
- Misses      12465    12472      +7
- Partials     3117     3127     +10
```

Flags with carried forward coverage won't be shown.
We need to push back the scan start and potentially grab more rows than requested to make sure we fulfill the request. For v2 files we could also convert the row range into multiple ranges instead of grabbing more rows than required. This might be something to investigate in the future.
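The per-fragment offset/limit is counted in live rows, but the read happens on physical row offsets. A minimal sketch of the fix under that assumption: walk the fragment's deleted offsets in ascending order (e.g. as produced by `to_sorted_iter`) to shift the scan start past deleted rows and extend the end so the range still covers `limit` live rows. `physical_range` and `live_ranges` are hypothetical helpers; the second one illustrates the multiple-ranges alternative mentioned for v2 files.

```rust
use std::ops::Range;

/// Physical position of the `live_index`-th non-deleted row, given the
/// fragment's deleted row offsets in ascending order.
fn physical_index(deleted_sorted: &[u32], live_index: u32) -> u32 {
    let mut pos = live_index;
    for &deleted in deleted_sorted {
        // Every deleted row at or before the candidate shifts it forward by one.
        if deleted <= pos {
            pos += 1;
        } else {
            break;
        }
    }
    pos
}

/// Hypothetical helper: the smallest physical range containing `limit` live
/// rows starting at live row `offset`. Deleted rows inside the range are
/// still read and must be filtered out afterwards.
pub fn physical_range(deleted_sorted: &[u32], offset: u32, limit: u32) -> Range<u32> {
    let start = physical_index(deleted_sorted, offset);
    if limit == 0 {
        return start..start;
    }
    let end = physical_index(deleted_sorted, offset + limit - 1) + 1;
    start..end
}

/// Hypothetical alternative: split a physical range into contiguous runs
/// that skip deleted rows, so no extra rows are read.
pub fn live_ranges(deleted_sorted: &[u32], range: Range<u32>) -> Vec<Range<u32>> {
    let mut ranges = Vec::new();
    let mut cursor = range.start;
    for &deleted in deleted_sorted {
        if deleted < range.start {
            continue;
        }
        if deleted >= range.end {
            break;
        }
        // Emit the live run before this deleted row, if any.
        if deleted > cursor {
            ranges.push(cursor..deleted);
        }
        cursor = deleted + 1;
    }
    if cursor < range.end {
        ranges.push(cursor..range.end);
    }
    ranges
}
```

With deleted offsets `[1, 2, 6]`, an offset of 1 and a limit of 4 map to the physical range `3..8`. That range includes the deleted row 6, one more row than requested. `live_ranges` turns it into `[3..6, 7..8]`. Naively reading physical rows `1..5` would have returned only two live rows.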