Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: limit/offset pushdown was not working correctly in the presence of deletions #2895

Merged
merged 5 commits into from
Sep 17, 2024

Conversation

westonpace
Copy link
Contributor

We need to push back the scan start and potentially grab more rows than requested to make sure we fulfill the request. For v2 files we could also convert the row range into multiple ranges instead of grabbing more rows than required. This might be something to investigate in the future.

@github-actions github-actions bot added bug Something isn't working python labels Sep 16, 2024
Copy link
Contributor

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work right if the range straddles two fragments? Do we care that it does?

@@ -83,6 +83,17 @@ impl DeletionVector {
}
}

/// Create an iterator that iterates over the values in the deletion vector in sorted order.
pub fn to_sorted_iter<'a>(&'a self) -> Box<dyn Iterator<Item = u32> + Send + 'a> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh nice, I wanted this myself.

@westonpace
Copy link
Contributor Author

Does this work right if the range straddles two fragments? Do we care that it does?
@wjones127

The logic for that is in the scan node. We take the limit/offset for the dataset and chop it up into limit/offsets for each of the fragments. That logic was actually ok. We were accounting for deleted rows so if the the range was something like 1000...3000 and each fragment had 1000 rows and 100 deleted rows we would translate it into:

fragment 0: skip (skip 900 rows)
fragment 1: 100..900 (skip 100, take 800)
fragment 2: 0..900 (skip 0, take 900)
fragment 3: 0..300 (skip 0, take 300)

@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 97.75281% with 2 lines in your changes missing coverage. Please review.

Project coverage is 77.90%. Comparing base (c0e966c) to head (7a7d981).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
rust/lance-core/src/utils/deletion.rs 66.66% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2895   +/-   ##
=======================================
  Coverage   77.90%   77.90%           
=======================================
  Files         231      231           
  Lines       70528    70613   +85     
  Branches    70528    70613   +85     
=======================================
+ Hits        54946    55014   +68     
- Misses      12465    12472    +7     
- Partials     3117     3127   +10     
Flag Coverage Δ
unittests 77.90% <97.75%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@westonpace westonpace merged commit 78d90a9 into lancedb:main Sep 17, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants