feat: move-stable row ids in compaction #2544

Merged
wjones127 merged 7 commits into lancedb:main from feat/stable-row-id-compact on Aug 7, 2024

Conversation

Contributor

@wjones127 wjones127 commented Jun 28, 2024

Part of #2307.

Also addresses #2397

@wjones127 wjones127 added the enhancement (New feature or request), rust (Rust related tasks), and experimental (Features that are experimental) labels Jun 28, 2024
@wjones127 wjones127 mentioned this pull request Jun 28, 2024
13 tasks
@wjones127 wjones127 force-pushed the feat/stable-row-id-compact branch from 51402ce to 77214c7 Compare July 31, 2024 16:36
Contributor Author

These functions are moved from the parent module.

@wjones127 wjones127 force-pushed the feat/stable-row-id-compact branch from 50424ea to f652924 Compare July 31, 2024 19:24

codecov-commenter commented Jul 31, 2024

Codecov Report

Attention: Patch coverage is 83.44227% with 76 lines in your changes missing coverage. Please review.

Project coverage is 79.93%. Comparing base (72ff096) to head (2dfe148).

Files                                          Patch %   Lines
rust/lance/src/dataset/optimize.rs             80.00%    26 Missing and 26 partials ⚠️
rust/lance-table/src/rowids.rs                 42.10%    8 Missing and 3 partials ⚠️
rust/lance/src/dataset/optimize/remapping.rs   94.18%    10 Missing ⚠️
rust/lance/src/dataset/transaction.rs          62.50%    1 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2544      +/-   ##
==========================================
+ Coverage   79.81%   79.93%   +0.12%     
==========================================
  Files         224      225       +1     
  Lines       65871    66377     +506     
  Branches    65871    66377     +506     
==========================================
+ Hits        52572    53060     +488     
+ Misses      10231    10213      -18     
- Partials     3068     3104      +36     
Flag        Coverage Δ
unittests   79.93% <83.44%> (+0.12%) ⬆️

@wjones127 wjones127 marked this pull request as ready for review July 31, 2024 21:50
@@ -152,6 +152,27 @@ impl RowIdSequence {
self.0.extend_from_slice(remaining_segments);
}

/// Mask
Contributor

Suggested change
- /// Mask
+ /// Delete row ids by position

Or some other comment explaining the difference between this and delete (they look very similar)
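
To illustrate the distinction the reviewer is asking the doc comment to spell out, here is a minimal sketch. It assumes the suggested wording is accurate: that the new method removes entries by their position in the sequence, while delete removes entries by row id value. `SimpleRowIds`, `delete_by_id`, and `mask_by_position` are hypothetical names for this example only; the real `RowIdSequence` stores segments rather than a flat list.

```rust
/// Simplified stand-in for a row id sequence: a flat list of row ids.
/// (Hypothetical type for illustration; not the Lance `RowIdSequence`.)
#[derive(Debug, Clone, PartialEq)]
struct SimpleRowIds(Vec<u64>);

impl SimpleRowIds {
    /// Delete by value: drop every entry whose row id appears in `ids`.
    /// Ids that are not present are ignored.
    fn delete_by_id(&mut self, ids: &[u64]) {
        self.0.retain(|id| !ids.contains(id));
    }

    /// Mask by position: drop the entries at the given offsets in the
    /// sequence, regardless of which row ids they hold.
    fn mask_by_position(&mut self, positions: &[usize]) {
        let mut index = 0usize;
        // `Vec::retain` visits elements in their original order, so `index`
        // tracks each element's original offset.
        self.0.retain(|_| {
            let keep = !positions.contains(&index);
            index += 1;
            keep
        });
    }
}
```

With the sequence `[10, 20, 30, 40]`, passing `1` to `delete_by_id` removes nothing because no row has id 1, while passing `1` to `mask_by_position` removes the second entry (id 20).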

Comment on lines +719 to +720
let deletions = read_deletion_file(&dataset.base, frag, dataset.object_store()).await?;
if let Some(deletions) = deletions {
Contributor

Are we guaranteed the deletion was materialized at this point? Is there any way two fragments can be compacted without materializing their deletions?

Contributor Author

Yeah, that is guaranteed. We compact by scanning and then writing out. The scan removes deleted rows.
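
A rough sketch of why scan-then-write guarantees that deletions are materialized. `Frag`, `scan_live`, and `compact` are hypothetical stand-ins, not the Lance API, and the model assumes deletions are tracked as row offsets within a fragment. The point is that the rewritten fragment is built only from rows the scan yields, so it starts with no deletions of its own.

```rust
use std::collections::HashSet;

/// Hypothetical fragment: row ids plus the offsets of rows marked deleted.
struct Frag {
    row_ids: Vec<u64>,
    deleted_offsets: HashSet<usize>,
}

/// Scan a fragment, yielding only the row ids of live (non-deleted) rows.
fn scan_live(frag: &Frag) -> impl Iterator<Item = u64> + '_ {
    frag.row_ids
        .iter()
        .enumerate()
        .filter(move |(offset, _)| !frag.deleted_offsets.contains(offset))
        .map(|(_, id)| *id)
}

/// Compact by scanning every input fragment and writing the surviving rows
/// into one new fragment. The output has an empty deletion set because the
/// scan already dropped the deleted rows.
fn compact(frags: &[Frag]) -> Frag {
    Frag {
        row_ids: frags.iter().flat_map(|f| scan_live(f)).collect(),
        deleted_offsets: HashSet::new(),
    }
}
```

In this model there is no path from input to output that bypasses `scan_live`, which is the property the reviewer was asking about.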

.iter_mut()
.flat_map(|group| group.new_fragments.iter_mut())
.collect::<Vec<_>>();
reserve_fragment_ids(dataset, new_fragments.into_iter()).await?;
Contributor

Do we even need to call reserve_fragment_ids in this case? For some reason I thought commit would just assign ids if they were left as -1.

Contributor Author

That's a good point. We don't need to create another transaction for that.

Contributor Author

@wjones127 wjones127 Aug 7, 2024

Never mind, it turns out this was in fact load-bearing. The fragment ids assigned in this step are used to compute the new fragment bitmap for each of the indices. I'll add a comment to clarify this.
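
A hedged sketch of the dependency described above: the new fragment ids must exist before commit because the index fragment bitmaps are rewritten in terms of them. Everything here is an assumption for illustration: `Group`, `reserve_ids`, and `remap_bitmap` are hypothetical names, `BTreeSet<u32>` stands in for whatever bitmap type the index uses, and the rule that a group is swapped only when the index covered all of its old fragments is one plausible policy, not necessarily the one in this PR.

```rust
use std::collections::BTreeSet;

/// Hypothetical compaction group: the fragments being replaced and the
/// fragments that replace them.
struct Group {
    old_ids: Vec<u32>,
    /// `None` until an id has been reserved for the new fragment.
    new_ids: Vec<Option<u32>>,
}

/// Assign consecutive ids to every unassigned new fragment, continuing
/// after `max_id`. Returns the new maximum fragment id.
fn reserve_ids(groups: &mut [Group], mut max_id: u32) -> u32 {
    for slot in groups.iter_mut().flat_map(|g| g.new_ids.iter_mut()) {
        if slot.is_none() {
            max_id += 1;
            *slot = Some(max_id);
        }
    }
    max_id
}

/// Rewrite an index's fragment coverage after compaction.
/// Assumed rule (illustrative): a group is swapped in only if the index
/// covered every old fragment in that group.
fn remap_bitmap(bitmap: &BTreeSet<u32>, groups: &[Group]) -> BTreeSet<u32> {
    let mut out = bitmap.clone();
    for g in groups {
        if g.old_ids.iter().all(|id| bitmap.contains(id)) {
            for id in &g.old_ids {
                out.remove(id);
            }
            // Unreserved (`None`) ids are skipped here, which is why the
            // reservation step has to run first.
            out.extend(g.new_ids.iter().flatten().copied());
        }
    }
    out
}
```

If `reserve_ids` were skipped and ids were assigned only at commit time, `remap_bitmap` would have nothing to insert for the new fragments, and in this model the index would lose coverage of the compacted rows.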

Contributor

Ah, thanks for trying.

@wjones127 wjones127 force-pushed the feat/stable-row-id-compact branch from 2dfe148 to 2637df7 Compare August 6, 2024 22:35
@wjones127 wjones127 merged commit 27fbd4e into lancedb:main Aug 7, 2024
22 checks passed
@wjones127 wjones127 deleted the feat/stable-row-id-compact branch August 7, 2024 22:43