Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: expose number of rows covered per delta #2979

Merged
merged 1 commit into from
Oct 15, 2024

Conversation

chebbyChefNEQ
Copy link
Contributor

This PR adds a field to index_stats where we now return the number of rows covered by each delta in num_indexed_rows_per_delta

TODO:

  • tests

@github-actions github-actions bot added the enhancement New feature or request label Oct 4, 2024
Copy link
Contributor

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little concerned about whether the result is useful if we don't even guarantee the order of the index deltas. Otherwise seems fine. Though I feel like it's not that hard for users to compute this themselves, as the fragment_bitmap and num_rows should both be exposed publicly, in Python and Rust.

@@ -550,6 +555,7 @@ impl DatasetIndexExt for Dataset {
"num_indexed_rows": num_indexed_rows,
"num_unindexed_fragments": num_unindexed_fragments,
"num_unindexed_rows": num_unindexed_rows,
"num_indexed_rows_per_delta": num_indexed_rows_per_delta,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this implicitly in the same order that list_indices returns? Do we even guarantee an order there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we guarantee append order, let me check

@chebbyChefNEQ
Copy link
Contributor Author

I'm a little concerned about whether the result is useful if we don't even guarantee the order of the index deltas. Otherwise seems fine. Though I feel like it's not that hard for users to compute this themselves, as the fragment_bitmap and num_rows should both be exposed publicly, in Python and Rust.

We don't expose the per delta fragment bitmap to python or rust. This is only a stop gap so we have something for calculating the "base" index rows. Eventually this should be something we can track natively in v3

@chebbyChefNEQ
Copy link
Contributor Author

I'm a little concerned about whether the result is useful if we don't even guarantee the order of the index deltas. Otherwise seems fine. Though I feel like it's not that hard for users to compute this themselves, as the fragment_bitmap and num_rows should both be exposed publicly, in Python and Rust.

We don't expose the per delta fragment bitmap to python or rust. This is only a stop gap so we have something for calculating the "base" index rows. Eventually this should be something we can track natively in v3

Actually, looks like it's a pub trait let me see if this is exposed thru lancedb

@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 87.17949% with 5 lines in your changes missing coverage. Please review.

Project coverage is 78.85%. Comparing base (add0f34) to head (b3069e0).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
rust/lance/src/index.rs 87.17% 0 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2979      +/-   ##
==========================================
+ Coverage   78.81%   78.85%   +0.03%     
==========================================
  Files         235      236       +1     
  Lines       73141    73581     +440     
  Branches    73141    73581     +440     
==========================================
+ Hits        57648    58019     +371     
- Misses      12513    12554      +41     
- Partials     2980     3008      +28     
Flag Coverage Δ
unittests 78.85% <87.17%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@chebbyChefNEQ
Copy link
Contributor Author

going to merge this for the time being as this saves us from drifting from OSS.

@chebbyChefNEQ chebbyChefNEQ merged commit 81554cd into main Oct 15, 2024
25 checks passed
@chebbyChefNEQ chebbyChefNEQ deleted the rmeng/base-idx-stats branch October 15, 2024 00:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants