feat: enhance repdef utilities to handle empty / null lists #3200

Merged
merged 4 commits into from
Dec 5, 2024

westonpace
Contributor

Empty and null lists are interesting. If you have them, your final repetition and definition buffers will have more entries than your flattened array has items. This required a considerable rework of how we build and unravel rep/def buffers.
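To make the size mismatch concrete, here is a minimal sketch in Rust. It is not the code from this PR: the function name `shred` and the level numbering (def 0 = valid item, 1 = empty list, 2 = null list; rep 1 = first entry of a row, 0 = continuation) are assumptions chosen only for illustration.

```rust
/// Shred a single nullable list column of non-null i32 items into
/// (repetition levels, definition levels, flattened values).
/// The level numbering is an assumed, illustrative scheme:
///   def 0 = valid item, def 1 = empty list, def 2 = null list
///   rep 1 = first entry of a row, rep 0 = continuation of the row
fn shred(rows: &[Option<Vec<i32>>]) -> (Vec<u16>, Vec<u16>, Vec<i32>) {
    let mut rep = Vec::new();
    let mut def = Vec::new();
    let mut values = Vec::new();
    for row in rows {
        match row {
            // A null list adds a rep/def entry but no flattened value.
            None => {
                rep.push(1);
                def.push(2);
            }
            // An empty list also adds a rep/def entry but no flattened value.
            Some(items) if items.is_empty() => {
                rep.push(1);
                def.push(1);
            }
            // A populated list adds one rep/def entry per item.
            Some(items) => {
                for (i, item) in items.iter().enumerate() {
                    rep.push(if i == 0 { 1 } else { 0 });
                    def.push(0);
                    values.push(*item);
                }
            }
        }
    }
    (rep, def, values)
}
```

For the rows `[1, 2]`, null, `[]`, `[3]` this yields three flattened values but five rep/def entries, because the null list and the empty list each contribute an entry with no matching item.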

When building, we record the positions of the specials (rep/def values that do not correspond to any flattened item) and then, when we serialize into rep/def buffers, we insert these special values. When unraveling, we need to deal with the fact that certain rep/def values are "invisible" to the context in which we are currently unraveling.
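The build side can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in `repdef.rs`: the names `SpecialKind`, `Special`, `record_specials`, and `serialize_def` are hypothetical, the input is an Arrow-style offsets array plus a validity slice, null lists are assumed to span zero items, and the numbering (0 = valid item, 1 = empty list, 2 = null list) is made up for the example.

```rust
/// Hypothetical kinds of "special" entries: rows that contribute no flattened item.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SpecialKind {
    EmptyList,
    NullList,
}

/// A special, recorded as the flattened-item index it must be emitted before.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Special {
    pos: usize,
    kind: SpecialKind,
}

/// Assumed numbering: 0 = valid item, 1 = empty list, 2 = null list.
fn def_level(kind: SpecialKind) -> u16 {
    match kind {
        SpecialKind::EmptyList => 1,
        SpecialKind::NullList => 2,
    }
}

/// Build step: walk list offsets and remember where the specials sit.
/// Assumes `offsets.len() == validity.len() + 1` and that null lists are zero-length.
fn record_specials(offsets: &[usize], validity: &[bool]) -> Vec<Special> {
    let mut specials = Vec::new();
    for (row, window) in offsets.windows(2).enumerate() {
        let (start, end) = (window[0], window[1]);
        if !validity[row] {
            specials.push(Special { pos: start, kind: SpecialKind::NullList });
        } else if start == end {
            specials.push(Special { pos: start, kind: SpecialKind::EmptyList });
        }
    }
    specials
}

/// Serialize step: emit one level per item, splicing in the recorded specials.
fn serialize_def(num_items: usize, specials: &[Special]) -> Vec<u16> {
    let mut def = Vec::with_capacity(num_items + specials.len());
    let mut pending = specials.iter().peekable();
    for item in 0..num_items {
        // Emit every special that sits immediately before this item.
        while let Some(special) = pending.next_if(|s| s.pos == item) {
            def.push(def_level(special.kind));
        }
        def.push(0);
    }
    // Specials positioned after the last item.
    def.extend(pending.map(|s| def_level(s.kind)));
    def
}
```

Recording positions first keeps the per-item path cheap; the cost of handling specials is paid only once, at serialization time, and only when specials exist.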

In addition, we now need to keep track of the structure of each layer of repetition in the page metadata. This tells us what the different definition levels mean later, when we are unraveling.
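One way to picture this per-layer metadata, and how it makes some definition levels "invisible" while unraveling, is the sketch below. The enum `LayerKind`, its variants, the innermost-first ordering, and the rule that each layer owns a contiguous range of definition levels are all assumptions for illustration; the actual metadata added in this PR may be shaped differently.

```rust
/// Hypothetical description of one layer of nesting, stored per page.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LayerKind {
    AllValidItem,
    NullableItem,
    AllValidList,
    NullableList,
    EmptyableList,
    NullableAndEmptyableList,
}

impl LayerKind {
    /// Number of non-zero definition levels this layer contributes (assumed scheme).
    fn num_def_levels(self) -> u16 {
        match self {
            LayerKind::AllValidItem | LayerKind::AllValidList => 0,
            LayerKind::NullableItem | LayerKind::NullableList | LayerKind::EmptyableList => 1,
            LayerKind::NullableAndEmptyableList => 2,
        }
    }
}

/// Largest definition level in use, given layers ordered innermost first.
fn max_def_level(layers: &[LayerKind]) -> u16 {
    layers.iter().map(|l| l.num_def_levels()).sum()
}

/// Unravel the validity of the innermost (item) layer. Levels above the item
/// layer's own range belong to outer lists and are invisible here, so skip them.
fn unravel_item_validity(def: &[u16], layers: &[LayerKind]) -> Vec<bool> {
    let max_visible = layers.first().map_or(0, |l| l.num_def_levels());
    def.iter()
        .filter(|&&level| level <= max_visible)
        .map(|&level| level == 0)
        .collect()
}
```

With layers `[NullableItem, NullableAndEmptyableList]`, levels 0 and 1 describe items while levels 2 and 3 describe the list itself, so a definition buffer of `[0, 1, 2, 3, 0]` unravels to only three item validity bits. Without the layer metadata there would be no way to tell which levels to skip.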

This PR adds the changes to the rep/def utilities. We still aren't actually using repetition levels at all yet. That will come in future PRs.

@github-actions github-actions bot added the enhancement New feature or request label Dec 4, 2024
@westonpace westonpace force-pushed the feat/2.1-repdef-specials branch from 69a3eb5 to b7232bb Compare December 4, 2024 21:30
codecov-commenter commented Dec 4, 2024

Codecov Report

Attention: Patch coverage is 91.07692% with 87 lines in your changes missing coverage. Please review.

Project coverage is 78.64%. Comparing base (6e84834) to head (511b255).
Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| rust/lance-encoding/src/repdef.rs | 91.49% | 64 Missing and 9 partials ⚠️ |
| rust/lance-encoding/src/format.rs | 70.96% | 9 Missing ⚠️ |
| rust/lance-encoding/src/testing.rs | 28.57% | 5 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3200      +/-   ##
==========================================
+ Coverage   78.62%   78.64%   +0.02%     
==========================================
  Files         243      244       +1     
  Lines       82889    83957    +1068     
  Branches    82889    83957    +1068     
==========================================
+ Hits        65170    66029     +859     
- Misses      14933    15137     +204     
- Partials     2786     2791       +5     
Flag Coverage Δ
unittests 78.64% <91.07%> (+0.02%) ⬆️


@westonpace westonpace force-pushed the feat/2.1-repdef-specials branch from af3c337 to c8cad40 Compare December 4, 2024 22:35
…hich introduce "specials" into the rep/def buffers. These are values in the rep/def buffer which do not correspond to flattened items.
@westonpace westonpace force-pushed the feat/2.1-repdef-specials branch from c8cad40 to 971e665 Compare December 5, 2024 06:05
@github-actions github-actions bot added the python label Dec 5, 2024
@westonpace westonpace merged commit f21397d into lancedb:main Dec 5, 2024
26 checks passed
Labels
enhancement New feature or request python