Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: more flexible and sophisticated handling of non-null constraints #2467

Merged
merged 3 commits into from
Oct 25, 2024

Conversation

westonpace
Copy link
Contributor

@westonpace westonpace commented Jun 13, 2024

Previously, we used batch schema to check if incoming data was compatible with the dataset. This has a few problems:

  • You couldn't add a non-nullable batch to a nullable dataset even though this should be perfectly valid
  • You couldn't add a nullable batch to a non-nullable dataset even when the batch didn't contain nulls
  • It's possible (and with pyarrow, easy) to create a batch that has nulls but the fields are marked non-null, this led to confusing errors.

This PR resolves those issues by checking the null count of the actual arrays instead of relying on the schema assigned to the data.

Closes #1936

@github-actions github-actions bot added enhancement New feature or request python labels Jun 13, 2024
@westonpace westonpace force-pushed the feat/improved-nullability-checks branch from 54a5fb2 to 301c2ce Compare October 25, 2024 15:49
@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 88.31169% with 9 lines in your changes missing coverage. Please review.

Project coverage is 78.39%. Comparing base (f17d88d) to head (36b13dc).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
rust/lance-file/src/writer.rs 76.19% 1 Missing and 4 partials ⚠️
rust/lance-file/src/v2/writer.rs 76.47% 1 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2467      +/-   ##
==========================================
+ Coverage   78.24%   78.39%   +0.14%     
==========================================
  Files         240      240              
  Lines       77284    78593    +1309     
  Branches    77284    78593    +1309     
==========================================
+ Hits        60470    61611    +1141     
- Misses      13696    13850     +154     
- Partials     3118     3132      +14     
Flag Coverage Δ
unittests 78.39% <88.31%> (+0.14%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I appreciate all the tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary, but would be cool to have a test showing you can use alter_columns to change the nullability, and that inserting nulls would then work without an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try and add one later today but want to merge this now to clear my plate 🤷

@westonpace westonpace merged commit c9cabc1 into lancedb:main Oct 25, 2024
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Potential panic when writing invalid batch (nullability violated)
3 participants