-
Notifications
You must be signed in to change notification settings - Fork 265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: when taking struct fields they should be merged into the output in the correct order #3277
Conversation
python/python/tests/test_filter.py
Outdated
def test_struct_field_order(tmp_path): | ||
data = pa.table({"struct": [{"x": i, "y": i} for i in range(10)]}) | ||
dataset = lance.write_dataset(data, tmp_path) | ||
result = dataset.to_table(filter="struct.y > 5") | ||
|
||
expected = pa.table({"struct": [{"x": i, "y": i} for i in range(6, 10)]}) | ||
assert result == expected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, I'm not 100% clear what this is testing. Just that we can filter on struct columns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, well, more specifically that we can filter on struct columns and the resulting data has the correct order. I can add a comment. This was the original test failure that I was working downards to fix. Previously it was returning pa.table({"struct": [{"y": i, "x": i} for i in range(6, 10)]})
which failed the assert.
.dataset | ||
.empty_projection() | ||
.union_schema(&self.projection_plan.physical_schema) | ||
.subtract_predicate(|field| !self.is_early_field(field)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this subtract_predicate
operator!
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3277 +/- ##
==========================================
+ Coverage 78.80% 79.07% +0.26%
==========================================
Files 246 246
Lines 86637 87471 +834
Branches 86637 87471 +834
==========================================
+ Hits 68278 69167 +889
+ Misses 15529 15438 -91
- Partials 2830 2866 +36
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
… rework the concept of projection in general
8b3706d
to
ce4a4f1
Compare
ce4a4f1
to
fb2df4d
Compare
In various situations we need to fetch some fields from a struct and then later add more fields for the struct. For example, maybe we have a
struct<big_string: string, filter: i32>
. We might query with a filter onfilter
and then use late materialization to takebig_string
. When we do this we were previously creatingstruct<filter: i32, big_string: string>
which would cause issues since that isn't the correct data type.In creating this fix I added a new
Projection
concept and I would like to slowly replace a lot of the places where we use schemas as projections to useProjection
instead. Not necessarily for performance but more for convenience.