-
Notifications
You must be signed in to change notification settings - Fork 265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add a new BlobFile API that can be used to read blob data #2983
Conversation
This adds the flag This all works pretty well but it is a bit weird that the same column might have two different data types depending on how it is read. However, I also want to enable this for strings/lists at some point (to read all strings/lists as small or all strings/lists as large) and so even though I find it slightly weird I am thinking it is ok? Welcome review on the idea. |
I suppose another way we can tackle the issue is to create a virtual column @wjones127 for second opinion |
Ok(if self.id >= 0 { | ||
self.clone() | ||
} else { | ||
other.clone() | ||
}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes are a little bit squirrel-y and the whole concept of intersection ignoring data types is a little odd...I'm leaning towards special column name at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question on whether you are considering deletions in take.
let description_and_addr = dataset | ||
.take_builder(row_ids, projection)? | ||
.with_row_address(true) | ||
.execute() | ||
.await?; | ||
let descriptions = description_and_addr.column(0).as_struct(); | ||
let positions = descriptions.column(0).as_primitive::<UInt64Type>(); | ||
let sizes = descriptions.column(1).as_primitive::<UInt64Type>(); | ||
let row_addrs = description_and_addr.column(1).as_primitive::<UInt64Type>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add an assert that they have the same number of rows? I forget whether take
guarantees that. Consider the case where the row_id has been deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a test and added the assert. For now I just raise a NotImplementedError.
The current _take_rows
behavior is to filter out deleted items but I'm not sure that's ideal. It is impossible for the caller to know which IDs were deleted which seems like it would be useful information.
We can either change the behavior to "fill with NULL" or we can make it configurable in a future PR.
26c7fac
to
befe86f
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2983 +/- ##
==========================================
- Coverage 78.25% 78.22% -0.03%
==========================================
Files 238 239 +1
Lines 76234 76594 +360
Branches 76234 76594 +360
==========================================
+ Hits 59657 59917 +260
- Misses 13549 13609 +60
- Partials 3028 3068 +40
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
df9bcb6
to
80086f0
Compare
No description provided.