feat: support blob api in pytorch loader #3217
d6188fa to 8e4082a
@@ -234,6 +243,10 @@ def __init__(
        self._to_tensor_fn = to_tensor_fn
        self._hf_converter = None

        self._blob_columns = self._blob_columns()
        if self._blob_columns:
            self.with_row_id = True
Why is this needed?
We need the row ID to call dataset.take_blobs().
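To illustrate the dependency, here is a minimal sketch; it is not the PR's code. It assumes batches arrive as plain dicts that carry Lance's `_rowid` column when `with_row_id` is enabled. `resolve_blobs`, `fetch_blobs` and `fake_fetch` are hypothetical names: `fetch_blobs` stands in for a thin wrapper around `dataset.take_blobs()`, whose exact signature is not shown in this thread.

```python
from typing import Callable, Dict, List, Sequence

ROW_ID = "_rowid"  # column Lance adds to each batch when row ids are requested


def resolve_blobs(
    batch: Dict[str, list],
    blob_columns: Sequence[str],
    fetch_blobs: Callable[[List[int], str], list],
) -> Dict[str, list]:
    """Return a copy of `batch` with each blob column replaced by fetched blobs.

    `fetch_blobs(row_ids, column)` is a hypothetical callable; in a real loader
    it would delegate to dataset.take_blobs().
    """
    if not blob_columns:
        return batch
    if ROW_ID not in batch:
        # Without row ids there is no key to look the blobs up by.
        raise ValueError("blob columns need row ids; enable with_row_id")
    row_ids = list(batch[ROW_ID])
    resolved = dict(batch)
    for col in blob_columns:
        resolved[col] = fetch_blobs(row_ids, col)
    return resolved


def fake_fetch(row_ids: List[int], column: str) -> list:
    """Stand-in fetcher that returns a readable tag per row id."""
    return [f"{column}@{rid}" for rid in row_ids]


batch = {"_rowid": [7, 9], "label": [0, 1], "image": [None, None]}
out = resolve_blobs(batch, ["image"], fake_fetch)
```

The lookup is keyed entirely on `_rowid`, which is why the loader has to force `with_row_id` on as soon as any blob column is present.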
python/python/lance/torch/data.py
Outdated
arr: pa.Array = batch[col]

if isinstance(arr, list) and arr and isinstance(arr[0], lance.BlobFile):
Is there a way to check earlier? Like when constructing the loader?
Yes, we probably can. One way is to pass more parameters, but that makes a user-specified to_tensor_fn more complicated.
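One possible shape for an earlier check, sketched under assumptions: blob columns are taken to be marked in the schema by the field metadata key `lance-encoding:blob` set to `true`, and the loader inspects the schema once at construction time instead of testing values per batch. `FakeField` is a hypothetical stand-in for `pyarrow.Field` so the snippet runs without dependencies, and `find_blob_columns` is an illustrative helper, not part of this PR.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

BLOB_KEY = "lance-encoding:blob"  # assumed metadata key marking a blob column


@dataclass
class FakeField:
    """Hypothetical stand-in for pyarrow.Field: a name plus optional metadata."""
    name: str
    metadata: Optional[Dict] = None


def _as_str(value) -> str:
    # pyarrow exposes metadata keys and values as bytes; accept str as well.
    return value.decode() if isinstance(value, bytes) else str(value)


def find_blob_columns(fields: Iterable) -> List[str]:
    """Return the names of fields whose metadata marks them as blob columns."""
    cols = []
    for f in fields:
        meta = {_as_str(k): _as_str(v) for k, v in (f.metadata or {}).items()}
        if meta.get(BLOB_KEY, "").lower() == "true":
            cols.append(f.name)
    return cols


schema = [
    FakeField("id"),
    FakeField("image", {b"lance-encoding:blob": b"true"}),
    FakeField("text", {"other": "x"}),
]
blob_cols = find_blob_columns(schema)
```

With the blob columns known up front, the loader could set `with_row_id` in `__init__` and leave the signature of `to_tensor_fn` unchanged.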
Support handling Blob data in PyTorch loader