feat: improvement of Ray sink API #2237
Add LanceOperation, add commit for Append Operation. Pass Fragment as JSON string between Rust/Java.
Co-authored-by: Lei Xu <lei@lancedb.com>
9e98ecb to c9a4130
Some small issues with signatures, but otherwise looks good.
max_bytes_per_file = (
    DEFAULT_MAX_BYTES_PER_FILE if max_bytes_per_file is None else max_bytes_per_file
)
Why not just put DEFAULT_MAX_BYTES_PER_FILE in the signature as the default value?
Using None here lets us detect later whether the user specified this value. During a benchmark, the default of 90GB caused an OOM (we now know that was a bug in Arrow). Keeping None leaves us room to provide a better default later.
I see. Seems like a good thing to write a TODO comment for.
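The None-as-sentinel approach discussed in this thread can be sketched as follows. The helper name resolve_max_bytes_per_file is hypothetical, and the constant assumes "90GB" means 90 GiB (90 * 1024**3 bytes); neither is taken from the Lance source. The TODO marks where the default could be revisited, as the reviewer suggests.

```python
from typing import Optional, Tuple

# Assumption: "90GB" in this review means 90 GiB.
DEFAULT_MAX_BYTES_PER_FILE = 90 * 1024 * 1024 * 1024


def resolve_max_bytes_per_file(max_bytes_per_file: Optional[int] = None) -> Tuple[int, bool]:
    """Hypothetical helper: return (effective limit, whether the caller set it)."""
    # None is a sentinel meaning "not specified", distinct from any real limit.
    user_specified = max_bytes_per_file is not None
    # TODO: replace DEFAULT_MAX_BYTES_PER_FILE with a better-tuned value once one is known.
    effective = DEFAULT_MAX_BYTES_PER_FILE if max_bytes_per_file is None else max_bytes_per_file
    return effective, user_specified
```

Calling resolve_max_bytes_per_file() returns (96636764160, False), while resolve_max_bytes_per_file(512 * 1024 * 1024) returns (536870912, True). With the constant baked into the signature instead, the first call would be indistinguishable from a caller who passed 90 GiB explicitly.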
python/python/lance/ray/sink.py
    """

    def __init__(
        self,
        uri: str,
        *,
-       transform: Callable[[pa.Table], Union[pa.Table, Generator]] = lambda x: x,
+       transform: Callable[[pa.Table], Union[pa.Table, Generator]] = None,
If you're going to make the default None, then you have to change the type accordingly:
- transform: Callable[[pa.Table], Union[pa.Table, Generator]] = None,
+ transform: Optional[Callable[[pa.Table], Union[pa.Table, Generator]]] = None,
Fixed.
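A minimal sketch of the corrected signature. LanceSinkSketch is a hypothetical stand-in, not the real class in python/python/lance/ray/sink.py, and treating None as the identity transform (the old lambda x: x default) is an assumption about how the sink handles it. pyarrow is imported only under TYPE_CHECKING so the sketch runs without it.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Callable, Generator, Optional, Union

if TYPE_CHECKING:  # only type checkers need pyarrow here
    import pyarrow as pa


class LanceSinkSketch:
    """Hypothetical stand-in for the Ray sink's constructor."""

    def __init__(
        self,
        uri: str,
        *,
        transform: Optional[Callable[[pa.Table], Union[pa.Table, Generator]]] = None,
    ):
        self.uri = uri
        # Assumption: None means "no transform", so fall back to identity,
        # which matches the previous default of lambda x: x.
        self.transform = transform if transform is not None else (lambda table: table)
```

Spelling out Optional[...] makes the None default legal for a type checker; recent mypy versions reject an implicit Optional by default.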
python/python/lance/ray/sink.py
max_rows_per_file : int, optional
    The maximum number of rows per file. Default is 1024 * 1024.
max_bytes_per_file : int, optional
    The maximum number of bytes per file. Default is None.
Default is 90GB, not None, right?
- The maximum number of bytes per file. Default is None.
+ The maximum number of bytes per file. Default is 90GB.
Fixed.
python/python/lance/ray/sink.py
max_rows_per_file: int, optional
    The maximum number of rows per file. Default is 1024 * 1024.
max_bytes_per_file: int, optional
    The maximum number of bytes per file. Default is None.
Same here.
- The maximum number of bytes per file. Default is None.
+ The maximum number of bytes per file. Default is 90GB.
Fixed.
max_bytes_per_file via the Ray sink (ray.data.Dataset.write_lance()) interface.
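To illustrate how the two documented limits, max_rows_per_file and max_bytes_per_file, interact, here is a hypothetical sketch that groups incoming batches into files, opening a new file whenever the next batch would push the open file past either limit. It is not Lance's actual writer logic: the function name and the representation of batches as (num_rows, num_bytes) pairs are assumptions, and the defaults mirror the values discussed above (1024 * 1024 rows, and 90 GiB as an interpretation of "90GB").

```python
from typing import List, Sequence, Tuple


def group_batches_into_files(
    batches: Sequence[Tuple[int, int]],
    max_rows_per_file: int = 1024 * 1024,
    max_bytes_per_file: int = 90 * 1024 * 1024 * 1024,
) -> List[List[int]]:
    """Hypothetical: return batch indices grouped into output files."""
    files: List[List[int]] = []
    current: List[int] = []
    rows = nbytes = 0
    for i, (batch_rows, batch_bytes) in enumerate(batches):
        # Roll over to a new file if this batch would exceed either limit.
        # An empty file always accepts the batch, so oversized batches are not split.
        too_many_rows = rows + batch_rows > max_rows_per_file
        too_many_bytes = nbytes + batch_bytes > max_bytes_per_file
        if current and (too_many_rows or too_many_bytes):
            files.append(current)
            current, rows, nbytes = [], 0, 0
        current.append(i)
        rows += batch_rows
        nbytes += batch_bytes
    if current:
        files.append(current)
    return files
```

For example, with max_rows_per_file=1000, the batches [(600, 10), (600, 10), (100, 10)] are grouped as [[0], [1, 2]]: the second batch would bring the first file to 1200 rows, so it starts a new file. A single batch larger than either limit still gets a file of its own rather than being split.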