-
Notifications
You must be signed in to change notification settings - Fork 265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(java): add LanceOperation, commitAppend for batch write #2207
Conversation
java/core/lance-jni/src/ffi.rs
Outdated
@@ -34,12 +37,27 @@ pub trait JNIEnvExt { | |||
/// Get Option<i64> from Java Optional<Long>. | |||
fn get_long_opt(&mut self, obj: &JObject) -> Result<Option<i64>>; | |||
|
|||
/// Get Option<u64> from Java Optional<Long>. | |||
fn get_long_opt_u64(&mut self, obj: &JObject) -> Result<Option<u64>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming: get_u64_opt
,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -54,39 +38,19 @@ public static FragmentMetadata create(String datasetUri, BufferAllocator allocat | |||
|
|||
/** Create a fragment from the given data. */ | |||
public static FragmentMetadata create(String datasetUri, ArrowArrayStream stream, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because java can do override, should we just make two create()
methods with signatures
public static create(String uri, ArrowArrayStream stream, WriteParam params);
public static create(String uri, ArrowArrayStream stream, Integer fragmentId, WriteParam params);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Alluxio, we used to add override methods with different signatures https://github.com/Alluxio/alluxio/blob/36b0b16463b95dc5f88f304028c51749519fa8e9/dora/core/client/fs/src/main/java/alluxio/client/file/FileSystemContext.java#L420
and some class like this one has more than 10 create methods and became really hard to modify on top 😂
params.getMaxRowsPerFile(), params.getMaxRowsPerGroup(), | ||
params.getMaxBytesPerFile(), params.getMode())); | ||
} | ||
|
||
private static native int createWithFfiArray(String datasetUri, | ||
private static native String createWithFfiArray(String datasetUri, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the returned string ? Could you add some comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
public class FragmentMetadata { | ||
private final int fragementId; | ||
private final String jsonMetadata; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we just parse the JSON to actual meaningful fields.
It is easy to operation and validate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep the original json string to be able to pass it back to rust lance.
Yeah, make sense. Original plan to do one to one match and figure out there are many fields in the metadata and many of them are not needed by end-user.
Will take out the needed fields and do the validation for those fields
/** Fragment related operations. */ | ||
public abstract class FragmentOperation { | ||
protected static void validateFragments(List<FragmentMetadata> fragments) { | ||
if (fragments == null || fragments.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use Iterator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put the individual fragmentMetadata validation logics into its create function
FragmentMetadata.fromJson(String jsonMetadata)
|
||
@Override | ||
public Dataset commit(BufferAllocator allocator, String path, Optional<Integer> readVersion) { | ||
throw new UnsupportedOperationException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to implement this later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current plan
- Spark batch write + better arrow type & spark type conversion
- Spark streaming write
- may be partition support
May not be able to implement in the short time, so i delete them for now
@eddyxu Addressed the comments, PTAL |
Add LanceOperation, Add commit for Append Operation. Pass Fragment as JSON string between Rust/Java. --------- Co-authored-by: Lei Xu <lei@lancedb.com>
Add LanceOperation, Add commit for Append Operation.
Pass Fragment as JSON string between Rust/Java.