Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(java): add LanceOperation, commitAppend for batch write #2207

Merged
merged 6 commits into from
Apr 21, 2024

Conversation

LuQQiu
Copy link
Contributor

@LuQQiu LuQQiu commented Apr 16, 2024

Add LanceOperation, Add commit for Append Operation.
Pass Fragment as JSON string between Rust/Java.

@LuQQiu
Copy link
Contributor Author

LuQQiu commented Apr 18, 2024

@eddyxu @beinan PTAL, thanks!

@@ -34,12 +37,27 @@ pub trait JNIEnvExt {
/// Get Option<i64> from Java Optional<Long>.
fn get_long_opt(&mut self, obj: &JObject) -> Result<Option<i64>>;

/// Get Option<u64> from Java Optional<Long>.
fn get_long_opt_u64(&mut self, obj: &JObject) -> Result<Option<u64>>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming: get_u64_opt,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -54,39 +38,19 @@ public static FragmentMetadata create(String datasetUri, BufferAllocator allocat

/** Create a fragment from the given data. */
public static FragmentMetadata create(String datasetUri, ArrowArrayStream stream,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because java can do override, should we just make two create() methods with signatures

public static create(String uri, ArrowArrayStream stream, WriteParam params);
public static create(String uri, ArrowArrayStream stream, Integer fragmentId, WriteParam params);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Alluxio, we used to add override methods with different signatures https://github.com/Alluxio/alluxio/blob/36b0b16463b95dc5f88f304028c51749519fa8e9/dora/core/client/fs/src/main/java/alluxio/client/file/FileSystemContext.java#L420
and some class like this one has more than 10 create methods and became really hard to modify on top 😂

params.getMaxRowsPerFile(), params.getMaxRowsPerGroup(),
params.getMaxBytesPerFile(), params.getMode()));
}

private static native int createWithFfiArray(String datasetUri,
private static native String createWithFfiArray(String datasetUri,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the returned string ? Could you add some comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

public class FragmentMetadata {
private final int fragementId;
private final String jsonMetadata;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we just parse the JSON to actual meaningful fields.
It is easy to operation and validate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep the original json string to be able to pass it back to rust lance.
Yeah, make sense. Original plan to do one to one match and figure out there are many fields in the metadata and many of them are not needed by end-user.
Will take out the needed fields and do the validation for those fields

/** Fragment related operations. */
public abstract class FragmentOperation {
protected static void validateFragments(List<FragmentMetadata> fragments) {
if (fragments == null || fragments.isEmpty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Iterator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put the individual fragmentMetadata validation logics into its create function
FragmentMetadata.fromJson(String jsonMetadata)


@Override
public Dataset commit(BufferAllocator allocator, String path, Optional<Integer> readVersion) {
throw new UnsupportedOperationException();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to implement this later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current plan

  1. Spark batch write + better arrow type & spark type conversion
  2. Spark streaming write
  3. may be partition support
    May not be able to implement in the short time, so i delete them for now

@LuQQiu
Copy link
Contributor Author

LuQQiu commented Apr 19, 2024

@eddyxu Addressed the comments, PTAL

@LuQQiu LuQQiu merged commit 1b8adb5 into lancedb:main Apr 21, 2024
2 checks passed
eddyxu added a commit that referenced this pull request Apr 22, 2024
Add LanceOperation, Add commit for Append Operation.
Pass Fragment as JSON string between Rust/Java.

---------

Co-authored-by: Lei Xu <lei@lancedb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants