
feat(java): support alter columns for dataset #3259

Merged: 4 commits merged into lancedb:main on Dec 20, 2024


yanghua (Collaborator) commented Dec 17, 2024:

No description provided.

github-actions bot added the enhancement (New feature or request) and java labels on Dec 17, 2024.
yanghua force-pushed the 3249-alter-col branch 5 times, most recently from 312bf18 to ba412eb on December 17, 2024 09:13.
yanghua (Collaborator, Author) commented Dec 17, 2024:

cc @westonpace @wjones127

/** Column alteration used to alter dataset columns. */
public class ColumnAlteration {

  private String path;
Collaborator commented:

`path` is an Arrow concept; should we call it a column name in the Java module?

Collaborator (Author) replied:

IMO, this suggestion makes sense to some extent.

However, the same issue (a concept meaning different things in different contexts) also exists in the Rust module. This class simply mirrors the definition there.

I'm fine with renaming it if we all agree to change the naming across all modules.

Contributor replied:

`path` can be the path to a nested column, such as `outer_struct.inner_struct.field_a`. There the column name is `field_a`, but its full path in the schema is `outer_struct.inner_struct.field_a`. That's why it's called `path`.
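To illustrate why a full path is needed rather than a bare column name, here is a minimal sketch that resolves a dot-separated path against a nested schema. The `Field` class and `resolve` method are hypothetical stand-ins, not the Lance or Arrow API, and the sketch assumes `.` is always a separator (no escaping of dots inside field names).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class NestedPathDemo {
  /** Hypothetical schema node: a named field with zero or more child fields (a struct). */
  static final class Field {
    final String name;
    final List<Field> children;

    Field(String name, Field... children) {
      this.name = name;
      this.children = Arrays.asList(children);
    }
  }

  /** Walks the schema one path segment at a time; assumes '.' is never part of a field name. */
  static Optional<Field> resolve(List<Field> topLevel, String path) {
    List<Field> candidates = topLevel;
    Field current = null;
    for (String segment : path.split("\\.")) {
      current = null;
      for (Field f : candidates) {
        if (f.name.equals(segment)) {
          current = f;
          break;
        }
      }
      if (current == null) {
        return Optional.empty(); // segment not found at this nesting level
      }
      candidates = current.children;
    }
    return Optional.ofNullable(current); // empty if the path had no segments at all
  }

  public static void main(String[] args) {
    // Two different fields share the leaf name "field_a"; only the full path tells them apart.
    List<Field> schema = List.of(
        new Field("field_a"),
        new Field("outer_struct",
            new Field("inner_struct", new Field("field_a"))));
    System.out.println(resolve(schema, "outer_struct.inner_struct.field_a").isPresent()); // true
    System.out.println(resolve(schema, "inner_struct.field_a").isPresent()); // false
  }
}
```

Because two fields named `field_a` coexist in this schema, a bare column name would be ambiguous, while each full path points to exactly one field.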


import org.apache.arrow.vector.types.pojo.ArrowType;

import java.util.Optional;
Collaborator commented:

It's better to use the Optional in the java-core module, since the JDK Optional is not serializable, and the Spark connector needs a serializable class.
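The serializability point can be checked directly: `java.util.Optional` does not implement `java.io.Serializable`, so Java serialization of an object holding a non-null `Optional` field fails with `NotSerializableException`. Below is a minimal sketch of a serializable Optional-like wrapper. The name `SerializableOptional` and its methods are hypothetical, for illustration only, and are not the Optional class in Lance's java-core module.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Optional;

/** Hypothetical Optional-like holder that survives Java serialization. */
public final class SerializableOptional<T extends Serializable> implements Serializable {
  private static final long serialVersionUID = 1L;
  private final T value; // null means "absent"

  private SerializableOptional(T value) {
    this.value = value;
  }

  public static <T extends Serializable> SerializableOptional<T> of(T value) {
    if (value == null) {
      throw new NullPointerException("value");
    }
    return new SerializableOptional<>(value);
  }

  public static <T extends Serializable> SerializableOptional<T> empty() {
    return new SerializableOptional<>(null);
  }

  public boolean isPresent() {
    return value != null;
  }

  /** Converts to a JDK Optional for local use; do not store the result in a serialized field. */
  public Optional<T> toOptional() {
    return Optional.ofNullable(value);
  }

  /** Serializes and deserializes an object in memory using standard Java serialization. */
  static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(obj);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(Serializable.class.isAssignableFrom(Optional.class)); // false
    SerializableOptional<String> rename = SerializableOptional.of("new_name");
    @SuppressWarnings("unchecked")
    SerializableOptional<String> copy = (SerializableOptional<String>) roundTrip(rename);
    System.out.println(copy.toOptional().orElse("<absent>")); // new_name
  }
}
```

The design keeps a nullable field internally and only hands out a JDK `Optional` at the API boundary, so the object that gets serialized never contains a `java.util.Optional` instance.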

Collaborator (Author) replied:

> It's better to use the Optional in the java-core module

What does this mean? Which Optional do you prefer? I see that other core classes, for example Fragment, FragmentOperation, WriteParams, and ReadOptions, also use java.util.Optional.

Collaborator replied:

OK. Maybe the Optional in the Java module should be named SerializableOptional for Spark to use.

let mut dataset_guard =
    unsafe { env.get_rust_field::<_, _, BlockingDataset>(java_dataset, NATIVE_DATASET) }?;

RT.block_on(dataset_guard.inner.alter_columns(&column_alterations))?;
Collaborator commented:

If there are unsupported alter operators, will this code raise an exception?

Collaborator (Author) replied:

> unsupported alter operators

What does this mean? Do you mean the case where a dataset's schema does not support evolving a column from one type to another?

If so, it would throw an exception.

Contributor commented:

It might be worth adding a unit test that verifies an exception is thrown and that it has a meaningful message.
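A sketch of the shape such a test could take, written in plain Java so it runs without a test framework or a native Lance build. `FakeDataset`, its `alterColumn` method, the whitelist of supported casts, and the choice of `IllegalArgumentException` are all hypothetical stand-ins; a real test would call the dataset API added in this PR and would likely use the project's test framework (for example JUnit's `assertThrows`).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class AlterColumnErrorTest {
  /** Hypothetical in-memory stand-in for a dataset: column path -> type name. */
  static final class FakeDataset {
    final Map<String, String> schema = new HashMap<>();
    // Hypothetical set of supported type changes, encoded as "from->to".
    static final Set<String> SUPPORTED_CASTS = Set.of("int32->int64", "float32->float64");

    void alterColumn(String path, String newType) {
      String oldType = schema.get(path);
      if (oldType == null) {
        throw new IllegalArgumentException("Column '" + path + "' does not exist");
      }
      if (!oldType.equals(newType) && !SUPPORTED_CASTS.contains(oldType + "->" + newType)) {
        throw new IllegalArgumentException(
            "Cannot cast column '" + path + "' from " + oldType + " to " + newType);
      }
      schema.put(path, newType);
    }
  }

  /** Runs the action and returns the exception it throws; fails if nothing is thrown. */
  static IllegalArgumentException expectIllegalArgument(Runnable action) {
    try {
      action.run();
    } catch (IllegalArgumentException e) {
      return e;
    }
    throw new AssertionError("expected IllegalArgumentException");
  }

  public static void main(String[] args) {
    FakeDataset ds = new FakeDataset();
    ds.schema.put("name", "utf8");
    IllegalArgumentException e = expectIllegalArgument(() -> ds.alterColumn("name", "int64"));
    // A meaningful message names the column and both types, so the user can act on it.
    if (!e.getMessage().contains("'name'") || !e.getMessage().contains("utf8")) {
      throw new AssertionError("unhelpful message: " + e.getMessage());
    }
    System.out.println("ok: " + e.getMessage());
  }
}
```

The test checks two things separately: that the unsupported alteration fails at all, and that the failure message carries enough context (column path, source and target types) to be useful.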

wjones127 (Contributor) left a comment:

Looks good to me. I'll give @SaintBacchus a chance to give a final review before I merge.

SaintBacchus (Collaborator) commented:

It also LGTM

wjones127 merged commit 2b29487 into lancedb:main on Dec 20, 2024 (8 checks passed).
Labels: enhancement, java
3 participants