-
Notifications
You must be signed in to change notification settings - Fork 265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(java): support alter columns for dataset #3259
Conversation
312bf18
to
ba412eb
Compare
27b05ae
to
97c6240
Compare
3f54adb
to
8f8936b
Compare
/** Column alteration used to alter dataset columns. */ | ||
public class ColumnAlteration { | ||
|
||
private String path; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
path
was a concept of arrow, should we name it as column name
for java moudle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, this suggestion makes sense in some dimension.
But, this issue(the ambiguity of concepts in different contexts) also happens in the rust module. This class just aligns with the definition in the rust module.
I am OK, if we all agree with changing the naming in all modules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
path
can be the path to a nested columns. Such as outer_struct.inner_struct.field_a
. There the column name is field_a
, but the full path to it in the schema is outer_struct.inner_struct.field_a
. That's why it's called path
.
|
||
import org.apache.arrow.vector.types.pojo.ArrowType; | ||
|
||
import java.util.Optional; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better use the Optional
in java-core module since the jdk Optinal
is not seriable. And the spark connector need a seriable class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better use the Optional in java-core module
What does this mean? Which Optional
do you prefer? I see, other core classes for example Fragment
, FragmentOperation
, WriteParams
, ReadOptions
also use java.util.Optional
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, maybe the Optional
in java should be named as SeriableOptional
for spark to use.
let mut dataset_guard = | ||
unsafe { env.get_rust_field::<_, _, BlockingDataset>(java_dataset, NATIVE_DATASET) }?; | ||
|
||
RT.block_on(dataset_guard.inner.alter_columns(&column_alterations))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are non-support alter operators, will this code raise exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-support alter operators
What does this mean? Did you mean if there is a dataset's schema does not support evolution on some conversion ?between two types?
If yes, it would throw an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth adding a unit test verifying you get an exception and that it has a meaningful message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I'll give @SaintBacchus a chance to give a final review before I merge.
It also LGTM |
No description provided.