Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support setting key field in MapBuilder #7101

Merged
merged 8 commits into from
Feb 12, 2025

Conversation

rshkv
Copy link
Contributor

@rshkv rshkv commented Feb 9, 2025

Closes #7100 by allowing overrides of the key field on MapBuilder through a new method called MapBuilder::with_key_field.

The goal of this is to support setting field metadata which, e.g., iceberg-rust would use to assign Iceberg's field ids, see apache/iceberg-rust#863 (comment).

Linking the with_value_field PR for reference: #5483

@github-actions github-actions bot added the arrow Changes to the arrow crate label Feb 9, 2025
/// field's data type does not match that of `K`
pub fn with_keys_field(self, field: impl Into<FieldRef>) -> Self {
let field: FieldRef = field.into();
assert!(!field.is_nullable(), "Key field must not be nullable");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this return Err instead? Given it's user input

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

I'm thinking now this check might be unnecessary because finish asserts here that there are no null-keys anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I've removed checking for nullability of the key field. I didn't love that MapBuilder::with_keys_field returns a Result while MapBuilder::with_values_field or GenericListBuilder::with_field do not. We still ensure that we protect against nulls with the existing assertion in finish.

Happy to go back to returning Result. Not feeling strongly about this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still need this check; the check in finish() only checks for null values, but the datatype of the key field itself should always be non-nullable. An array could have no null values but still a datatype with nullable = true I believe.

I can see how it's awkward for with_keys_field to be the only one returning Result, but it makes sense to me since it has a pre-condition to guard against, and we should take advantage of Rust's type system where possible instead of documenting panics and hoping users read the documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Updated

Copy link
Contributor

@tustvold tustvold Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given a nullable map field is simply out of spec, I personally don't think panicking is wrong and tbh for this sort of invariant it is arguably more correct...

I don't fell strongly but panicking would be more consistent with the rest of the builders

@rshkv rshkv force-pushed the wr/map-builder-with-key-field branch 2 times, most recently from 0cd50a2 to 53bbbfc Compare February 10, 2025 09:45
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor doc nit

/// By default a nullable field is created with the name `values`
/// By default, a nullable field is created with the name `keys`
///
/// Returns an error if the given field is nullable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be updated

/// field's data type does not match that of `K`
pub fn with_keys_field(self, field: impl Into<FieldRef>) -> Self {
let field: FieldRef = field.into();
assert!(!field.is_nullable(), "Keys field must not be nullable");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also move this check into finish, but I don't think it really matters

)
);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for sanity could we get a test that it panics if you try to specify a keys field with the wrong datatype as well

Comment on lines 120 to 123
/// Panics if the given field is nullable as map keys are not allowed to be null
///
/// Note: [`Self::finish`] and [`Self::finish_cloned`] will panic if the
/// field's data type does not match that of `K`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Panics if the given field is nullable as map keys are not allowed to be null
///
/// Note: [`Self::finish`] and [`Self::finish_cloned`] will panic if the
/// field's data type does not match that of `K`
/// Note: [`Self::finish`] and [`Self::finish_cloned`] will panic if the
/// field's data type does not match that of `K` or the field is nullable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, thank you.

@rshkv
Copy link
Contributor Author

rshkv commented Feb 12, 2025

Thank you for catching that

@tustvold tustvold merged commit a85fc03 into apache:main Feb 12, 2025
26 checks passed
@rshkv
Copy link
Contributor Author

rshkv commented Feb 12, 2025

Thank you @tustvold and @Jefffrey for reviewing 🙏🏻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support creating map arrays with key metadata
3 participants