fix: Stop losing precision and scale when casting decimal to dictionary #6383
Conversation
@viirya @parthchandra this PR fixes a correctness issue that I ran into when working on dictionary support in Comet. I expect there is a more efficient way to do this, though, so I could use some guidance.
Are parquet-testing and testing changes intentional?
Force-pushed from acae145 to 26095fc
oops, no .. reverted. Thanks for catching that.
arrow-cast/src/cast/dictionary.rs
Outdated
// pack_numeric_to_dictionary loses the precision and scale so we have to perform a
// second cast
let decimal_dict_max_precision_scale = pack_numeric_to_dictionary::<K, Decimal128Type>(
Hmm, it looks like pack_numeric_to_dictionary did cast the dictionary values correctly. I think the issue you encountered is only the incorrect data type (i.e., it uses the default precision/scale on the dictionary type after casting). This is because pack_numeric_to_dictionary takes the second type parameter Decimal128Type and uses it as the dictionary value type in PrimitiveDictionaryBuilder, which then uses Decimal128Type::DEFAULT_TYPE as the value type.
We don't need to do a second cast; we just need to change the dictionary value type after casting.
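To make the symptom concrete, here is a minimal standalone reproduction sketch (not code from this PR, just an illustration of the report in #6381): casting a Decimal128(10, 2) array to a dictionary type came back with the default Decimal128(38, 10) value type before this fix.

use arrow_array::{Array, Decimal128Array};
use arrow_cast::cast::cast;
use arrow_schema::{ArrowError, DataType};

fn main() -> Result<(), ArrowError> {
    // A Decimal128(10, 2) column containing 123.45
    let decimals = Decimal128Array::from(vec![12345_i128]).with_precision_and_scale(10, 2)?;
    let dict_type = DataType::Dictionary(
        Box::new(DataType::Int32),
        Box::new(DataType::Decimal128(10, 2)),
    );
    let dict = cast(&decimals, &dict_type)?;
    // Before the fix, this printed the default Decimal128(38, 10) as the dictionary
    // value type instead of the requested Decimal128(10, 2).
    println!("{:?}", dict.data_type());
    Ok(())
}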
We don't need to do a second cast; we just need to change the dictionary value type after casting.
I'm not sure how to do that. Could you give me some guidance?
The PrimitiveDictionaryBuilder used during casting seems to already lose the custom precision and scale: its new and with_capacity methods use the default data type of the generic ArrowPrimitiveType. Maybe adding a with_capacity_and_data_type method that calls values_builder.with_data_type could be a more general fix.
https://github.com/apache/arrow-rs/blob/53.0.0/arrow-array/src/builder/primitive_builder.rs#L152
https://github.com/apache/arrow-rs/blob/53.0.0/arrow-cast/src/cast/dictionary.rs#L299
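For reference, a minimal standalone sketch (not code from this PR) of the building block that suggestion relies on: PrimitiveBuilder::with_data_type already lets the caller override the default decimal type, so a with_capacity_and_data_type constructor would essentially just forward the data type to the values builder.

use arrow_array::builder::PrimitiveBuilder;
use arrow_array::types::Decimal128Type;
use arrow_array::Array;
use arrow_schema::DataType;

fn main() {
    // By default the builder produces Decimal128Type::DEFAULT_TYPE, i.e. Decimal128(38, 10);
    // with_data_type overrides it with the caller's precision and scale.
    let mut builder = PrimitiveBuilder::<Decimal128Type>::with_capacity(4)
        .with_data_type(DataType::Decimal128(10, 2));
    builder.append_value(12345); // 123.45 at scale 2
    let array = builder.finish();
    assert_eq!(array.data_type(), &DataType::Decimal128(10, 2));
}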
I'm not sure how to do that. Could you give me some guidance?
Decimal128(p, s) => {
    let dict = pack_numeric_to_dictionary::<K, Decimal128Type>(array, dict_value_type, cast_options)?;
    let dict = dict.as_dictionary::<K>().downcast_dict::<Decimal128Array>().unwrap();
    // Set the correct precision/scale on the dictionary values
    let value = dict.values().clone().with_precision_and_scale(p, s)?;
    Ok(Arc::new(DictionaryArray::<K>::try_new(dict.keys().clone(), Arc::new(value))?))
}
For Decimal256, we can do the same.
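For instance, the analogous arm might look like this (a sketch only, assuming the same surrounding match and helpers as the Decimal128 snippet above):

Decimal256(p, s) => {
    let dict = pack_numeric_to_dictionary::<K, Decimal256Type>(array, dict_value_type, cast_options)?;
    let dict = dict.as_dictionary::<K>().downcast_dict::<Decimal256Array>().unwrap();
    // Set the correct precision/scale on the dictionary values
    let value = dict.values().clone().with_precision_and_scale(p, s)?;
    Ok(Arc::new(DictionaryArray::<K>::try_new(dict.keys().clone(), Arc::new(value))?))
}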
Thanks, I have added this
Which issue does this PR close?
Closes #6381
Rationale for this change
Fixes a correctness issue described in #6381
What changes are included in this PR?
Are there any user-facing changes?
Yes, fixes a correctness issue