-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: delimited format should write decimals in a format it can read #6238
fix: delimited format should write decimals in a format it can read #6238
Conversation
fixes: confluentinc#6233 Fixes the scale of the decimals values returned by the variant of the `ROUND` method that takes a decimal and a required number of decimal places to round to. This fixes a bug where the output could not be serialized to Avro.
fixes: confluentinc#6234 BREAKING CHANGE This changes the format the `DELIMITED` form writes decimals from `12,345,678,987,654,321.12345` to `12345678987654321.12345`
ksqldb-serde/src/main/java/io/confluent/ksql/serde/delimited/KsqlDelimitedSerializer.java
Outdated
Show resolved
Hide resolved
final int precision = DecimalUtil.precision(field.schema()); | ||
final int scale = DecimalUtil.scale(field.schema()); | ||
|
||
return DecimalUtil.format(precision, scale, value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this format
used to add trailing zeros to decimals? Without this, we loose those zeros.
Can the DecimalFormat
be configured to avoid adding commas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a look - thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looked into this and the trailing zeros are correctly formatted. However, we do need to call toPlanString()
, rather than toString()
if we want to avoid scientific notation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this, is there any reason to prefer plain notation over scientific notation? the plan notation is lossy while scientific notation conserves the precision/scale exactly (docs about toPlainString
):
Note that if the result of this method is passed to the string constructor, only the numerical
value of this BigDecimal will necessarily be recovered; the representation of the new BigDecimal
may have a different scale.
Compare with toString
:
There is a one-to-one mapping between the distinguishable BigDecimal values and the
result of this conversion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thought process behind this is that:
a) The serializer previously didn't use scientific notation, so this would be a change in behaviour beyond the bug fix to allow it to read its own decimals.
b) We've had users be confused about decimals in scientific notation before, in JSON I think it was, and switched that serializer to non-scientific.
I agree that 1.23e+3
more accurately represent the scale of the value than 1230
. However, our delimited deserializer will correct deserialize the latter correctly, given the correct schema.
Ergo, I think this is the best default behaviour from a useability and minimising support perspective. However, I can see the benefit of supporting scientific notation, (maybe through a config setting), for those that want it. Though I'd say this is out of scope of this bug fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I don't feel to strongly about it either way so as long as we've given it thought I'm happy here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, though if you see the inline comment i think we may want to consider using scientific notation in delimited format. (Though tbh, I doubt people are using decimals in delimited if they actually care about precision/scale so it might be bikeshedding)
final int precision = DecimalUtil.precision(field.schema()); | ||
final int scale = DecimalUtil.scale(field.schema()); | ||
|
||
return DecimalUtil.format(precision, scale, value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this, is there any reason to prefer plain notation over scientific notation? the plan notation is lossy while scientific notation conserves the precision/scale exactly (docs about toPlainString
):
Note that if the result of this method is passed to the string constructor, only the numerical
value of this BigDecimal will necessarily be recovered; the representation of the new BigDecimal
may have a different scale.
Compare with toString
:
There is a one-to-one mapping between the distinguishable BigDecimal values and the
result of this conversion.
Description
fixes: #6234
Note: stacked on top of #6233 - ignore the first commit.
BREAKING CHANGE
This changes the format the
DELIMITED
form writes decimals from12,345,678,987,654,321.12345
to12345678987654321.12345
Testing done
usual
Reviewer checklist