Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: delimited format should write decimals in a format it can read #6238

Merged
merged 8 commits into from
Sep 22, 2020

Conversation

big-andy-coates
Copy link
Contributor

Description

fixes: #6234

Note: stacked on top of #6233 - ignore the first commit.

BREAKING CHANGE
This changes the format the DELIMITED form writes decimals from 12,345,678,987,654,321.12345 to 12345678987654321.12345

Testing done

usual

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

fixes: confluentinc#6233

Fixes the scale of the decimals values returned by the variant of the `ROUND` method that takes a decimal and a required number of decimal places to round to.

This fixes a bug where the output could not be serialized to Avro.
fixes: confluentinc#6234

BREAKING CHANGE
This changes the format the `DELIMITED` form writes decimals from `12,345,678,987,654,321.12345` to `12345678987654321.12345`
@big-andy-coates big-andy-coates requested a review from a team as a code owner September 16, 2020 18:44
final int precision = DecimalUtil.precision(field.schema());
final int scale = DecimalUtil.scale(field.schema());

return DecimalUtil.format(precision, scale, value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this format used to add trailing zeros to decimals? Without this, we loose those zeros.
Can the DecimalFormat be configured to avoid adding commas?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a look - thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked into this and the trailing zeros are correctly formatted. However, we do need to call toPlanString(), rather than toString() if we want to avoid scientific notation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this, is there any reason to prefer plain notation over scientific notation? the plan notation is lossy while scientific notation conserves the precision/scale exactly (docs about toPlainString):

Note that if the result of this method is passed to the string constructor, only the numerical 
value of this BigDecimal will necessarily be recovered; the representation of the new BigDecimal 
may have a different scale.

Compare with toString:

There is a one-to-one mapping between the distinguishable BigDecimal values and the 
result of this conversion.

Copy link
Contributor Author

@big-andy-coates big-andy-coates Sep 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thought process behind this is that:
a) The serializer previously didn't use scientific notation, so this would be a change in behaviour beyond the bug fix to allow it to read its own decimals.
b) We've had users be confused about decimals in scientific notation before, in JSON I think it was, and switched that serializer to non-scientific.

I agree that 1.23e+3 more accurately represent the scale of the value than 1230. However, our delimited deserializer will correct deserialize the latter correctly, given the correct schema.

Ergo, I think this is the best default behaviour from a useability and minimising support perspective. However, I can see the benefit of supporting scientific notation, (maybe through a config setting), for those that want it. Though I'd say this is out of scope of this bug fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I don't feel to strongly about it either way so as long as we've given it thought I'm happy here

@big-andy-coates big-andy-coates requested review from spena and a team September 21, 2020 10:54
Copy link
Contributor

@agavra agavra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though if you see the inline comment i think we may want to consider using scientific notation in delimited format. (Though tbh, I doubt people are using decimals in delimited if they actually care about precision/scale so it might be bikeshedding)

final int precision = DecimalUtil.precision(field.schema());
final int scale = DecimalUtil.scale(field.schema());

return DecimalUtil.format(precision, scale, value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this, is there any reason to prefer plain notation over scientific notation? the plan notation is lossy while scientific notation conserves the precision/scale exactly (docs about toPlainString):

Note that if the result of this method is passed to the string constructor, only the numerical 
value of this BigDecimal will necessarily be recovered; the representation of the new BigDecimal 
may have a different scale.

Compare with toString:

There is a one-to-one mapping between the distinguishable BigDecimal values and the 
result of this conversion.

@big-andy-coates big-andy-coates merged commit 626965e into confluentinc:master Sep 22, 2020
@big-andy-coates big-andy-coates deleted the invalid_decimal branch September 22, 2020 22:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DELIMITED format writes decimals in a format that it can't read
3 participants