[SPARK-47781][SQL] Handle negative scale decimals for JDBC data sources #45956

Closed. yaooqinn wanted to merge 3 commits.

yaooqinn (Member) commented Apr 9, 2024

What changes were proposed in this pull request?

SPARK-30252 disabled defining decimals with a negative scale. As a side effect, it introduced a regression: negative scale decimals can no longer be read from JDBC data sources. A legacy config can restore the old behavior, but it was not designed for this case, and it is inconvenient in a data pipeline that extracts negative scale decimals from a database such as Oracle into Parquet files, which do not support negative scale decimals.

In addition, Postgres has supported negative scale decimals since v15. Its earlier lack of support was one of the arguments for disabling negative scale decimals on our side.

In this PR, we change the schema from decimal(p,s) to decimal(p-s,0) if s<0.
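The mapping described above can be illustrated with a small sketch. This is not the code from the patch; `map_jdbc_decimal` is a hypothetical helper, and the sketch ignores any maximum-precision cap that Spark may apply. It only shows the arithmetic: a negative scale is folded into the precision, and the scale becomes 0.

```python
def map_jdbc_decimal(precision: int, scale: int) -> tuple[int, int]:
    """Hypothetical helper: fold a negative scale into the precision.

    decimal(p, s) with s < 0 becomes decimal(p - s, 0); other types pass through.
    """
    if scale < 0:
        # Subtracting a negative scale widens the precision by |s| digits.
        return precision - scale, 0
    return precision, scale


print(map_jdbc_decimal(10, -2))  # (12, 0)
print(map_jdbc_decimal(2, -3))   # (5, 0)
print(map_jdbc_decimal(10, 2))   # (10, 2)
```

For instance, a source column with precision 2 and scale -3 can hold values up to 99000, which need 5 integral digits, so decimal(5, 0) is wide enough to represent them.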

Why are the changes needed?

Does this PR introduce any user-facing change?

Many databases, such as Oracle and Postgres, support negative scale decimals for rounding the integral part.

  • Oracle

Negative scale is the number of significant digits to the left of the decimal point, to but not including the least significant digit. For negative scale the least significant digit is on the left side of the decimal point, because the actual data is rounded to the specified number of places to the left of the decimal point. For example, a specification of (10,-2) means to round to hundreds.

  • Postgres

Beginning in PostgreSQL 15, it is allowed to declare a numeric column with a negative scale. Then values will be rounded to the left of the decimal point. The precision still represents the maximum number of non-rounded digits. Thus, a column declared as
NUMERIC(2, -3)
will round values to the nearest thousand and can store values between -99000 and 99000, inclusive. It is also allowed to declare a scale larger than the declared precision. Such a column can only hold fractional values, and it requires the number of zero digits just to the right of the decimal point to be at least the declared scale minus the declared precision. For example, a column declared as
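The rounding behavior that the Oracle and Postgres documentation describe for negative scales can be approximated with Python's `decimal` module. This is a rough sketch, not either database's implementation: `store_numeric` is a hypothetical name, it only handles scales of zero or below, and rounding ties away from zero (`ROUND_HALF_UP`) is an assumption made for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def store_numeric(value: str, precision: int, scale: int) -> int:
    """Hypothetical helper: round `value` as a NUMERIC(precision, scale) column
    with scale <= 0 might, and reject values that need too many digits."""
    if scale > 0:
        raise ValueError("this sketch only covers scale <= 0")
    # scale = -3 gives a quantum of 1E+3, i.e. round to the nearest thousand.
    quantum = Decimal(1).scaleb(-scale)
    rounded = Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)
    # The digits left after rounding are the "non-rounded" digits that the
    # precision limits.
    if len(rounded.as_tuple().digits) > precision:
        raise OverflowError(f"{value} does not fit NUMERIC({precision}, {scale})")
    return int(rounded)


print(store_numeric("12345", 2, -3))    # 12000
print(store_numeric("98999", 2, -3))    # 99000
print(store_numeric("1234.5", 10, -2))  # 1200
```

Under these assumptions, `store_numeric("99500", 2, -3)` raises `OverflowError`, because the value rounds to 100000, which is outside the -99000 to 99000 range quoted above.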

How was this patch tested?

new tests

Was this patch authored or co-authored using generative AI tooling?

no

@github-actions github-actions bot added the SQL label Apr 9, 2024
yaooqinn (Member, Author) commented Apr 9, 2024

cc @cloud-fan

yaooqinn changed the title from "[SPARK-47781][SQL] Handle negative scale and truncate exceed scale first for JDBC data sources" to "[SPARK-47781][SQL] Handle negative scale for JDBC data sources" on Apr 10, 2024

yaooqinn (Member, Author) commented

cc @dongjoon-hyun @cloud-fan, PTAL when you have some time

dongjoon-hyun (Member) left a comment

+1 LGTM

yaooqinn changed the title from "[SPARK-47781][SQL] Handle negative scale for JDBC data sources" to "[SPARK-47781][SQL] Handle negative scale decimals for JDBC data sources" on Apr 10, 2024
@yaooqinn yaooqinn closed this in b53ec00 Apr 10, 2024
@yaooqinn yaooqinn deleted the SPARK-47781 branch April 10, 2024 05:27
yaooqinn (Member, Author) commented Apr 10, 2024

Merged to master

Thank you. @cloud-fan @dongjoon-hyun
