[SPARK-25454][SQL] Avoid precision loss in division with decimal with negative scale #22450

Closed
wants to merge 7 commits into from
@@ -129,16 +129,17 @@ object DecimalPrecision extends TypeCoercionRule {
resultType)

case Divide(e1 @ DecimalType.Expression(p1, s1), e2 @ DecimalType.Expression(p2, s2)) =>
+val adjP2 = if (s2 < 0) p2 - s2 else p2
Contributor

This rule was added a long time ago; do you mean this is a long-standing bug?

Contributor Author

Yes, I think this is clearer in the related JIRA description and comments. The problem is that this rule has never properly handled decimals with a negative scale. Before 2.3, such decimals could appear only if someone created a specific literal from a BigDecimal, like lit(BigDecimal(100e6)). Since 2.3, any constant like 100e6 in SQL code can produce one. So the problem has been there for a while, but we haven't seen it because it was less likely to happen.

Another solution would be to avoid decimals with a negative scale altogether. But that is quite a breaking change, so I'd avoid it at least until a 3.0 release.
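Parsing exponent-notation literals with java.math.BigDecimal shows how a negative scale arises. This is a sketch only. It assumes that Spark's decimal literals end up with the same precision and scale as this Java parsing. That assumption is consistent with the 1.000E+9 in the query schema later in this PR, but the parser itself is not shown here. The object name NegativeScaleDemo is hypothetical. The literal 1000e6 keeps its four digits as the unscaled value and records the exponent as scale -6. It therefore has precision 4 and scale -6, not precision 10 and scale 0.

```scala
import java.math.BigDecimal

// Hypothetical helper: shows how java.math.BigDecimal represents
// exponent-notation literals. "1000e6" keeps its digits (unscaled 1000)
// and stores the exponent as a negative scale (-6), instead of being
// expanded to 1000000000 with scale 0.
object NegativeScaleDemo {
  // Returns (precision, scale) of the parsed literal.
  def precisionAndScale(literal: String): (Int, Int) = {
    val d = new BigDecimal(literal)
    (d.precision, d.scale)
  }
}
```

For example, NegativeScaleDemo.precisionAndScale("1000e6") returns (4, -6), and new BigDecimal("1000e6").toString prints 1.000E+9.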

Contributor

Ah, I see. Can we add a test with a decimal literal in DataFrameSuite?

Contributor

Can we update the documentation of this rule to reflect this change?

Contributor Author

Sure, but if you agree, I'll look for a better place than DataFrameSuite: I'd prefer adding the new tests to ArithmeticExpressionSuite. Is that OK with you?

Contributor

SGTM

val resultType = if (SQLConf.get.decimalOperationsAllowPrecisionLoss) {
// Precision: p1 - s1 + s2 + max(6, s1 + p2 + 1)
// Scale: max(6, s1 + p2 + 1)
val intDig = p1 - s1 + s2
-val scale = max(DecimalType.MINIMUM_ADJUSTED_SCALE, s1 + p2 + 1)
+val scale = max(DecimalType.MINIMUM_ADJUSTED_SCALE, s1 + adjP2 + 1)
val prec = intDig + scale
DecimalType.adjustPrecisionScale(prec, scale)
} else {
var intDig = min(DecimalType.MAX_SCALE, p1 - s1 + s2)
-var decDig = min(DecimalType.MAX_SCALE, max(6, s1 + p2 + 1))
+var decDig = min(DecimalType.MAX_SCALE, max(6, s1 + adjP2 + 1))
val diff = (intDig + decDig) - DecimalType.MAX_SCALE
if (diff > 0) {
decDig -= diff / 2 + 1
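Plain integer arithmetic is enough to check the effect of adjP2 on the precision-loss-allowed branch. A DECIMAL(p2, s2) with s2 < 0 can hold up to p2 - s2 integer digits. For example, DECIMAL(4, -6) reaches 9999 * 10^6, a 10-digit number. Using p2 alone therefore undercounts the fractional digits the quotient may need. The sketch below is a hypothetical helper (DivideResultType), not Spark code. It mirrors only the formulas in the hunk above and skips DecimalType.adjustPrecisionScale. Skipping that step is safe only while the resulting precision stays at or below 38, which holds for the examples here.

```scala
import scala.math.max

// Hypothetical helper mirroring the precision-loss-allowed branch of the
// Divide rule for operands DECIMAL(p1, s1) / DECIMAL(p2, s2).
// It omits DecimalType.adjustPrecisionScale, so results are only meaningful
// when the returned precision is at most 38.
object DivideResultType {
  // Stands in for DecimalType.MINIMUM_ADJUSTED_SCALE.
  val MinimumAdjustedScale = 6

  // fixNegativeScale = false reproduces the old formula (uses p2 directly);
  // fixNegativeScale = true uses adjP2 as in the patched rule.
  // Returns (precision, scale) of the result type.
  def resultType(p1: Int, s1: Int, p2: Int, s2: Int, fixNegativeScale: Boolean): (Int, Int) = {
    // With a negative scale, DECIMAL(p2, s2) can have p2 - s2 integer digits.
    val adjP2 = if (fixNegativeScale && s2 < 0) p2 - s2 else p2
    val intDig = p1 - s1 + s2
    val scale = max(MinimumAdjustedScale, s1 + adjP2 + 1)
    (intDig + scale, scale)
  }
}
```

Assume the literal 1000e6 is typed DECIMAL(4, -6). The old formula then types 26393499451 / 1000e6 as DECIMAL(11, 6), which keeps only six fractional digits. The adjP2 version gives DECIMAL(16, 11), which matches the schema in the golden file below. Likewise, DECIMAL(11, 0) / DECIMAL(3, -10) yields DECIMAL(15, 14), the type asserted in DecimalPrecisionSuite. For non-negative s2, both variants agree.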
@@ -276,9 +276,11 @@ class DecimalPrecisionSuite extends AnalysisTest with BeforeAndAfter {
val a = AttributeReference("a", DecimalType(3, -10))()
val b = AttributeReference("b", DecimalType(1, -1))()
val c = AttributeReference("c", DecimalType(35, 1))()
+val nonNegative = AttributeReference("nn", DecimalType(11, 0))()
checkType(Multiply(a, b), DecimalType(5, -11))
checkType(Multiply(a, c), DecimalType(38, -9))
checkType(Multiply(b, c), DecimalType(37, 0))
+checkType(Divide(nonNegative, a), DecimalType(15, 14))
}

/** strength reduction for integer/decimal comparisons */
@@ -83,4 +83,7 @@ select 12345678912345678912345678912.1234567 + 9999999999999999999999999999999.1
select 123456789123456789.1234567890 * 1.123456789123456789;
select 12345678912345.123456789123 / 0.000000012345678;

+-- division with negative scale operands
+select 26393499451 / 1000e6;
+
drop table decimals_test;
@@ -1,5 +1,5 @@
-- Automatically generated by SQLQueryTestSuite
--- Number of queries: 40
+-- Number of queries: 41


-- !query 0
@@ -328,8 +328,16 @@ NULL


-- !query 39
-drop table decimals_test
+select 26393499451 / 1000e6
-- !query 39 schema
-struct<>
+struct<(CAST(CAST(26393499451 AS DECIMAL(11,0)) AS DECIMAL(11,0)) / CAST(1.000E+9 AS DECIMAL(11,0))):decimal(16,11)>
-- !query 39 output
+26.393499451
+
+
+-- !query 40
+drop table decimals_test
+-- !query 40 schema
+struct<>
+-- !query 40 output
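The numeric effect of the extra scale shows up when the two literals from the new query are divided with java.math.BigDecimal at a fixed result scale. This is a sketch. HALF_UP rounding and the object name DivisionAtScale are assumptions for illustration, not taken from Spark. Scale 6 is what the old formula max(6, s1 + p2 + 1) gives for DECIMAL(11, 0) / DECIMAL(4, -6), and at that scale the quotient is rounded to 26.393499. Scale 11 is what the adjP2 version gives, and there the quotient is exact: 26.393499451, followed by trailing zeros.

```scala
import java.math.{BigDecimal, RoundingMode}

// Hypothetical helper: divides two decimal literals and rounds the quotient
// to a fixed scale, as a DECIMAL(p, scale) result type would.
// HALF_UP is an assumed rounding mode for this sketch.
object DivisionAtScale {
  def divideAtScale(dividend: String, divisor: String, scale: Int): BigDecimal =
    new BigDecimal(dividend).divide(new BigDecimal(divisor), scale, RoundingMode.HALF_UP)
}
```

For example, DivisionAtScale.divideAtScale("26393499451", "1000e6", 6) drops the last three digits (26.393499), while the scale-11 result, after stripTrailingZeros, prints as 26.393499451.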