[SPARK-26664][SQL] Make DecimalType's minimum adjusted scale configurable #23587
Conversation
Test build #101408 has finished for PR 23587 at commit
This test case failed: org.apache.spark.sql.hive.client.HiveClientSuites.(It is not a test it is a sbt.testing.SuiteSelector)
jenkins retest this please
val DECIMAL_OPERATIONS_MINIMUM_ADJUSTED_SCALE =
  buildConf("spark.sql.decimalOperations.minimumAdjustedScale")
    .internal()
    .doc("Decimal operations' minimum adjusted scale when " +
Can we improve this description to explain clearly what this means? A user who is not familiar with it may be confused.
Yes. I can probably paraphrase the existing code comments into the conf description here.
@mgaido91 how about this:
The minimum adjusted scale used when rounding the decimal part of a decimal arithmetic result. When spark.sql.decimalOperations.allowPrecisionLoss is true and a result's precision is greater than MAX_PRECISION (38), the corresponding scale is reduced to prevent the integral part of the result from being truncated. This conf controls the minimum number of decimal places to keep.
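For reference, here is a simplified Scala sketch of the scale-adjustment logic this description refers to. It paraphrases the existing DecimalType.adjustPrecisionScale, with a minimumAdjustedScale parameter standing in for the constant this PR makes configurable; names and details are illustrative, not the exact Spark code.

import org.apache.spark.sql.types.DecimalType
import org.apache.spark.sql.types.DecimalType.MAX_PRECISION

// Sketch: when the exact result precision exceeds MAX_PRECISION (38), shrink the
// scale so the integral digits still fit, but never below the minimum adjusted scale.
def adjustPrecisionScale(precision: Int, scale: Int, minimumAdjustedScale: Int): DecimalType = {
  if (precision <= MAX_PRECISION) {
    // The exact result already fits; keep it unchanged.
    DecimalType(precision, scale)
  } else {
    // Digits needed for the integral part of the result.
    val intDigits = precision - scale
    // Keep at least minimumAdjustedScale decimal places, but never more than requested.
    val minScale = math.min(scale, minimumAdjustedScale)
    // Give the integral part what it needs; spend whatever precision is left on the scale.
    val adjustedScale = math.max(MAX_PRECISION - intDigits, minScale)
    DecimalType(MAX_PRECISION, adjustedScale)
  }
}

For example, DECIMAL(38, 18) / DECIMAL(38, 18) has an exact result type of DECIMAL(95, 57); with the default minimum of 6 it is adjusted to DECIMAL(38, 6), while a minimum of 12 would yield DECIMAL(38, 12).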
I think this is still hacky and it does not resolve the root issue. Let us think about it and see whether we have a better solution. I changed the target version to 3.0, and we will revisit this before the 3.0 code freeze. If we do not find a better solution by then, we can review this PR again.
I am not sure what problem was experienced that led to opening this PR. I think this value can be made a config, as there may be specific use cases where it should be tuned, even though I can't think of any specific ones.
it does not resolve the root issue.
What do you mean by "the root issue"? AFAIK, there is only one main issue with decimal operations right now, which is division when a decimal with a negative scale is involved. For that problem, #22450 is waiting for reviews, after positive feedback on the approach in the mailing list discussion.
@@ -2002,6 +2011,9 @@ class SQLConf extends Serializable with Logging {

def decimalOperationsAllowPrecisionLoss: Boolean = getConf(DECIMAL_OPERATIONS_ALLOW_PREC_LOSS)

def decimalOperationsMinimumAdjustedScale: Int = |
I remember a comment by @rxin about this pattern: it is a bit of overhead/overkill when the method is used only once. So I'd rather remove this and inline it where needed.
Sure, let me update it
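A minimal sketch of what the suggested inlining could look like (the entry and object names are taken from the diff in this PR; treat it as illustrative rather than the final change):

import org.apache.spark.sql.internal.SQLConf

// Instead of adding a one-off accessor to SQLConf, read the conf entry directly
// at the single call site in DecimalType.
def minimumAdjustedScale: Int =
  SQLConf.get.getConf(SQLConf.DECIMAL_OPERATIONS_MINIMUM_ADJUSTED_SCALE)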
@@ -153,6 +154,10 @@ object DecimalType extends AbstractDataType {
    DecimalType(min(precision, MAX_PRECISION), min(scale, MAX_SCALE))
  }

def minimumAdjustedScale: Int = { |
nit: I think we can remove the braces here...
Will do
Hmm, I went in and updated a few things, but all the other functions around this line use braces, and I'd like the code style to be at least locally consistent within the file. Is it okay with you if I keep the braces here?
See my comment: #23587 (comment). Close it first.
Test build #101410 has finished for PR 23587 at commit
What changes were proposed in this pull request?
Introduce a new conf flag that allows the user to set the value of DecimalType.MINIMAL_ADJUSTED_SCALE, currently a constant of 6, to match their workloads' needs. The new flag is spark.sql.decimalOperations.minimumAdjustedScale.

#20023 introduced a new conf flag, spark.sql.decimalOperations.allowPrecisionLoss, to match SQL Server's and newer Hive's behavior of allowing precision loss when multiplying or dividing big and small decimal numbers. Along with that feature, a constant MINIMAL_ADJUSTED_SCALE was set to 6 for the case where precision loss is allowed.

Some customer workloads may need a larger adjusted scale to match their business needs, and in exchange they may be willing to tolerate more calculations overflowing the maximum precision and producing nulls. They would therefore like the minimum adjusted scale to be configurable, hence the new conf.

The default behavior is unchanged after introducing this conf flag.
How was this patch tested?
Added a new section in SQL tests to test the behavior of setting minimumAdjustedScale=12.
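For illustration, a usage sketch of the proposed conf (it assumes the conf name introduced in this PR is available at runtime; the value 12 mirrors the tested setting, and the result types in the comments follow the adjustment logic sketched earlier):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("minimum-adjusted-scale-demo").getOrCreate()

// With the default minimum adjusted scale of 6, this division is typed as
// DECIMAL(38, 6) after the exact result precision is capped at 38.
spark.sql("SELECT CAST(1 AS DECIMAL(38, 18)) / CAST(3 AS DECIMAL(38, 18))").printSchema()

// Keep at least 12 decimal places whenever the scale has to be reduced.
// Trade-off: more results may overflow precision 38 and come back as NULL.
spark.conf.set("spark.sql.decimalOperations.minimumAdjustedScale", "12")
spark.sql("SELECT CAST(1 AS DECIMAL(38, 18)) / CAST(3 AS DECIMAL(38, 18))").printSchema()
// Expected result type with the higher minimum: DECIMAL(38, 12).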