
[SPARK-26664][SQL] Make DecimalType's minimum adjusted scale configurable #23587

Closed
wants to merge 1 commit

Conversation

@rednaxelafx (Contributor) commented Jan 18, 2019

What changes were proposed in this pull request?

Introduce a new conf flag that allows users to set the value of DecimalType.MINIMUM_ADJUSTED_SCALE, currently a constant of 6, to match their workloads' needs.

The new flag is spark.sql.decimalOperations.minimumAdjustedScale.

#20023 introduced a new conf flag, spark.sql.decimalOperations.allowPrecisionLoss, to match the behavior of SQL Server and newer versions of Hive, which allow precision loss when performing multiplication or division on big and small decimal numbers.
Along with this feature, the constant MINIMUM_ADJUSTED_SCALE was set to 6 for the case where precision loss is allowed.

Some customer workloads may need a larger adjusted scale to match their business needs; in exchange, they may be willing to tolerate more calculations overflowing the max precision and producing nulls. They would therefore like the minimum adjusted scale to be configurable, hence the need for a new conf.

The default behavior after introducing this conf flag is not changed.
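For illustration, here is a minimal sketch of how the proposed flag would be used; the flag exists only in this PR, and the query and result types below are illustrative of the precision-loss rules introduced in #20023 (spark is the usual SparkSession):

```scala
// Sketch only: minimumAdjustedScale is the conf proposed in this PR,
// not an upstream Spark setting.
spark.conf.set("spark.sql.decimalOperations.allowPrecisionLoss", "true")
spark.conf.set("spark.sql.decimalOperations.minimumAdjustedScale", "12")

// decimal(38, 18) / decimal(38, 18) has the unadjusted result type
// decimal(95, 57), which overflows the max precision of 38. With the
// default minimum adjusted scale of 6 the result is decimal(38, 6);
// with the setting above it becomes decimal(38, 12).
spark.sql("SELECT CAST(1 AS DECIMAL(38, 18)) / CAST(3 AS DECIMAL(38, 18))")
  .printSchema()
```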

How was this patch tested?

Added a new section to the SQL tests covering the behavior of setting minimumAdjustedScale=12.

@SparkQA commented Jan 18, 2019

Test build #101408 has finished for PR 23587 at commit 7da0de4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rednaxelafx (Contributor Author)

This test case failed: org.apache.spark.sql.hive.client.HiveClientSuites.(It is not a test it is a sbt.testing.SuiteSelector)
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101408/testReport/junit/org.apache.spark.sql.hive.client/HiveClientSuites/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/
But it doesn't look related to this PR. Triggering a re-test.

@rednaxelafx (Contributor Author)

jenkins retest this please

val DECIMAL_OPERATIONS_MINIMUM_ADJUSTED_SCALE =
  buildConf("spark.sql.decimalOperations.minimumAdjustedScale")
    .internal()
    .doc("Decimal operations' minimum adjusted scale when " +
Contributor
Can we improve this description to explain clearly what this means? A user who is not familiar with the details may be confused.

Contributor Author

Yes. I can probably paraphrase the relevant code comments into the conf description here.

@rednaxelafx (Contributor Author) commented Jan 18, 2019

@mgaido91 how about this:

Decimal arithmetic operations' minimum adjusted scale when rounding the decimal part of a result. When spark.sql.decimalOperations.allowPrecisionLoss is true and the result precision is greater than MAX_PRECISION (38), the corresponding scale is reduced to prevent the integral part of the result from being truncated. This conf controls the minimum number of decimal places to keep.
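For context, the adjustment this description refers to works roughly as follows; this is a simplified sketch modeled on DecimalType.adjustPrecisionScale (negative scales omitted), with minimumAdjustedScale standing in for the constant this PR makes configurable:

```scala
// Simplified model of the precision-loss adjustment; returns the adjusted
// (precision, scale) pair for a decimal result type.
object AdjustScaleSketch {
  val MaxPrecision = 38

  def adjust(precision: Int, scale: Int, minimumAdjustedScale: Int): (Int, Int) = {
    if (precision <= MaxPrecision) {
      (precision, scale) // already representable; nothing to adjust
    } else {
      // Precision overflows: keep the integral digits and shrink the scale,
      // but never below the configured minimum (capped by the original scale).
      val intDigits = precision - scale
      val minScale = math.min(scale, minimumAdjustedScale)
      val adjustedScale = math.max(MaxPrecision - intDigits, minScale)
      (MaxPrecision, adjustedScale)
    }
  }

  def main(args: Array[String]): Unit = {
    // decimal(38, 18) / decimal(38, 18) yields unadjusted decimal(95, 57):
    println(adjust(95, 57, minimumAdjustedScale = 6))  // (38, 6), the default
    println(adjust(95, 57, minimumAdjustedScale = 12)) // (38, 12), with this PR's conf
  }
}
```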

Member

I think this is still hacky and does not resolve the root issue. Let us think about it and see whether we have a better solution. I am changing the target version to 3.0; we will revisit this before the 3.0 code freeze. If we do not have a better solution by then, we can review this again.

Contributor

I am not sure what problem prompted the opening of this PR. I think this value can be made a config, as there may be specific use cases where it can be tuned appropriately, although I can't think of concrete ones.

"it does not resolve the root issue."

What do you mean by "the root issue"? AFAIK, there is only one main issue with decimal operations right now, which is division when a decimal with negative scale is involved. For that problem, #22450 is waiting for reviews, after positive feedback on that approach in the mailing-list discussion.

@@ -2002,6 +2011,9 @@ class SQLConf extends Serializable with Logging {

  def decimalOperationsAllowPrecisionLoss: Boolean = getConf(DECIMAL_OPERATIONS_ALLOW_PREC_LOSS)

  def decimalOperationsMinimumAdjustedScale: Int =
Contributor

I remember a comment by @rxin about this pattern being a bit of overhead/overkill when we use the method only once. So I'd rather remove this and inline it where needed.
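(For illustration, the suggestion amounts to something like the following, REPL-style; the accessor body and the inlined call site are assumptions based on this PR's diff context, not code shown in this conversation.)

```scala
import org.apache.spark.sql.internal.SQLConf

// Before: a dedicated SQLConf accessor used only once (the pattern in question).
// def decimalOperationsMinimumAdjustedScale: Int =
//   getConf(DECIMAL_OPERATIONS_MINIMUM_ADJUSTED_SCALE)

// After: inline the lookup at the single call site instead.
val minimumAdjustedScale =
  SQLConf.get.getConf(SQLConf.DECIMAL_OPERATIONS_MINIMUM_ADJUSTED_SCALE)
```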

Contributor Author

Sure, let me update it

@@ -153,6 +154,10 @@ object DecimalType extends AbstractDataType {
    DecimalType(min(precision, MAX_PRECISION), min(scale, MAX_SCALE))
  }

  def minimumAdjustedScale: Int = {
Contributor

nit: I think we can remove the parenthesis...

Contributor Author

Will do

Contributor Author

Hmm, I went in and updated a few things, but all the other functions around this line use braces, and I'd like the code style to be consistent at least locally within a file. Is it okay with you if I keep the braces here?
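(For reference, the two styles under discussion look roughly like this; the method body is an assumption based on this PR's diff, not shown in this excerpt.)

```scala
// With braces, matching the surrounding methods in DecimalType:
def minimumAdjustedScale: Int = {
  SQLConf.get.getConf(SQLConf.DECIMAL_OPERATIONS_MINIMUM_ADJUSTED_SCALE)
}

// Without braces, as the nit suggests (commented out here to avoid a
// duplicate definition in this sketch):
// def minimumAdjustedScale: Int =
//   SQLConf.get.getConf(SQLConf.DECIMAL_OPERATIONS_MINIMUM_ADJUSTED_SCALE)
```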

@gatorsmile (Member)

See my comment: #23587 (comment). Close it first.

@gatorsmile closed this Jan 18, 2019
@SparkQA commented Jan 18, 2019

Test build #101410 has finished for PR 23587 at commit 7da0de4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
