[SPARK-26664][SQL] Make DecimalType's minimum adjusted scale configurable #23587
Conversation
Test build #101408 has finished for PR 23587 at commit
This test case failed: org.apache.spark.sql.hive.client.HiveClientSuites.(It is not a test it is a sbt.testing.SuiteSelector)
jenkins retest this please
val DECIMAL_OPERATIONS_MINIMUM_ADJUSTED_SCALE =
  buildConf("spark.sql.decimalOperations.minimumAdjustedScale")
    .internal()
    .doc("Decimal operations' minimum adjusted scale when " +
Can we improve this description to explain clearly what this means? A user who is not familiar with it may be confused.
Yes. I can probably paraphrase the existing code comments into the conf description here.
@mgaido91 how about this:
The minimum adjusted scale used when rounding the decimal part of a decimal arithmetic result. When spark.sql.decimalOperations.allowPrecisionLoss is true and a result's precision is greater than MAX_PRECISION (38), the corresponding scale is reduced to prevent the integral part of the result from being truncated. This conf controls the minimum number of decimal places to keep.
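For reference, here is a simplified Scala sketch of the scale-adjustment logic this description refers to. It paraphrases the existing DecimalType.adjustPrecisionScale, with a minimumAdjustedScale parameter standing in for the constant this PR makes configurable; names and details are illustrative, not the exact Spark code.

import org.apache.spark.sql.types.DecimalType
import org.apache.spark.sql.types.DecimalType.MAX_PRECISION

// Sketch: when the exact result precision exceeds MAX_PRECISION (38), shrink the
// scale so the integral digits still fit, but never below the minimum adjusted scale.
def adjustPrecisionScale(precision: Int, scale: Int, minimumAdjustedScale: Int): DecimalType = {
  if (precision <= MAX_PRECISION) {
    // The exact result already fits; keep it unchanged.
    DecimalType(precision, scale)
  } else {
    // Digits needed for the integral part of the result.
    val intDigits = precision - scale
    // Keep at least minimumAdjustedScale decimal places, but never more than requested.
    val minScale = math.min(scale, minimumAdjustedScale)
    // Give the integral part what it needs; spend whatever precision is left on the scale.
    val adjustedScale = math.max(MAX_PRECISION - intDigits, minScale)
    DecimalType(MAX_PRECISION, adjustedScale)
  }
}

For example, DECIMAL(38, 18) / DECIMAL(38, 18) has an exact result type of DECIMAL(95, 57); with the default minimum of 6 it is adjusted to DECIMAL(38, 6), while a minimum of 12 would yield DECIMAL(38, 12).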
I think this is still hacky and it does not resolve the root issue. Let us think about it and see whether we have a better solution. I changed the target version to 3.0, and we will revisit this before the 3.0 code freeze. If we do not find a better solution by then, we can review this PR again.
I am not sure what problem was experienced that led to opening this PR. I think this value can be made a config, as there may be specific use cases where it should be tuned, even though I can't think of any specific ones.
it does not resolve the root issue.
What do you mean by "the root issue"? AFAIK, there is only one main issue with decimal operations right now, which is division when a decimal with a negative scale is involved. For that problem, #22450 is waiting for reviews, after positive feedback on the approach in the mailing list discussion.
@@ -2002,6 +2011,9 @@ class SQLConf extends Serializable with Logging {

def decimalOperationsAllowPrecisionLoss: Boolean = getConf(DECIMAL_OPERATIONS_ALLOW_PREC_LOSS)

def decimalOperationsMinimumAdjustedScale: Int = |
I remember a comment by @rxin about this pattern: it is a bit of overhead/overkill when the method is used only once. So I'd rather remove this and inline it where needed.
Sure, let me update it
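A minimal sketch of what the suggested inlining could look like (the entry and object names are taken from the diff in this PR; treat it as illustrative rather than the final change):

import org.apache.spark.sql.internal.SQLConf

// Instead of adding a one-off accessor to SQLConf, read the conf entry directly
// at the single call site in DecimalType.
def minimumAdjustedScale: Int =
  SQLConf.get.getConf(SQLConf.DECIMAL_OPERATIONS_MINIMUM_ADJUSTED_SCALE)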
@@ -153,6 +154,10 @@ object DecimalType extends AbstractDataType {
    DecimalType(min(precision, MAX_PRECISION), min(scale, MAX_SCALE))
  }

def minimumAdjustedScale: Int = { |
nit: I think we can remove the braces here...
Will do
Hmm, I went in and updated a few things, but all the other functions around this line use braces, and I'd like the code style to be at least locally consistent within the file. Is it okay with you if I keep the braces here?
See my comment: #23587 (comment). Close it first.
Test build #101410 has finished for PR 23587 at commit
What changes were proposed in this pull request?
Introduce a new conf flag that allows the user to set the value of DecimalType.MINIMAL_ADJUSTED_SCALE, currently a constant of 6, to match their workloads' needs. The new flag is spark.sql.decimalOperations.minimumAdjustedScale.

#20023 introduced a new conf flag, spark.sql.decimalOperations.allowPrecisionLoss, to match SQL Server's and newer Hive's behavior of allowing precision loss when multiplying or dividing big and small decimal numbers. Along with that feature, a constant MINIMAL_ADJUSTED_SCALE was set to 6 for the case where precision loss is allowed.

Some customer workloads may need a larger adjusted scale to match their business needs, and in exchange they may be willing to tolerate more calculations overflowing the maximum precision and producing nulls. They would therefore like the minimum adjusted scale to be configurable, hence the new conf.

The default behavior is unchanged after introducing this conf flag.
How was this patch tested?
Added a new section in SQL tests to test the behavior of setting minimumAdjustedScale=12.
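For illustration, a usage sketch of the proposed conf (it assumes the conf name introduced in this PR is available at runtime; the value 12 mirrors the tested setting, and the result types in the comments follow the adjustment logic sketched earlier):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("minimum-adjusted-scale-demo").getOrCreate()

// With the default minimum adjusted scale of 6, this division is typed as
// DECIMAL(38, 6) after the exact result precision is capped at 38.
spark.sql("SELECT CAST(1 AS DECIMAL(38, 18)) / CAST(3 AS DECIMAL(38, 18))").printSchema()

// Keep at least 12 decimal places whenever the scale has to be reduced.
// Trade-off: more results may overflow precision 38 and come back as NULL.
spark.conf.set("spark.sql.decimalOperations.minimumAdjustedScale", "12")
spark.sql("SELECT CAST(1 AS DECIMAL(38, 18)) / CAST(3 AS DECIMAL(38, 18))").printSchema()
// Expected result type with the higher minimum: DECIMAL(38, 12).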