[SPARK-22036][SQL] Decimal multiplication with high precision/scale often returns NULL #20023
File: `DecimalPrecision.scala`
```diff
@@ -21,6 +21,7 @@ import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.expressions.Literal._
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
 import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types._
```
```diff
@@ -42,8 +43,10 @@ import org.apache.spark.sql.types._
  *   e1 / e2      p1 - s1 + s2 + max(6, s1 + p2 + 1)    max(6, s1 + p2 + 1)
  *   e1 % e2      min(p1-s1, p2-s2) + max(s1, s2)       max(s1, s2)
  *   e1 union e2  max(s1, s2) + max(p1-s1, p2-s2)       max(s1, s2)
  *   sum(e1)      p1 + 10                               s1
  *   avg(e1)      p1 + 4                                s1 + 4
  *
+ * When `spark.sql.decimalOperations.allowTruncat` is set to true, if the precision / scale needed
+ * are out of the range of available values, the scale is reduced up to 6, in order to prevent the
+ * truncation of the integer part of the decimals.
+ *
  * To implement the rules for fixed-precision types, we introduce casts to turn them to unlimited
  * precision, do the math on unlimited-precision numbers, then introduce casts back to the
```
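To make the "reduce the scale down to 6" rule concrete, here is a small worked example; the numbers and names below are my own illustration derived from the formulas in the table above, not code from the patch:

```scala
// Worked example: DECIMAL(38, 18) * DECIMAL(38, 18), i.e. p1 = p2 = 38, s1 = s2 = 18.
val (p1, s1, p2, s2) = (38, 18, 38, 18)

// Unconstrained result type from the table: precision = p1 + p2 + 1, scale = s1 + s2.
val precision = p1 + p2 + 1   // 77
val scale     = s1 + s2       // 36

// Legacy behavior (flag off): both values are simply capped at 38, giving DECIMAL(38, 36),
// which leaves only 2 integral digits, so most products overflow and become NULL.
val legacy = (math.min(precision, 38), math.min(scale, 38))           // (38, 36)

// New behavior (flag on): keep the integral digits and shrink the scale, but not below 6.
val intDigits     = precision - scale                                  // 41
val adjustedScale = math.max(38 - intDigits, math.min(scale, 6))       // 6
val adjusted      = (38, adjustedScale)                                // DECIMAL(38, 6)
```

This is exactly the `a * b` case exercised by the SQL test file added later in this diff.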
```diff
@@ -56,6 +59,7 @@ import org.apache.spark.sql.types._
  * - INT gets turned into DECIMAL(10, 0)
  * - LONG gets turned into DECIMAL(20, 0)
  * - FLOAT and DOUBLE cause fixed-length decimals to turn into DOUBLE
+ * - Literals INT and LONG gets turned into DECIMAL with the precision strictly needed by the value
  */
 // scalastyle:on
 object DecimalPrecision extends TypeCoercionRule {
```
```diff
@@ -93,41 +97,76 @@
     case e: BinaryArithmetic if e.left.isInstanceOf[PromotePrecision] => e

     case Add(e1 @ DecimalType.Expression(p1, s1), e2 @ DecimalType.Expression(p2, s2)) =>
-      val dt = DecimalType.bounded(max(s1, s2) + max(p1 - s1, p2 - s2) + 1, max(s1, s2))
-      CheckOverflow(Add(promotePrecision(e1, dt), promotePrecision(e2, dt)), dt)
+      val resultType = if (SQLConf.get.decimalOperationsAllowTruncat) {
+        val resultScale = max(s1, s2)
+        DecimalType.adjustPrecisionScale(max(p1 - s1, p2 - s2) + resultScale + 1,
+          resultScale)
+      } else {
+        DecimalType.bounded(max(s1, s2) + max(p1 - s1, p2 - s2) + 1, max(s1, s2))
+      }
+      CheckOverflow(Add(promotePrecision(e1, resultType), promotePrecision(e2, resultType)),
+        resultType)
```

Review comments on this hunk:
- (on `val resultScale = max(s1, s2)`) we can put this before the …
- This is an example.
- Yes, that may be a difference indeed. But I think it is a minor one, since in 99% of cases the precision is exceeded only in multiplications and divisions.
- We need to make a decision. You know, we try our best to keep our rule as stable as possible.
- I'm OK to do the adjustment for all operations, which is the same as Hive.
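The thread above debates whether the adjustment should also apply to addition and subtraction. A quick sketch of what it changes there, using my own numbers and following the two branches shown above:

```scala
// DECIMAL(38, 10) + DECIMAL(38, 10): needed precision = max(28, 28) + 10 + 1 = 39, scale = 10.

// Flag off: DecimalType.bounded(39, 10) just caps the precision -> DECIMAL(38, 10);
// a sum needing 29 integral digits no longer fits and overflows to NULL.
val legacyResult = (38, 10)

// Flag on: adjustPrecisionScale(39, 10) keeps the 29 integral digits and drops
// one fractional digit instead -> DECIMAL(38, 9).
val adjustedResult = (38, math.max(38 - (39 - 10), math.min(10, 6)))   // (38, 9)
```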
```diff
     case Subtract(e1 @ DecimalType.Expression(p1, s1), e2 @ DecimalType.Expression(p2, s2)) =>
-      val dt = DecimalType.bounded(max(s1, s2) + max(p1 - s1, p2 - s2) + 1, max(s1, s2))
-      CheckOverflow(Subtract(promotePrecision(e1, dt), promotePrecision(e2, dt)), dt)
+      val resultType = if (SQLConf.get.decimalOperationsAllowTruncat) {
+        val resultScale = max(s1, s2)
+        DecimalType.adjustPrecisionScale(max(p1 - s1, p2 - s2) + resultScale + 1,
+          resultScale)
+      } else {
+        DecimalType.bounded(max(s1, s2) + max(p1 - s1, p2 - s2) + 1, max(s1, s2))
+      }
+      CheckOverflow(Subtract(promotePrecision(e1, resultType), promotePrecision(e2, resultType)),
+        resultType)

     case Multiply(e1 @ DecimalType.Expression(p1, s1), e2 @ DecimalType.Expression(p2, s2)) =>
-      val resultType = DecimalType.bounded(p1 + p2 + 1, s1 + s2)
+      val resultType = if (SQLConf.get.decimalOperationsAllowTruncat) {
+        DecimalType.adjustPrecisionScale(p1 + p2 + 1, s1 + s2)
+      } else {
+        DecimalType.bounded(p1 + p2 + 1, s1 + s2)
+      }
       val widerType = widerDecimalType(p1, s1, p2, s2)
       CheckOverflow(Multiply(promotePrecision(e1, widerType), promotePrecision(e2, widerType)),
         resultType)
```
```diff
     case Divide(e1 @ DecimalType.Expression(p1, s1), e2 @ DecimalType.Expression(p2, s2)) =>
-      var intDig = min(DecimalType.MAX_SCALE, p1 - s1 + s2)
-      var decDig = min(DecimalType.MAX_SCALE, max(6, s1 + p2 + 1))
-      val diff = (intDig + decDig) - DecimalType.MAX_SCALE
-      if (diff > 0) {
-        decDig -= diff / 2 + 1
-        intDig = DecimalType.MAX_SCALE - decDig
+      val resultType = if (SQLConf.get.decimalOperationsAllowTruncat) {
+        // Precision: p1 - s1 + s2 + max(6, s1 + p2 + 1)
+        // Scale: max(6, s1 + p2 + 1)
+        val intDig = p1 - s1 + s2
+        val scale = max(DecimalType.MINIMUM_ADJUSTED_SCALE, s1 + p2 + 1)
+        val prec = intDig + scale
+        DecimalType.adjustPrecisionScale(prec, scale)
+      } else {
+        var intDig = min(DecimalType.MAX_SCALE, p1 - s1 + s2)
+        var decDig = min(DecimalType.MAX_SCALE, max(6, s1 + p2 + 1))
+        val diff = (intDig + decDig) - DecimalType.MAX_SCALE
+        if (diff > 0) {
+          decDig -= diff / 2 + 1
+          intDig = DecimalType.MAX_SCALE - decDig
+        }
+        DecimalType.bounded(intDig + decDig, decDig)
       }
-      val resultType = DecimalType.bounded(intDig + decDig, decDig)
       val widerType = widerDecimalType(p1, s1, p2, s2)
       CheckOverflow(Divide(promotePrecision(e1, widerType), promotePrecision(e2, widerType)),
         resultType)
```
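A worked comparison of the two division branches, again with my own numbers (not from the patch), using operands that already occupy the full precision:

```scala
import scala.math.{max, min}

// DECIMAL(38, 18) / DECIMAL(38, 18): p1 = p2 = 38, s1 = s2 = 18.
val (p1, s1, p2, s2) = (38, 18, 38, 18)
val MAX = 38

// Old branch: split the overflow between integral and fractional digits.
var intDig = min(MAX, p1 - s1 + s2)               // 38
var decDig = min(MAX, max(6, s1 + p2 + 1))        // 38
val diff = (intDig + decDig) - MAX                // 38
if (diff > 0) { decDig -= diff / 2 + 1; intDig = MAX - decDig }
val oldResult = (min(intDig + decDig, MAX), decDig)              // (38, 18): only 20 integral digits

// New branch: keep all p1 - s1 + s2 integral digits, shrink the scale down to (at least) 6.
val scale = max(6, s1 + p2 + 1)                   // 57
val prec  = (p1 - s1 + s2) + scale                // 95
val newScale = max(MAX - (prec - scale), min(scale, 6))          // 6
val newResult = (MAX, newScale)                                  // (38, 6): 32 integral digits
```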
```diff
     case Remainder(e1 @ DecimalType.Expression(p1, s1), e2 @ DecimalType.Expression(p2, s2)) =>
-      val resultType = DecimalType.bounded(min(p1 - s1, p2 - s2) + max(s1, s2), max(s1, s2))
+      val resultType = if (SQLConf.get.decimalOperationsAllowTruncat) {
+        DecimalType.adjustPrecisionScale(min(p1 - s1, p2 - s2) + max(s1, s2), max(s1, s2))
+      } else {
+        DecimalType.bounded(min(p1 - s1, p2 - s2) + max(s1, s2), max(s1, s2))
+      }
       // resultType may have lower precision, so we cast them into wider type first.
       val widerType = widerDecimalType(p1, s1, p2, s2)
       CheckOverflow(Remainder(promotePrecision(e1, widerType), promotePrecision(e2, widerType)),
         resultType)

     case Pmod(e1 @ DecimalType.Expression(p1, s1), e2 @ DecimalType.Expression(p2, s2)) =>
-      val resultType = DecimalType.bounded(min(p1 - s1, p2 - s2) + max(s1, s2), max(s1, s2))
+      val resultType = if (SQLConf.get.decimalOperationsAllowTruncat) {
+        DecimalType.adjustPrecisionScale(min(p1 - s1, p2 - s2) + max(s1, s2), max(s1, s2))
+      } else {
+        DecimalType.bounded(min(p1 - s1, p2 - s2) + max(s1, s2), max(s1, s2))
+      }
       // resultType may have lower precision, so we cast them into wider type first.
       val widerType = widerDecimalType(p1, s1, p2, s2)
       CheckOverflow(Pmod(promotePrecision(e1, widerType), promotePrecision(e2, widerType)),
```
```diff
@@ -137,9 +176,6 @@
         e2 @ DecimalType.Expression(p2, s2)) if p1 != p2 || s1 != s2 =>
       val resultType = widerDecimalType(p1, s1, p2, s2)
       b.makeCopy(Array(Cast(e1, resultType), Cast(e2, resultType)))
-
-    // TODO: MaxOf, MinOf, etc might want other rules
-    // SUM and AVERAGE are handled by the implementations of those expressions
   }

   /**
```
```diff
@@ -243,17 +279,43 @@
     // Promote integers inside a binary expression with fixed-precision decimals to decimals,
     // and fixed-precision decimals in an expression with floats / doubles to doubles
     case b @ BinaryOperator(left, right) if left.dataType != right.dataType =>
-      (left.dataType, right.dataType) match {
-        case (t: IntegralType, DecimalType.Fixed(p, s)) =>
-          b.makeCopy(Array(Cast(left, DecimalType.forType(t)), right))
-        case (DecimalType.Fixed(p, s), t: IntegralType) =>
-          b.makeCopy(Array(left, Cast(right, DecimalType.forType(t))))
-        case (t, DecimalType.Fixed(p, s)) if isFloat(t) =>
-          b.makeCopy(Array(left, Cast(right, DoubleType)))
-        case (DecimalType.Fixed(p, s), t) if isFloat(t) =>
-          b.makeCopy(Array(Cast(left, DoubleType), right))
-        case _ =>
-          b
-      }
+      nondecimalLiteralAndDecimal(b).lift((left, right)).getOrElse(
+        nondecimalNonliteralAndDecimal(b).applyOrElse((left.dataType, right.dataType),
+          (_: (DataType, DataType)) => b))
   }
```

Review comments on this hunk:
- nit: I feel it's more readable to just put the new cases for literal before these 4 cases.
- Unfortunately this is not really feasible, since we match on different things: here we match on …
- we can do …
```diff
+  /**
+   * Type coercion for BinaryOperator in which one side is a non-decimal literal numeric, and the
+   * other side is a decimal.
+   */
+  private def nondecimalLiteralAndDecimal(
+      b: BinaryOperator): PartialFunction[(Expression, Expression), Expression] = {
+    // Promote literal integers inside a binary expression with fixed-precision decimals to
+    // decimals. The precision and scale are the ones needed by the integer value.
+    case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType]
+      && l.dataType.isInstanceOf[IntegralType] =>
+      b.makeCopy(Array(Cast(l, DecimalType.forLiteral(l)), r))
+
+    case (l, r: Literal) if l.dataType.isInstanceOf[DecimalType]
+      && r.dataType.isInstanceOf[IntegralType] =>
+      b.makeCopy(Array(l, Cast(r, DecimalType.forLiteral(r))))
+  }
```

Review comments on this new rule:
- Is this rule newly introduced?
- Yes, it is. If we don't introduce this, we have a failure in the Hive compatibility tests, because Hive uses the exact precision and scale needed by the literals, while we, before this change, were using conservative values for each type. For instance, if we have a …
- What if we don't do this? Requiring more precision seems OK, as now we allow precision loss.
- If we don't do this we have many test failures in spark-hive, because Hive does so. Moreover, requiring more precision is not OK, since it leads to a useless loss of precision. Think of this example: you multiply a column which is DECIMAL(38, 18) by …
- makes sense
- Add this example as the code comment?
- Hive is also doing this?
- Yes, Hive is doing so. That is the reason why I introduced the change (without it, we would have had test failures in spark-hive). I will add this in the comment.
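The example in the thread above is cut off; the following sketch is my own reconstruction of the kind of case it describes, assuming the literal is the integer 2 and using a hypothetical helper that mirrors `adjustPrecisionScale` (illustration only, not Spark API):

```scala
// Mirrors the adjustment for precision > 38 (illustration only).
def adjusted(precision: Int, scale: Int): (Int, Int) =
  if (precision <= 38) (precision, scale)
  else (38, math.max(38 - (precision - scale), math.min(scale, 6)))

// DECIMAL(38, 18) * 2, with 2 treated as a plain INT -> DECIMAL(10, 0):
adjusted(38 + 10 + 1, 18 + 0)   // (38, 7)

// DECIMAL(38, 18) * 2, with 2 treated as a literal -> DECIMAL(1, 0) via forLiteral:
adjusted(38 + 1 + 1, 18 + 0)    // (38, 16): nine more fractional digits preserved
```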
```diff
+  /**
+   * Type coercion for BinaryOperator in which one side is a non-decimal non-literal numeric, and
+   * the other side is a decimal.
+   */
+  private def nondecimalNonliteralAndDecimal(
+      b: BinaryOperator): PartialFunction[(DataType, DataType), Expression] = {
+    // Promote integers inside a binary expression with fixed-precision decimals to decimals,
+    // and fixed-precision decimals in an expression with floats / doubles to doubles
+    case (t: IntegralType, DecimalType.Fixed(p, s)) =>
+      b.makeCopy(Array(Cast(b.left, DecimalType.forType(t)), b.right))
+    case (DecimalType.Fixed(_, _), t: IntegralType) =>
+      b.makeCopy(Array(b.left, Cast(b.right, DecimalType.forType(t))))
+    case (t, DecimalType.Fixed(_, _)) if isFloat(t) =>
+      b.makeCopy(Array(b.left, Cast(b.right, DoubleType)))
+    case (DecimalType.Fixed(_, _), t) if isFloat(t) =>
+      b.makeCopy(Array(Cast(b.left, DoubleType), b.right))
+  }

 }
```
File: `SQLConf.scala`
```diff
@@ -1048,6 +1048,16 @@
     .booleanConf
     .createWithDefault(true)

+  val DECIMAL_OPERATIONS_ALLOW_TRUNCAT =
+    buildConf("spark.sql.decimalOperations.allowTruncat")
+      .internal()
+      .doc("When true, establishing the result type of an arithmetic operation happens " +
+        "according to Hive behavior and SQL ANSI 2011 specification, ie. rounding the decimal " +
+        "part of the result if an exact representation is not possible. Otherwise, NULL is " +
+        "returned in those cases, as previously (default).")
+      .booleanConf
+      .createWithDefault(false)
+
   val SQL_STRING_REDACTION_PATTERN =
     ConfigBuilder("spark.sql.redaction.string.regex")
       .doc("Regex to decide which parts of strings produced by Spark contain sensitive " +
```

Review comments on the new config:
- typo?
- Thanks. This is what @cloud-fan suggested. Actually we are not truncating, but rounding. Personally, I'd prefer …
- Sorry, that was my typo...
- We should make it true by default, as it's a more reasonable behavior and follows the Hive/SQL standard.
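For reference, the new entry is an ordinary SQL conf, so it can be toggled per session. A minimal sketch, assuming a running `SparkSession` named `spark` and the conf name used in this revision:

```scala
// Scala side (spark-shell / application code):
spark.conf.set("spark.sql.decimalOperations.allowTruncat", "true")

// or equivalently in SQL, as the new test file later in this diff does:
spark.sql("set spark.sql.decimalOperations.allowTruncat=true")
```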
```diff
@@ -1423,6 +1433,8 @@ class SQLConf extends Serializable with Logging {

   def replaceExceptWithFilter: Boolean = getConf(REPLACE_EXCEPT_WITH_FILTER)

+  def decimalOperationsAllowTruncat: Boolean = getConf(DECIMAL_OPERATIONS_ALLOW_TRUNCAT)
+
   def continuousStreamingExecutorQueueSize: Int = getConf(CONTINUOUS_STREAMING_EXECUTOR_QUEUE_SIZE)

   def continuousStreamingExecutorPollIntervalMs: Long =
```
File: `DecimalType.scala`
```diff
@@ -23,7 +23,7 @@ import scala.reflect.runtime.universe.typeTag

 import org.apache.spark.annotation.InterfaceStability
 import org.apache.spark.sql.AnalysisException
-import org.apache.spark.sql.catalyst.expressions.Expression
+import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}

 /**
```
```diff
@@ -117,6 +117,7 @@
   val MAX_SCALE = 38
   val SYSTEM_DEFAULT: DecimalType = DecimalType(MAX_PRECISION, 18)
   val USER_DEFAULT: DecimalType = DecimalType(10, 0)
+  val MINIMUM_ADJUSTED_SCALE = 6

   // The decimal types compatible with other numeric types
   private[sql] val ByteDecimal = DecimalType(3, 0)
```

Review comments on `MINIMUM_ADJUSTED_SCALE` and the config name:
- Before naming a conf, I need to understand the rule you are following. https://docs.microsoft.com/en-us/sql/t-sql/data-types/precision-scale-and-length-transact-sql — SQL Server only applies …
- Yes, I followed Hive's implementation, which works like this and applies this 6-digit minimum to all operations. This means that SQL Server allows rounding away more digits than we do in those cases, i.e. we ensure at least 6 digits for the scale, while SQL Server doesn't.
- @gatorsmile what about …
- how about …
- We should make it an internal conf and remove it after some releases.
- ok, I'll go with that, thanks @cloud-fan.
```diff
@@ -136,10 +137,52 @@
     case DoubleType => DoubleDecimal
   }

+  private[sql] def forLiteral(literal: Literal): DecimalType = literal.value match {
+    case v: Short => fromBigDecimal(BigDecimal(v))
+    case v: Int => fromBigDecimal(BigDecimal(v))
+    case v: Long => fromBigDecimal(BigDecimal(v))
+    case _ => forType(literal.dataType)
+  }
+
+  private[sql] def fromBigDecimal(d: BigDecimal): DecimalType = {
+    DecimalType(Math.max(d.precision, d.scale), d.scale)
+  }
+
   private[sql] def bounded(precision: Int, scale: Int): DecimalType = {
     DecimalType(min(precision, MAX_PRECISION), min(scale, MAX_SCALE))
   }
```

Review comments on `forLiteral`:
- Is this different than …
- Yes, please see my comment above for an example. Thanks.
- since we have …
- but we also have …
- Can't we just use …
- No, please see my comments above.
- This list is incomplete. Is it possible that the input literal is …
- This problem was present before this PR. Should we fix it here? Is this fix needed? I guess that if it had been a problem, it would already have been reported.
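A small illustration of why `forLiteral` differs from `forType` for integral literals; `neededType` below is a hypothetical helper of mine that mirrors `fromBigDecimal`, not Spark API:

```scala
// Precision/scale actually needed by a value, as fromBigDecimal computes it.
def neededType(v: BigDecimal): (Int, Int) = (math.max(v.precision, v.scale), v.scale)

neededType(BigDecimal(2))       // (1, 0)  -> forLiteral would give DECIMAL(1, 0)
neededType(BigDecimal(12345))   // (5, 0)  -> DECIMAL(5, 0)
// By contrast, forType(IntegerType) always gives DECIMAL(10, 0) and
// forType(LongType) always gives DECIMAL(20, 0), as the class comment above states.
```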
```diff
+  /**
+   * Scale adjustment implementation is based on Hive's one, which is itself inspired to
+   * SQLServer's one. In particular, when a result precision is greater than
+   * {@link #MAX_PRECISION}, the corresponding scale is reduced to prevent the integral part of a
+   * result from being truncated.
+   *
+   * This method is used only when `spark.sql.decimalOperations.allowTruncat` is set to true.
+   *
+   * @param precision
+   * @param scale
+   * @return
+   */
```

Review comment on the empty `@param` / `@return` tags:
- remove the above 3 lines
```diff
+  private[sql] def adjustPrecisionScale(precision: Int, scale: Int): DecimalType = {
+    // Assumptions:
+    // precision >= scale
+    // scale >= 0
+    if (precision <= MAX_PRECISION) {
+      // Adjustment only needed when we exceed max precision
+      DecimalType(precision, scale)
+    } else {
+      // Precision/scale exceed maximum precision. Result must be adjusted to MAX_PRECISION.
+      val intDigits = precision - scale
+      // If original scale less than MINIMUM_ADJUSTED_SCALE, use original scale value; otherwise
+      // preserve at least MINIMUM_ADJUSTED_SCALE fractional digits
+      val minScaleValue = Math.min(scale, MINIMUM_ADJUSTED_SCALE)
+      val adjustedScale = Math.max(MAX_PRECISION - intDigits, minScaleValue)
+
+      DecimalType(MAX_PRECISION, adjustedScale)
+    }
+  }
```

Review comments on `adjustPrecisionScale`:
- The logic in this adjustment function is also different from the MS SQL Server docs.
- Sorry, but I think this is exactly the same as what is described there. The implementation might seem to be doing different things, but the result will actually be the same. They both take the min between 6 and the desired scale if the precision is not enough to represent the whole scale.
- So the rule in the document is … I think this is a little different from the current rule … Think about the case …
- @cloud-fan yes, but you have to keep in mind that we are doing so only when precision is > 38. With some simple math (given …
- ah i see, makes sense
- Yeah, this part is consistent.
- (on the `// Assumptions` comment) Use … / I can add it even though it is not needed... there is no way we can violate those constraints. If you believe it is better to use assert, I will do that. / use …
- (on `DecimalType(precision, scale)`) Shouldn't we also prevent … / This is prevented outside this function.
- (on `minScaleValue`) Sounds like … / It is the … / Don't you get a scale lower than … / Yes, sorry, my answer was very poor, I will rephrase.
- (on `adjustedScale`) Sounds like … / It is … / This line needs some comments.
```diff
   override private[sql] def defaultConcreteType: DataType = SYSTEM_DEFAULT

   override private[sql] def acceptsType(other: DataType): Boolean = {
```
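To tie the thread above together, here is a self-contained sketch (my own code, mirroring the function in the diff rather than calling Spark's private API) that exercises the adjustment on a few representative inputs:

```scala
object AdjustPrecisionScaleSketch {
  val MaxPrecision = 38
  val MinAdjustedScale = 6

  // Mirror of the adjustment shown in the diff (illustration only).
  def adjust(precision: Int, scale: Int): (Int, Int) = {
    require(precision >= scale && scale >= 0)
    if (precision <= MaxPrecision) {
      (precision, scale)
    } else {
      val intDigits = precision - scale
      val minScaleValue = math.min(scale, MinAdjustedScale)
      val adjustedScale = math.max(MaxPrecision - intDigits, minScaleValue)
      (MaxPrecision, adjustedScale)
    }
  }

  def main(args: Array[String]): Unit = {
    println(adjust(77, 36))  // (38, 6)  e.g. DECIMAL(38,18) * DECIMAL(38,18)
    println(adjust(39, 10))  // (38, 9)  e.g. DECIMAL(38,10) + DECIMAL(38,10): all 29 integral digits kept
    println(adjust(40, 4))   // (38, 4)  a scale already below 6 is kept, not padded
    println(adjust(60, 30))  // (38, 8)  the 30 integral digits are fully preserved
  }
}
```

These cases show the behavior argued above: the scale never goes below min(original scale, 6), and subject to that floor the integral digits are preserved as far as they fit in 38.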
File: new SQL test input for decimal operations (added by this PR)
```diff
@@ -0,0 +1,39 @@
+--
+-- Licensed to the Apache Software Foundation (ASF) under one or more
+-- contributor license agreements. See the NOTICE file distributed with
+-- this work for additional information regarding copyright ownership.
+-- The ASF licenses this file to You under the Apache License, Version 2.0
+-- (the "License"); you may not use this file except in compliance with
+-- the License. You may obtain a copy of the License at
+--
+--    http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+--
```
```diff
+-- tests for decimals handling in operations
+create table decimals_test(id int, a decimal(38,18), b decimal(38,18)) using parquet;
+
+insert into decimals_test values(1, 100.0, 999.0), (2, 12345.123, 12345.123),
+  (3, 0.1234567891011, 1234.1), (4, 123456789123456789.0, 1.123456789123456789);
+
+-- test decimal operations
+select id, a+b, a-b, a*b, a/b from decimals_test order by id;
+
+-- test operations between decimals and constants
+select id, a*10, b/10 from decimals_test order by id;
+
+-- use rounding instead of returning NULL, according to new Hive's behavior and SQL standard
+set spark.sql.decimalOperations.allowTruncat=true;
+
+-- test decimal operations
+select id, a+b, a-b, a*b, a/b from decimals_test order by id;
+
+-- test operations between decimals and constants
+select id, a*10, b/10 from decimals_test order by id;
+
+drop table decimals_test;
```

Review comments on the new test file:
- Why create a new test file instead of adding more cases in …
- because that file was meant for the …
- that file is under …
- I see. I'll merge this file into the other then. Thanks.
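A quick way to observe the chosen result types for the queries above (not part of the patch; assumes a SparkSession named `spark` and that the table has been created):

```scala
// After running the CREATE TABLE / INSERT above, the chosen result types show up in the schema:
spark.sql("set spark.sql.decimalOperations.allowTruncat=true")
spark.sql("select id, a + b, a - b, a * b, a / b from decimals_test").printSchema()
// With the flag on, a * b and a / b should come out as decimal(38,6);
// with the flag off, a * b is decimal(38,36), which overflows to NULL for most rows.
```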
Further review discussion:
- It's better to push some reference about where we get this rule.
- If the scale needs to be less than 6, we would fail the analysis, right?
- No, we would return a type with scale 6, and in `CheckOverflow` all the numbers which don't fit will be translated to `NULL`, since this is current Spark behavior in such cases. May I kindly ask you to elaborate a bit more on what you mean by "push some reference"? Thanks.
- Sorry, typo, "put some reference"... We can put a link to a document of a mainstream RDBMS and say this rule follows xxx...
- I did, but @gatorsmile told me to remove it: #20023 (comment).
- Did any open source RDBMS have this rule?
- Actually we already referred to a commercial RDBMS in L33...
- Hive has it, but it is not explained in the documentation. And in the comments it just references the blog I referenced, which I then removed according to @gatorsmile's comment.