-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Spark decimal add and subtract #5791
Implement Spark decimal add and subtract #5791
Conversation
✅ Deploy Preview for meta-velox canceled.
|
b0b07e7
to
90ed921
Compare
e5a20aa
to
447e73b
Compare
d55591c
to
816bb42
Compare
Can you help review this PR? Thanks! @majetideepak |
@mbasmanova this PR added the decimal add/substract to align with Spark's implementation. Can you help to review? It's requested to pass TPCH/DS in Gluten. |
@rui-mo Rui, would you help review this PR? |
@jinchengchengh Jin, what is the difference between Spark and Presto for these functions? |
Chengcheng is in leaves and may not reply in time. Rui covers all Chengcheng's PRs |
Spark result precision and scale maybe different with Presto, because it will use
Actually, presto cannot compute this while Spark can.
|
@jinchengchenghh Got it. Thank you for explaining. It would be nice to add these details to the PR description. |
@@ -109,6 +109,21 @@ class DecimalUtil { | |||
return value; | |||
} | |||
|
|||
template <typename A, typename B> | |||
inline static int32_t | |||
minLeadingZeros(const A& a, const B& b, uint8_t aScale, uint8_t bScale) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a and b are integers, right? Pass then by value, not const reference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is hard to read. Can you write up some description of what it does and why it does it this way?
template <typename A, typename B> | ||
inline static int32_t | ||
minLeadingZeros(const A& a, const B& b, uint8_t aScale, uint8_t bScale) { | ||
int32_t aLeadingZeros = bits::countLeadingZeros(absValue<A>(a)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: absValue(a) and absValue(b)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It generates below error if removing the template type.
couldn’t deduce template parameter ‘T’
118 | int32_t bLeadingZeros = bits::countLeadingZeros(absValue(b))
@@ -109,6 +109,21 @@ class DecimalUtil { | |||
return value; | |||
} | |||
|
|||
template <typename A, typename B> | |||
inline static int32_t | |||
minLeadingZeros(const A& a, const B& b, uint8_t aScale, uint8_t bScale) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you document this function? Let's also add a test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
inline static int32_t minLeadingZerosAfterScaling( | ||
int32_t numLeadingZeros, | ||
int32_t scaleBy) { | ||
int32_t result = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drop 'result' variable; just return ...
@@ -211,5 +226,16 @@ class DecimalUtil { | |||
int32_t numOccupied = sizeof(A) * 8 - bits::countLeadingZeros(valueAbs); | |||
return numOccupied + kMaxBitsRequiredIncreaseAfterScaling[aRescale]; | |||
} | |||
|
|||
/// If we have a number with 'numLeadingZeros' leading zeros, and we scale it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand what this means. Other readers may be confused as well. Would you clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revised the comments.
This PR supports decimal add and subtract. The implementation derives from Arrow gandiva (link). Fast path of no overflow and general case are both considered. If the result precision is less than max precision of decimal, or both inputs contain at least 3 leading zeros after rescaling, overflow is not needed to be considered. Otherwise, two functions are added to handle inputs of the same sign and different signs separately, and overflow is considered. |
VELOX_DCHECK_NE(a, 0); | ||
VELOX_DCHECK_NE(b, 0); | ||
VELOX_DCHECK((a < 0 && b > 0) || (a > 0 && b < 0)); | ||
VELOX_USER_CHECK( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is guaranteed by this else
branch, so it is VELOX_DCHECK_XX
.
0578c4c#diff-4baee9f82f9c347f753972415d345941d2c2a110b608d480b090650171da096bL156
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
@mbasmanova Masha, I revised this PR. Could you spare some time to review again? Thanks. |
velox/docs/functions/spark/math.rst
Outdated
|
||
:: | ||
|
||
SELECT CAST(1.1232100 as DECIMAL(38, 7)) + CAST(1 as DECIMAL(10, 0)); -- decimal 2.123210 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice examples. It would be helpful to specify the precision and scale of the results as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Updated.
@mbasmanova Above comment was fixed. Could you help take further review? Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Yuhta Jimmy, would you help review this PR?
@@ -109,6 +109,33 @@ class DecimalUtil { | |||
return value; | |||
} | |||
|
|||
/// Returns the minumum number of leading zeros after scaling up two inputs | |||
/// for certain scales. Inputs are decimal values of bigint or hugeint type. | |||
template <typename TInput1, typename TInput2> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: TInput1,2 -> A and B
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
@@ -33,6 +33,175 @@ std::string getResultScale(std::string precision, std::string scale) { | |||
scale); | |||
} | |||
|
|||
// Returns the whole and fraction parts of a decimal value. | |||
template <typename T> | |||
inline std::pair<T, T> getWholeAndFraction(const T value, uint8_t scale) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drop 'const'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed similar cases. Thanks.
|
||
// Increases the scale of input value by 'delta'. Returns the input value if | ||
// delta is not positive. | ||
inline int128_t increaseScale(const int128_t in, int16_t delta) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rui-mo Rui, do we have Fuzzer coverage for all these functions? If not, let's prioritize extending the Fuzzer. Otherwise, it will be very hard to ensure there are no bugs.
@mbasmanova Actually we don't. I find at least below two limitations exist, so its fuzzer test will be skipped.
velox/velox/expression/tests/ExpressionFuzzer.cpp Lines 197 to 206 in 2e71d8e
velox/velox/expression/tests/ExpressionFuzzer.cpp Lines 315 to 318 in 2e71d8e
On track with issue #1968. |
b409bea
to
a15b51a
Compare
@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Use Arrow Gandiva BasicDecimal128 algorithm to compute value.
Arrow implementation:
https://github.com/apache/arrow/blob/release-12.0.1-rc1/cpp/src/gandiva/precompiled/decimal_ops.cc#L211-L231
Spark result precision and scale maybe different with Presto, because it will use adjustPrecisionScale to change the precision and scale when precision is beyond 38.
And this implement can compute data without overflow in some situation.