Implement Spark decimal add and subtract #5791

Closed

Conversation

jinchengchenghh
Contributor

@jinchengchenghh jinchengchenghh commented Jul 24, 2023

Use the Arrow Gandiva BasicDecimal128 algorithm to compute the result.
Arrow implementation:
https://github.com/apache/arrow/blob/release-12.0.1-rc1/cpp/src/gandiva/precompiled/decimal_ops.cc#L211-L231

Spark's result precision and scale may differ from Presto's, because Spark uses adjustPrecisionScale to adjust the precision and scale when the precision exceeds 38.
This implementation can also compute results without overflow in some situations.

@facebook-github-bot facebook-github-bot added the CLA Signed label Jul 24, 2023
@netlify

netlify bot commented Jul 24, 2023

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit a15b51a
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/657c00e3aed2a8000850aeac

@jinchengchenghh jinchengchenghh force-pushed the decimal_add branch 2 times, most recently from b0b07e7 to 90ed921 Compare August 3, 2023 07:15
@jinchengchenghh jinchengchenghh force-pushed the decimal_add branch 2 times, most recently from e5a20aa to 447e73b Compare August 28, 2023 01:14
@rui-mo rui-mo force-pushed the decimal_add branch 2 times, most recently from d55591c to 816bb42 Compare October 31, 2023 05:31
@jinchengchenghh
Contributor Author

Can you help review this PR? Thanks! @majetideepak

@FelixYBW

FelixYBW commented Nov 6, 2023

@mbasmanova this PR adds decimal add/subtract to align with Spark's implementation. Can you help review it? It is required to pass TPC-H/TPC-DS in Gluten.

@mbasmanova
Contributor

@rui-mo Rui, would you help review this PR?

@mbasmanova
Contributor

@jinchengchenghh Jin, what is the difference between Spark and Presto for these functions?

@FelixYBW

FelixYBW commented Nov 7, 2023

@rui-mo Rui, would you help review this PR?

Chengcheng is on leave and may not reply in time. Rui covers all of Chengcheng's PRs.

@jinchengchenghh
Contributor Author

Spark's result precision and scale may differ from Presto's, because Spark uses adjustPrecisionScale to adjust the precision and scale when the precision exceeds 38.
This implementation can also compute results without overflow in some situations where the Presto implementation overflows.
For example, this test fails in Presto (Velox) and succeeds in Spark.

TEST_F(DecimalArithmeticTest, tmp) {
  // Carry to left 0.
  testDecimalExpr<TypeKind::HUGEINT>(
      makeLongDecimalVector({"99999999999999999999999999999990000010"}, 38, 6),
      "plus(c0, c1)",
      {makeLongDecimalVector({"9999999999999999999999999999999000000"}, 38, 5),
       makeFlatVector(std::vector<int128_t>{100}, DECIMAL(38, 7))});
}

[ RUN      ] DecimalArithmeticTest.tmp
unknown file: Failure
C++ exception with description "Exception: VeloxUserError
Error Source: USER
Error Code: ARITHMETIC_ERROR
Reason: Decimal overflow: 9999999999999999999999999999999000000 + 100
Retriable: False
Context: plus(c0, c1)
Top-Level Context: Same as context.
Function: apply
File: ../velox/functions/prestosql/DecimalVectorFunctions.cpp
Line: 289
Stack trace:
Stack trace has been disabled. Use --velox_exception_user_stacktrace_enabled=true to enable it.
" thrown in the test body.
[  FAILED  ] DecimalArithmeticTest.tmp (2 ms)

In fact, Presto cannot compute this, while Spark can.

presto:default> select cast('99999999999999999999999999999990' as decimal(38, 5)) + cast('0.0000100' as decimal(38, 7));
Query 20231108_152021_00006_sqyi8 failed: Decimal overflow

spark-sql (default)> select cast('99999999999999999999999999999990' as decimal(38, 5)) + cast('0.0000100' as decimal(38, 7));
99999999999999999999999999999990.000010
Time taken: 0.127 seconds, Fetched 1 row(s)
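
The difference in the two queries above can be traced to how each engine derives the result type. The following is a minimal sketch, not the code from this PR: it assumes Spark's default behavior (precision loss allowed, minimum adjusted scale of 6) and Presto's rule of capping precision at 38 while keeping the larger input scale. The names `sparkAddResultType`, `prestoAddResultType`, and `DecimalType` are hypothetical and used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

constexpr int32_t kMaxPrecision = 38;
// Assumption: Spark's minimum scale kept when precision must be reduced.
constexpr int32_t kMinAdjustedScale = 6;

// Hypothetical (precision, scale) pair used only in this sketch.
using DecimalType = std::pair<int32_t, int32_t>;

// Mirrors the idea of Spark's adjustPrecisionScale: when the precision exceeds
// 38, keep the integral digits and give up fractional digits, but never go
// below min(scale, 6).
inline DecimalType adjustPrecisionScale(int32_t precision, int32_t scale) {
  if (precision <= kMaxPrecision) {
    return {precision, scale};
  }
  const int32_t intDigits = precision - scale;
  const int32_t minScale = std::min(scale, kMinAdjustedScale);
  const int32_t adjustedScale = std::max(kMaxPrecision - intDigits, minScale);
  return {kMaxPrecision, adjustedScale};
}

// Result type of a + b in Spark (sketch).
inline DecimalType
sparkAddResultType(int32_t p1, int32_t s1, int32_t p2, int32_t s2) {
  const int32_t scale = std::max(s1, s2);
  const int32_t precision = std::max(p1 - s1, p2 - s2) + scale + 1;
  return adjustPrecisionScale(precision, scale);
}

// Result type of a + b in Presto (sketch): precision is capped, scale is kept.
inline DecimalType
prestoAddResultType(int32_t p1, int32_t s1, int32_t p2, int32_t s2) {
  const int32_t scale = std::max(s1, s2);
  const int32_t precision =
      std::min(kMaxPrecision, std::max(p1 - s1, p2 - s2) + scale + 1);
  return {precision, scale};
}
```

For DECIMAL(38, 5) + DECIMAL(38, 7), this sketch yields DECIMAL(38, 6) for Spark and DECIMAL(38, 7) for Presto. The sum has 32 integral digits, so it needs 39 digits at scale 7 but only 38 at scale 6, which is consistent with the overflow in Presto and the six fractional digits in the Spark output.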

@mbasmanova

@mbasmanova
Contributor

@jinchengchenghh Got it. Thank you for explaining. It would be nice to add these details to the PR description.

@@ -109,6 +109,21 @@ class DecimalUtil {
return value;
}

template <typename A, typename B>
inline static int32_t
minLeadingZeros(const A& a, const B& b, uint8_t aScale, uint8_t bScale) {
Contributor

a and b are integers, right? Pass them by value, not by const reference.

Contributor

@mbasmanova mbasmanova left a comment

This code is hard to read. Can you write up some description of what it does and why it does it this way?

template <typename A, typename B>
inline static int32_t
minLeadingZeros(const A& a, const B& b, uint8_t aScale, uint8_t bScale) {
int32_t aLeadingZeros = bits::countLeadingZeros(absValue<A>(a));
Contributor

nit: absValue(a) and absValue(b)

Collaborator

Removing the template type produces the error below.

couldn’t deduce template parameter ‘T’
118 | int32_t bLeadingZeros = bits::countLeadingZeros(absValue(b))

@@ -109,6 +109,21 @@ class DecimalUtil {
return value;
}

template <typename A, typename B>
inline static int32_t
minLeadingZeros(const A& a, const B& b, uint8_t aScale, uint8_t bScale) {
Contributor

Would you document this function? Let's also add a test.

Collaborator

Updated.

inline static int32_t minLeadingZerosAfterScaling(
int32_t numLeadingZeros,
int32_t scaleBy) {
int32_t result =
Contributor

drop 'result' variable; just return ...

@@ -211,5 +226,16 @@ class DecimalUtil {
int32_t numOccupied = sizeof(A) * 8 - bits::countLeadingZeros(valueAbs);
return numOccupied + kMaxBitsRequiredIncreaseAfterScaling[aRescale];
}

/// If we have a number with 'numLeadingZeros' leading zeros, and we scale it
Contributor

I'm not sure I understand what this means. Other readers may be confused as well. Would you clarify?

Collaborator

Revised the comments.

@rui-mo
Collaborator

rui-mo commented Nov 15, 2023

This code is hard to read. Can you write up some description of what it does and why it does it this way?

This PR adds Spark decimal add and subtract. The implementation derives from Arrow Gandiva (link) and covers both a fast path with no overflow risk and the general case. If the result precision is less than the maximum decimal precision (38), or both inputs have at least 3 leading zeros after rescaling, overflow does not need to be considered. Otherwise, two functions handle inputs of the same sign and inputs of different signs separately, and both check for overflow.
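
The leading-zero check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: it uses the GCC/Clang `__int128` extension, a hand-written leading-zero count, and a deliberately conservative bound of 4 bits per decimal digit (since 10 < 2^4) in place of the lookup table `kMaxBitsRequiredIncreaseAfterScaling` that appears in the diff. `canUseFastPath` and `kBitsPerDigitBound` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using int128_t = __int128; // GCC/Clang extension.
using uint128_t = unsigned __int128;

// Assumption: conservative upper bound on bits added per factor of 10.
constexpr int32_t kBitsPerDigitBound = 4;
constexpr int32_t kMinLeadingZerosForFastPath = 3;

// Counts leading zero bits in the 128-bit magnitude of 'value'.
inline int32_t countLeadingZeros128(int128_t value) {
  const uint128_t magnitude = value < 0 ? -static_cast<uint128_t>(value)
                                        : static_cast<uint128_t>(value);
  int32_t zeros = 0;
  for (int bit = 127; bit >= 0 && ((magnitude >> bit) & 1) == 0; --bit) {
    ++zeros;
  }
  return zeros;
}

// Lower bound on the leading zeros of both inputs once the input with the
// smaller scale has been scaled up to the larger scale. May be negative,
// which means the scaled value might not fit in 128 bits.
inline int32_t
minLeadingZeros(int128_t a, int128_t b, uint8_t aScale, uint8_t bScale) {
  int32_t aZeros = countLeadingZeros128(a);
  int32_t bZeros = countLeadingZeros128(b);
  if (aScale < bScale) {
    aZeros -= kBitsPerDigitBound * (bScale - aScale);
  } else if (aScale > bScale) {
    bZeros -= kBitsPerDigitBound * (aScale - bScale);
  }
  return std::min(aZeros, bZeros);
}

// With at least 3 leading zeros each, both magnitudes are below 2^125, so
// their sum stays below 2^126 and cannot wrap a signed 128-bit integer.
inline bool
canUseFastPath(int128_t a, int128_t b, uint8_t aScale, uint8_t bScale) {
  return minLeadingZeros(a, b, aScale, bScale) >= kMinLeadingZerosForFastPath;
}
```

Because the 4-bits-per-digit bound overestimates the growth, this sketch may send some inputs to the slower general path that a tighter table would allow on the fast path. It never does the reverse.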

VELOX_DCHECK_NE(a, 0);
VELOX_DCHECK_NE(b, 0);
VELOX_DCHECK((a < 0 && b > 0) || (a > 0 && b < 0));
VELOX_USER_CHECK(
Contributor Author

This condition is guaranteed by the enclosing else branch, so it should be a VELOX_DCHECK_XX.

0578c4c#diff-4baee9f82f9c347f753972415d345941d2c2a110b608d480b090650171da096bL156

Collaborator

Fixed.

@rui-mo
Collaborator

rui-mo commented Nov 30, 2023

@mbasmanova Masha, I revised this PR. Could you spare some time to review again? Thanks.


::

SELECT CAST(1.1232100 as DECIMAL(38, 7)) + CAST(1 as DECIMAL(10, 0)); -- decimal 2.123210
Contributor

Nice examples. It would be helpful to specify the precision and scale of the results as well.

Collaborator

Thanks. Updated.

@rui-mo
Collaborator

rui-mo commented Dec 4, 2023

@mbasmanova The comment above has been addressed. Could you take another look? Thanks.

@mbasmanova mbasmanova requested a review from Yuhta December 4, 2023 13:35
Contributor

@mbasmanova mbasmanova left a comment

@Yuhta Jimmy, would you help review this PR?

@@ -109,6 +109,33 @@ class DecimalUtil {
return value;
}

/// Returns the minimum number of leading zeros after scaling up two inputs
/// for certain scales. Inputs are decimal values of bigint or hugeint type.
template <typename TInput1, typename TInput2>
Contributor

nit: TInput1,2 -> A and B

Collaborator

Updated.

@@ -33,6 +33,175 @@ std::string getResultScale(std::string precision, std::string scale) {
scale);
}

// Returns the whole and fraction parts of a decimal value.
template <typename T>
inline std::pair<T, T> getWholeAndFraction(const T value, uint8_t scale) {
Contributor

drop 'const'

Collaborator

Fixed similar cases. Thanks.


// Increases the scale of input value by 'delta'. Returns the input value if
// delta is not positive.
inline int128_t increaseScale(const int128_t in, int16_t delta) {
Contributor

ditto
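
For readers following the two helpers in this hunk, here is a rough sketch of how splitting into whole and fraction parts allows an addition to succeed when adding at the larger input scale would not fit. Only the names `getWholeAndFraction` and `increaseScale` come from the diff. Their bodies here, plus `powerOfTen` and `addNonNegative`, are assumptions made for illustration. The sketch handles non-negative inputs only, rounds half up when reducing the scale, assumes the result scale does not exceed the larger input scale, and omits the overflow checks that a real implementation needs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

using int128_t = __int128; // GCC/Clang extension.

// Hypothetical helper: 10^n for small non-negative n.
inline int128_t powerOfTen(int n) {
  int128_t result = 1;
  while (n-- > 0) {
    result *= 10;
  }
  return result;
}

// Returns the whole and fraction parts of a decimal value.
template <typename T>
inline std::pair<T, T> getWholeAndFraction(T value, uint8_t scale) {
  const T divisor = static_cast<T>(powerOfTen(scale));
  return {value / divisor, value % divisor};
}

// Increases the scale of the input by 'delta'. Returns the input unchanged if
// delta is not positive.
inline int128_t increaseScale(int128_t in, int16_t delta) {
  return delta <= 0 ? in : in * powerOfTen(delta);
}

// Hypothetical: adds two non-negative unscaled decimals and returns the
// unscaled sum at 'rScale' (rScale <= max(aScale, bScale)). No overflow check.
inline int128_t addNonNegative(
    int128_t a, uint8_t aScale, int128_t b, uint8_t bScale, uint8_t rScale) {
  const uint8_t higherScale = std::max(aScale, bScale);
  const std::pair<int128_t, int128_t> aParts = getWholeAndFraction(a, aScale);
  const std::pair<int128_t, int128_t> bParts = getWholeAndFraction(b, bScale);

  // Add whole parts directly; add fractions at the common (higher) scale.
  int128_t whole = aParts.first + bParts.first;
  int128_t fraction = increaseScale(aParts.second, higherScale - aScale) +
      increaseScale(bParts.second, higherScale - bScale);

  // Carry from the fraction into the whole part.
  const int128_t one = powerOfTen(higherScale);
  if (fraction >= one) {
    fraction -= one;
    ++whole;
  }

  // Reduce the fraction to the result scale, rounding half up.
  const int128_t divisor = powerOfTen(higherScale - rScale);
  int128_t reduced = fraction / divisor;
  if (divisor > 1 && (fraction % divisor) * 2 >= divisor) {
    ++reduced;
  }
  return whole * powerOfTen(rScale) + reduced;
}
```

With the inputs from the failing test earlier in this thread (a DECIMAL(38, 5) value with whole part 99999999999999999999999999999990 plus 0.0000100 as DECIMAL(38, 7), result scale 6), the whole and fraction parts are combined at scale 6, so the 39-digit intermediate at scale 7 is never formed.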

Contributor

@mbasmanova mbasmanova left a comment

@rui-mo Rui, do we have Fuzzer coverage for all these functions? If not, let's prioritize extending the Fuzzer. Otherwise, it will be very hard to ensure there are no bugs.

@rui-mo
Collaborator

rui-mo commented Dec 6, 2023

do we have Fuzzer coverage for all these functions?

@mbasmanova Actually, we don't. At least the two limitations below exist, so fuzzer tests for these functions are skipped.

  • Signatures that use decimal types are treated as unsupported.

bool isSupportedSignature(
const exec::FunctionSignature& signature,
bool enableComplexType) {
// Not supporting lambda functions, or functions using decimal and
// timestamp with time zone types.
return !(
useTypeName(signature, "opaque") ||
useTypeName(signature, "long_decimal") ||
useTypeName(signature, "short_decimal") ||
useTypeName(signature, "decimal") ||

  • Signatures with variables are also skipped.

if (!(signature->variables().empty() || options_.enableComplexTypes)) {
LOG(WARNING) << "Skipping unsupported signature: " << function.first
<< signature->toString();
continue;

This is tracked in issue #1968.

@rui-mo rui-mo force-pushed the decimal_add branch 2 times, most recently from b409bea to a15b51a Compare December 15, 2023 07:31
@facebook-github-bot
Contributor

@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@Yuhta merged this pull request in d3f0dc9.
