Implement Spark decimal add and subtract #5791

jinchengchenghh · 2023-07-24T06:47:08Z

Use the Arrow Gandiva BasicDecimal128 algorithm to compute the value.
Arrow implementation:
https://github.com/apache/arrow/blob/release-12.0.1-rc1/cpp/src/gandiva/precompiled/decimal_ops.cc#L211-L231

Spark's result precision and scale may differ from Presto's, because Spark uses adjustPrecisionScale to adjust the precision and scale when the precision exceeds 38.
As a result, this implementation can compute some results without overflow.
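A minimal C++ sketch of how the Spark result type for decimal add/subtract can be derived. It assumes Spark's default `spark.sql.decimalOperations.allowPrecisionLoss=true` and non-negative scales; `addResultType`, `kMaxPrecision` and `kMinimumAdjustedScale` are illustrative names, not the Velox API, with the constants 38 and 6 mirroring Spark's `DecimalType.MAX_PRECISION` and `MINIMUM_ADJUSTED_SCALE`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

// Illustrative constants mirroring Spark's DecimalType.MAX_PRECISION and
// DecimalType.MINIMUM_ADJUSTED_SCALE.
constexpr int32_t kMaxPrecision = 38;
constexpr int32_t kMinimumAdjustedScale = 6;

// Caps precision at 38 by giving up fractional digits, while keeping at
// least min(scale, 6) of them. Assumes 0 <= scale <= precision.
std::pair<int32_t, int32_t> adjustPrecisionScale(int32_t precision, int32_t scale) {
  if (precision <= kMaxPrecision) {
    return {precision, scale};
  }
  const int32_t intDigits = precision - scale;
  const int32_t minScale = std::min(scale, kMinimumAdjustedScale);
  const int32_t adjustedScale = std::max(kMaxPrecision - intDigits, minScale);
  return {kMaxPrecision, adjustedScale};
}

// Result type of DECIMAL(p1, s1) +/- DECIMAL(p2, s2): every integral digit of
// both inputs plus one carry digit, with the larger of the two scales.
std::pair<int32_t, int32_t> addResultType(int32_t p1, int32_t s1, int32_t p2, int32_t s2) {
  const int32_t scale = std::max(s1, s2);
  const int32_t precision = std::max(p1 - s1, p2 - s2) + scale + 1;
  return adjustPrecisionScale(precision, scale);
}
```

For DECIMAL(38, 5) + DECIMAL(38, 7) this gives DECIMAL(38, 6), the result type expected by the test later in this thread.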

netlify · 2023-07-24T06:47:19Z

✅ Deploy Preview for meta-velox canceled.

Name	Link
🔨 Latest commit	`a15b51a`
🔍 Latest deploy log	https://app.netlify.com/sites/meta-velox/deploys/657c00e3aed2a8000850aeac

jinchengchenghh · 2023-11-01T02:55:22Z

Can you help review this PR? Thanks! @majetideepak

FelixYBW · 2023-11-06T20:47:44Z

@mbasmanova this PR adds decimal add/subtract aligned with Spark's implementation. Can you help review it? It is required to pass TPC-H/TPC-DS in Gluten.

mbasmanova · 2023-11-07T14:42:45Z

@rui-mo Rui, would you help review this PR?

mbasmanova · 2023-11-07T14:43:51Z

@jinchengchengh Jin, what is the difference between Spark and Presto for these functions?

FelixYBW · 2023-11-07T18:38:12Z

@rui-mo Rui, would you help review this PR?

Chengcheng is on leave and may not reply promptly. Rui is covering all of Chengcheng's PRs.

jinchengchenghh · 2023-11-08T07:44:54Z

Spark's result precision and scale may differ from Presto's, because Spark uses adjustPrecisionScale to adjust the precision and scale when the precision exceeds 38.
As a result, this implementation can compute some results without overflow.
For example, the following test fails in Presto (Velox) but succeeds in Spark.

TEST_F(DecimalArithmeticTest, tmp) {
  // Carry to left 0.
  testDecimalExpr<TypeKind::HUGEINT>(
      makeLongDecimalVector({"99999999999999999999999999999990000010"}, 38, 6),
      "plus(c0, c1)",
      {makeLongDecimalVector({"9999999999999999999999999999999000000"}, 38, 5),
       makeFlatVector(std::vector<int128_t>{100}, DECIMAL(38, 7))});
}

[ RUN      ] DecimalArithmeticTest.tmp
unknown file: Failure
C++ exception with description "Exception: VeloxUserError
Error Source: USER
Error Code: ARITHMETIC_ERROR
Reason: Decimal overflow: 9999999999999999999999999999999000000 + 100
Retriable: False
Context: plus(c0, c1)
Top-Level Context: Same as context.
Function: apply
File: ../velox/functions/prestosql/DecimalVectorFunctions.cpp
Line: 289
Stack trace:
Stack trace has been disabled. Use --velox_exception_user_stacktrace_enabled=true to enable it.
" thrown in the test body.
[  FAILED  ] DecimalArithmeticTest.tmp (2 ms)

Indeed, Presto cannot compute this, while Spark can.

presto:default> select cast('99999999999999999999999999999990' as decimal(38, 5)) + cast('0.0000100' as decimal(38, 7));
Query 20231108_152021_00006_sqyi8 failed: Decimal overflow

spark-sql (default)> select cast('99999999999999999999999999999990' as decimal(38, 5)) + cast('0.0000100' as decimal(38, 7));
99999999999999999999999999999990.000010
Time taken: 0.127 seconds, Fetched 1 row(s)

@mbasmanova
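The overflow difference can be checked with a little unscaled-integer arithmetic. Assuming Presto keeps the larger input scale (7) as the result scale, it must multiply the left operand by 10^2, which exceeds 38 digits, while Spark's adjusted result scale of 6 only requires 10^1. This sketch assumes the GCC/Clang `__int128` extension; `fitsAfterScaleUp`, `parseUnscaled` and `powerOfTen` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

using int128_t = __int128;  // GCC/Clang extension.

int128_t powerOfTen(int32_t k) {
  int128_t result = 1;
  for (int32_t i = 0; i < k; ++i) {
    result *= 10;
  }
  return result;
}

// Largest unscaled value a 38-digit decimal can hold: 10^38 - 1.
const int128_t kMaxUnscaled = powerOfTen(38) - 1;

// Parses a non-negative unscaled value from its decimal digits.
int128_t parseUnscaled(const char* digits) {
  int128_t result = 0;
  for (const char* p = digits; *p != '\0'; ++p) {
    result = result * 10 + (*p - '0');
  }
  return result;
}

// True if |value| * 10^delta still fits in 38 digits. Dividing the limit
// instead of multiplying the value keeps int128_t itself from overflowing.
bool fitsAfterScaleUp(int128_t value, int32_t delta) {
  const int128_t absValue = value < 0 ? -value : value;
  return absValue <= kMaxUnscaled / powerOfTen(delta);
}
```

After rescaling to scale 6, the right operand 100 becomes 10 with no rounding, and the sum 99999999999999999999999999999990000010 fits in 38 digits, matching the Spark output above.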

mbasmanova · 2023-11-08T16:20:02Z

@jinchengchenghh Got it. Thank you for explaining. It would be nice to add these details to the PR description.

mbasmanova · 2023-11-08T16:22:15Z

velox/functions/sparksql/DecimalUtil.h

@@ -109,6 +109,21 @@ class DecimalUtil {
    return value;
  }

+  template <typename A, typename B>
+  inline static int32_t
+  minLeadingZeros(const A& a, const B& b, uint8_t aScale, uint8_t bScale) {


a and b are integers, right? Pass them by value, not by const reference.

mbasmanova

This code is hard to read. Can you write up some description of what it does and why it does it this way?

mbasmanova · 2023-11-08T16:22:49Z

velox/functions/sparksql/DecimalUtil.h

+  template <typename A, typename B>
+  inline static int32_t
+  minLeadingZeros(const A& a, const B& b, uint8_t aScale, uint8_t bScale) {
+    int32_t aLeadingZeros = bits::countLeadingZeros(absValue<A>(a));


nit: absValue(a) and absValue(b)

Removing the template type argument produces the following error:

couldn’t deduce template parameter ‘T’
118 | int32_t bLeadingZeros = bits::countLeadingZeros(absValue(b))

mbasmanova · 2023-11-08T16:23:35Z

velox/functions/sparksql/DecimalUtil.h

@@ -109,6 +109,21 @@ class DecimalUtil {
    return value;
  }

+  template <typename A, typename B>
+  inline static int32_t
+  minLeadingZeros(const A& a, const B& b, uint8_t aScale, uint8_t bScale) {


Would you document this function? Let's also add a test.
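One way to document and test `minLeadingZeros` is a standalone sketch like the following. It is an illustration, not the PR's code: both inputs are assumed to be 128-bit (`__int128`, a GCC/Clang extension) rather than templated on `A` and `B`, a hand-rolled `countLeadingZeros` stands in for `bits::countLeadingZeros`, and the per-digit penalty ceil(log2(10^k)) is computed rather than read from a table.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using int128_t = __int128;  // GCC/Clang extension.
using uint128_t = unsigned __int128;

// Leading zero bits of a 128-bit value; 128 for zero.
int32_t countLeadingZeros(uint128_t value) {
  const auto hi = static_cast<uint64_t>(value >> 64);
  const auto lo = static_cast<uint64_t>(value);
  if (hi != 0) {
    return __builtin_clzll(hi);
  }
  if (lo != 0) {
    return 64 + __builtin_clzll(lo);
  }
  return 128;
}

// Worst-case extra bits needed after multiplying by 10^scaleBy, i.e.
// ceil(log2(10^scaleBy)). For scaleBy > 0, 10^scaleBy is not a power of two,
// so this equals its bit width. Assumes scaleBy <= 38.
int32_t maxBitsIncreaseAfterScaling(int32_t scaleBy) {
  if (scaleBy <= 0) {
    return 0;
  }
  uint128_t power = 1;
  for (int32_t i = 0; i < scaleBy; ++i) {
    power *= 10;
  }
  return 128 - countLeadingZeros(power);
}

// Lower bound on the leading zeros of |a| and |b| once the input with the
// smaller scale is rescaled to the larger scale.
int32_t minLeadingZeros(int128_t a, int128_t b, uint8_t aScale, uint8_t bScale) {
  // Negate in unsigned arithmetic so the most negative value is handled safely.
  const uint128_t aAbs = a < 0 ? -static_cast<uint128_t>(a) : static_cast<uint128_t>(a);
  const uint128_t bAbs = b < 0 ? -static_cast<uint128_t>(b) : static_cast<uint128_t>(b);
  int32_t aLeadingZeros = countLeadingZeros(aAbs);
  int32_t bLeadingZeros = countLeadingZeros(bAbs);
  if (aScale < bScale) {
    aLeadingZeros -= maxBitsIncreaseAfterScaling(bScale - aScale);
  } else if (aScale > bScale) {
    bLeadingZeros -= maxBitsIncreaseAfterScaling(aScale - bScale);
  }
  return std::min(aLeadingZeros, bLeadingZeros);
}
```

For example, two values of 1 with scales 0 and 2 yield 120: the scale-0 input has 127 leading zeros and loses 7 bits when scaled by 100.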

mbasmanova · 2023-11-08T16:24:13Z

velox/functions/sparksql/DecimalUtil.h

+  inline static int32_t minLeadingZerosAfterScaling(
+      int32_t numLeadingZeros,
+      int32_t scaleBy) {
+    int32_t result =


drop 'result' variable; just return ...

mbasmanova · 2023-11-08T16:24:39Z

velox/functions/sparksql/DecimalUtil.h

@@ -211,5 +226,16 @@ class DecimalUtil {
    int32_t numOccupied = sizeof(A) * 8 - bits::countLeadingZeros(valueAbs);
    return numOccupied + kMaxBitsRequiredIncreaseAfterScaling[aRescale];
  }
+
+  /// If we have a number with 'numLeadingZeros' leading zeros, and we scale it


I'm not sure I understand what this means. Other readers may be confused as well. Would you clarify?

Revised the comments.
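The reasoning behind `minLeadingZerosAfterScaling` can be stated as a bound. Assuming a 128-bit word, let z be the number of leading zeros of |v| and k the number of decimal digits by which the scale increases:

```latex
|v| < 2^{128 - z}
\;\Longrightarrow\;
|v| \cdot 10^{k} < 2^{128 - z} \cdot 10^{k}
\le 2^{128 - (z - \lceil k \log_2 10 \rceil)}
```

So at least z - ⌈k log2 10⌉ leading zeros remain after scaling; for k = 1, 2, 3 the penalty is 4, 7 and 10 bits, which is presumably what `kMaxBitsRequiredIncreaseAfterScaling` tabulates.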

rui-mo · 2023-11-15T03:02:24Z

This code is hard to read. Can you write up some description of what it does and why it does it this way?

This PR supports decimal add and subtract. The implementation derives from Arrow Gandiva (link). Both the no-overflow fast path and the general case are handled. If the result precision is less than the maximum decimal precision, or both inputs have at least 3 leading zeros after rescaling, overflow need not be considered. Otherwise, two functions handle inputs of the same sign and of different signs separately, taking overflow into account.
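The fast-path condition can be sketched as follows. This is an illustration under assumptions, not the PR's code: `canSkipOverflowHandling` is a hypothetical name, and the sketch relies on the GCC/Clang `unsigned __int128` extension. The `static_assert` records why 3 leading zeros are sufficient.

```cpp
#include <cassert>
#include <cstdint>

constexpr int32_t kMaxPrecision = 38;

constexpr unsigned __int128 powerOfTen(int32_t k) {
  unsigned __int128 result = 1;
  for (int32_t i = 0; i < k; ++i) {
    result *= 10;
  }
  return result;
}

// With at least 3 leading zeros in a 128-bit word, each rescaled input is
// below 2^125, so their sum is below 2^126, which is less than 10^38.
static_assert(
    (static_cast<unsigned __int128>(1) << 126) < powerOfTen(38),
    "2^126 must be below 10^38");

// Hypothetical helper: true when both inputs can be rescaled and added
// directly without any overflow handling. 'minLeadingZeros' is measured after
// rescaling both inputs to a common scale.
bool canSkipOverflowHandling(int32_t resultPrecision, int32_t minLeadingZeros) {
  // Precision below 38 means no precision adjustment happened, so the result
  // type already has room for every integral digit plus the carry.
  if (resultPrecision < kMaxPrecision) {
    return true;
  }
  return minLeadingZeros >= 3;
}
```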

jinchengchenghh · 2023-11-15T09:13:21Z

velox/functions/sparksql/DecimalArithmetic.cpp

-  VELOX_DCHECK_NE(a, 0);
-  VELOX_DCHECK_NE(b, 0);
-  VELOX_DCHECK((a < 0 && b > 0) || (a > 0 && b < 0));
+  VELOX_USER_CHECK(


This condition is guaranteed by the enclosing else branch, so VELOX_DCHECK_XX is sufficient.

0578c4c#diff-4baee9f82f9c347f753972415d345941d2c2a110b608d480b090650171da096bL156
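Why a debug-only check suffices can be seen from the branch structure alone. In this hypothetical sketch (`AddPath` and `chooseAddPath` are illustrative names, `assert` stands in for `VELOX_DCHECK`, and `__int128` is a GCC/Clang extension), once both same-sign branches fail, neither input can be zero and their signs must differ, so the check guards an internal invariant rather than user input.

```cpp
#include <cassert>

using int128_t = __int128;  // GCC/Clang extension.

enum class AddPath { kBothNonNegative, kBothNonPositive, kOppositeSigns };

// Mirrors an if / else-if / else dispatch on the signs of the two inputs.
AddPath chooseAddPath(int128_t a, int128_t b) {
  if (a >= 0 && b >= 0) {
    return AddPath::kBothNonNegative;
  }
  if (a <= 0 && b <= 0) {
    return AddPath::kBothNonPositive;
  }
  // A zero input would have matched one of the branches above, so here both
  // values are non-zero with opposite signs: an invariant, not user input.
  assert(a != 0 && b != 0);
  assert((a < 0 && b > 0) || (a > 0 && b < 0));
  return AddPath::kOppositeSigns;
}
```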

rui-mo · 2023-11-30T13:30:07Z

@mbasmanova Masha, I revised this PR. Could you spare some time to review again? Thanks.

mbasmanova · 2023-11-30T13:54:34Z

velox/docs/functions/spark/math.rst

+
+    ::
+
+        SELECT CAST(1.1232100 as DECIMAL(38, 7)) + CAST(1 as DECIMAL(10, 0)); -- decimal 2.123210


Nice examples. It would be helpful to specify the precision and scale of the results as well.

Thanks. Updated.
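Assuming Spark's default rules (precision loss allowed, minimum adjusted scale 6), the result type of the documented example works out as follows:

```latex
\begin{aligned}
s &= \max(7, 0) = 7, \qquad p = \max(38 - 7,\ 10 - 0) + s + 1 = 39 > 38,\\
s' &= \max\bigl(38 - (p - s),\ \min(s, 6)\bigr) = \max(6, 6) = 6
\end{aligned}
```

So the result is DECIMAL(38, 6), and 1.1232100 + 1 = 2.1232100 is represented as 2.123210.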

rui-mo · 2023-12-04T13:33:23Z

@mbasmanova Above comment was fixed. Could you help take further review? Thanks.

mbasmanova

@Yuhta Jimmy, would you help review this PR?

mbasmanova · 2023-11-30T13:56:02Z

velox/functions/sparksql/DecimalUtil.h

@@ -109,6 +109,33 @@ class DecimalUtil {
    return value;
  }

+  /// Returns the minimum number of leading zeros after scaling up two inputs
+  /// for certain scales. Inputs are decimal values of bigint or hugeint type.
+  template <typename TInput1, typename TInput2>


nit: TInput1,2 -> A and B

mbasmanova · 2023-11-30T13:56:28Z

velox/functions/sparksql/DecimalArithmetic.cpp

@@ -33,6 +33,175 @@ std::string getResultScale(std::string precision, std::string scale) {
      scale);
 }

+// Returns the whole and fraction parts of a decimal value.
+template <typename T>
+inline std::pair<T, T> getWholeAndFraction(const T value, uint8_t scale) {


drop 'const'

Fixed similar cases. Thanks.
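A standalone sketch of `getWholeAndFraction` with the `const` dropped, as requested. It assumes `T = __int128` (GCC/Clang extension) and a hypothetical `powerOfTen` helper in place of whatever scale-multiplier lookup the PR uses. It also shows a property worth keeping in mind for the sign-specific add paths: both parts keep the sign of the input.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

using int128_t = __int128;  // GCC/Clang extension.

int128_t powerOfTen(uint8_t k) {
  int128_t result = 1;
  for (uint8_t i = 0; i < k; ++i) {
    result *= 10;
  }
  return result;
}

// Splits an unscaled decimal into {whole, fraction} for the given scale.
// C++ integer division truncates toward zero, so both parts share the sign
// of 'value' (for example, -123.45 at scale 2 gives {-123, -45}).
std::pair<int128_t, int128_t> getWholeAndFraction(int128_t value, uint8_t scale) {
  const int128_t multiplier = powerOfTen(scale);
  return {value / multiplier, value % multiplier};
}
```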

mbasmanova · 2023-11-30T13:56:53Z

velox/functions/sparksql/DecimalArithmetic.cpp

+
+// Increases the scale of input value by 'delta'. Returns the input value if
+// delta is not positive.
+inline int128_t increaseScale(const int128_t in, int16_t delta) {
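The helpers above (`getWholeAndFraction`, `increaseScale`) suggest the same-sign path avoids scaling a large value up as a whole: it splits each input into whole and fraction parts, aligns only the fractions, and carries into the whole part. The sketch below illustrates that idea for two non-negative inputs; it is not the Velox code. It assumes GCC/Clang `__int128`, inputs below 10^38, a result scale no larger than the inputs' larger scale, and half-away-from-zero rounding; `addLargePositive`, `powerOfTen` and `reduceScaleBy` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using int128_t = __int128;  // GCC/Clang extension.
using uint128_t = unsigned __int128;

int128_t powerOfTen(int32_t k) {
  int128_t result = 1;
  for (int32_t i = 0; i < k; ++i) {
    result *= 10;
  }
  return result;
}

// Multiplies by 10^delta; returns 'in' unchanged when delta is not positive.
int128_t increaseScale(int128_t in, int32_t delta) {
  return delta <= 0 ? in : in * powerOfTen(delta);
}

// Divides by 10^delta, rounding half away from zero (assumed HALF_UP).
int128_t reduceScaleBy(int128_t in, int32_t delta) {
  if (delta <= 0) {
    return in;
  }
  const int128_t divisor = powerOfTen(delta);
  int128_t result = in / divisor;
  const int128_t remainder = in % divisor;
  const int128_t absRemainder = remainder < 0 ? -remainder : remainder;
  if (absRemainder * 2 >= divisor) {
    result += in < 0 ? -1 : 1;
  }
  return result;
}

// Adds two non-negative decimals by handling whole and fraction parts
// separately. Returns false if the sum does not fit in 38 digits.
bool addLargePositive(
    int128_t a, int32_t aScale, int128_t b, int32_t bScale,
    int32_t resultScale, int128_t& result) {
  assert(a >= 0 && b >= 0 && a < powerOfTen(38) && b < powerOfTen(38));
  const int32_t higherScale = std::max(aScale, bScale);
  assert(resultScale <= higherScale);

  const int128_t aWhole = a / powerOfTen(aScale);
  const int128_t bWhole = b / powerOfTen(bScale);
  const int128_t aFraction = increaseScale(a % powerOfTen(aScale), higherScale - aScale);
  const int128_t bFraction = increaseScale(b % powerOfTen(bScale), higherScale - bScale);

  // Compare against (multiplier - bFraction) instead of forming
  // aFraction + bFraction, which can exceed int128 when the scale is 38.
  const int128_t multiplier = powerOfTen(higherScale);
  int128_t fraction;
  int128_t carry;
  if (aFraction >= multiplier - bFraction) {
    fraction = aFraction - (multiplier - bFraction);
    carry = 1;
  } else {
    fraction = aFraction + bFraction;
    carry = 0;
  }
  fraction = reduceScaleBy(fraction, higherScale - resultScale);

  // Combine in unsigned arithmetic, checking the 38-digit limit before
  // multiplying so no intermediate value wraps around.
  const uint128_t limit = static_cast<uint128_t>(powerOfTen(38)) - 1;
  const uint128_t whole = static_cast<uint128_t>(aWhole) +
      static_cast<uint128_t>(bWhole) + static_cast<uint128_t>(carry);
  const uint128_t resultMultiplier = static_cast<uint128_t>(powerOfTen(resultScale));
  if (whole > limit / resultMultiplier) {
    return false;
  }
  const uint128_t sum = whole * resultMultiplier + static_cast<uint128_t>(fraction);
  if (sum > limit) {
    return false;
  }
  result = static_cast<int128_t>(sum);
  return true;
}
```

Fed the thread's example (unscaled 9999999999999999999999999999999000000 at scale 5 plus 100 at scale 7, result scale 6), the whole part is multiplied only by 10^6 at the end, and the sum fits in 38 digits.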


mbasmanova

@rui-mo Rui, do we have Fuzzer coverage for all these functions? If not, let's prioritize extending the Fuzzer. Otherwise, it will be very hard to ensure there are no bugs.

rui-mo · 2023-12-06T08:45:06Z

do we have Fuzzer coverage for all these functions?

@mbasmanova Actually, we don't. I found at least the two limitations below, which cause the fuzzer test for these functions to be skipped.

Signatures with decimal types are treated as unsupported.

velox/velox/expression/tests/ExpressionFuzzer.cpp

Lines 197 to 206 in 2e71d8e

    
bool isSupportedSignature(
    const exec::FunctionSignature& signature,
    bool enableComplexType) {
  // Not supporting lambda functions, or functions using decimal and
  // timestamp with time zone types.
  return !(
      useTypeName(signature, "opaque") ||
      useTypeName(signature, "long_decimal") ||
      useTypeName(signature, "short_decimal") ||
      useTypeName(signature, "decimal") ||

Signatures with type variables are also skipped.

velox/velox/expression/tests/ExpressionFuzzer.cpp

Lines 315 to 318 in 2e71d8e

    
if (!(signature->variables().empty() || options_.enableComplexTypes)) {
  LOG(WARNING) << "Skipping unsupported signature: " << function.first
               << signature->toString();
  continue;

This is tracked in issue #1968.

velox/functions/sparksql/DecimalArithmetic.cpp

facebook-github-bot · 2023-12-15T22:55:57Z

@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2023-12-18T22:40:55Z

@Yuhta merged this pull request in d3f0dc9.

facebook-github-bot added the CLA Signed label Jul 24, 2023

jinchengchenghh force-pushed the decimal_add branch 2 times, most recently from b0b07e7 to 90ed921 on August 3, 2023 07:15

jinchengchenghh force-pushed the decimal_add branch 2 times, most recently from e5a20aa to 447e73b on August 28, 2023 01:14

rui-mo force-pushed the decimal_add branch 2 times, most recently from d55591c to 816bb42 on October 31, 2023 05:31

mbasmanova reviewed Nov 8, 2023

rui-mo force-pushed the decimal_add branch from e8385e2 to b7bc4ed on November 15, 2023 02:25

rui-mo force-pushed the decimal_add branch from b7bc4ed to 0578c4c on November 15, 2023 03:09

jinchengchenghh commented Nov 15, 2023

rui-mo force-pushed the decimal_add branch from 0578c4c to 8ace0b6 on November 30, 2023 13:26

mbasmanova reviewed Nov 30, 2023

rui-mo force-pushed the decimal_add branch from 8ace0b6 to c403776 on December 1, 2023 02:49

mbasmanova requested a review from Yuhta December 4, 2023 13:35

mbasmanova reviewed Dec 5, 2023

rui-mo force-pushed the decimal_add branch from c403776 to b8f5cfc on December 6, 2023 08:30

Yuhta reviewed Dec 14, 2023

velox/functions/sparksql/DecimalArithmetic.cpp (outdated)

jinchengchenghh and others added 2 commits December 15, 2023 15:08

Implement Spark decimal add and subtract

1c7bffe

Fix comments

a15b51a

rui-mo force-pushed the decimal_add branch 2 times, most recently from b409bea to a15b51a on December 15, 2023 07:31

facebook-github-bot closed this in d3f0dc9 Dec 18, 2023

facebook-github-bot added the Merged label Dec 18, 2023

rui-mo mentioned this pull request Dec 20, 2023

Use consistent test API in DecimalArithmeticTest #8114

Closed

rui-mo mentioned this pull request Mar 12, 2024

Add make_timestamp Spark function #8812

Closed

rui-mo mentioned this pull request Mar 19, 2024

Extend expression fuzzer test to support decimal #9149

Closed
