Implementing array_sum function #5

jwyles-ahana · 2022-05-10T21:28:20Z

No description provided.

…tor#1500)

Summary: Enhance printExprWithStats to identify common sub-expressions.

For example, `c0 + c1` is a common sub-expression in the `"(c0 + c1) % 5", "(c0 + c1) % 3"` expression set. It is evaluated only once, and a single Expr object represents it. That object appears in the expression tree twice. printExprWithStats does not show the runtime stats for the second instance of that expression and instead annotates it with `[CSE #2]`, where CSE stands for common sub-expression and 2 refers to the first instance of the expression.

```
mod [cpu time: 50.49us, rows: 1024] -> BIGINT [#1]
   cast(plus as BIGINT) [cpu time: 68.15us, rows: 1024] -> BIGINT [#2]
      plus [cpu time: 51.84us, rows: 1024] -> INTEGER [#3]
         c0 [cpu time: 0ns, rows: 0] -> INTEGER [#4]
         c1 [cpu time: 0ns, rows: 0] -> INTEGER [#5]
   5:BIGINT [cpu time: 0ns, rows: 0] -> BIGINT [#6]

mod [cpu time: 49.29us, rows: 1024] -> BIGINT [#7]
   cast((plus(c0, c1)) as BIGINT) -> BIGINT [CSE #2]
   3:BIGINT [cpu time: 0ns, rows: 0] -> BIGINT [#8]
```

Pull Request resolved: facebookincubator#1500

Reviewed By: Yuhta

Differential Revision: D35994836

Pulled By: mbasmanova

fbshipit-source-id: 6bacbbe61b68dad97ce2fd5f99610c4ad55897be

yingsu00

Some initial comments

yingsu00 · 2022-05-19T05:41:53Z

velox/functions/prestosql/ArraySum.cpp

+  // Allocate new vector for the result
+  memory::MemoryPool* pool = context->pool();
+  auto resultVector = BaseVector::create(outputType, numRows, pool);
+


Remove extra empty line

Actually, you need to run the code formatter (Code -> Reformat Code). There are other lines that would fail the format check too.

yingsu00 · 2022-05-19T05:49:03Z

velox/functions/prestosql/ArraySum.cpp

+  // Get access to raw values for the result
+  OT* resultValues = (OT*) resultVector->valuesAsVoid();
+
+  // Iterate over the input vector and find the sum of each array's values


This comment is not needed because the code is self-explanatory. Comments should be succinct.

yingsu00 · 2022-05-19T05:51:06Z

velox/functions/prestosql/ArraySum.cpp

+
+  // Iterate over the input vector and find the sum of each array's values
+  for (int i = 0; i < numRows; i++) {
+    // If the whole array is null then set the row null in the output


This comment is not needed

yingsu00 · 2022-05-19T05:51:17Z

velox/functions/prestosql/ArraySum.cpp

+    if (arrayVector->isNullAt(i)) {
+      resultVector->setNull(i, true);
+    }
+    // If the array is not null then sum the elements and set the result to the sum


This comment is not needed

yingsu00 · 2022-05-19T05:51:23Z

velox/functions/prestosql/ArraySum.cpp

+        }
+      }
+
+      // Set the value at i equal to the sum


This comment is not needed

yingsu00 · 2022-05-19T08:48:11Z

velox/functions/prestosql/ArraySum.cpp

+  for (int i = 0; i < numRows; i++) {
+    // If the whole array is null then set the row null in the output
+    if (arrayVector->isNullAt(i)) {
+      resultVector->setNull(i, true);


The Presto function description says "Returns the sum of all non-null elements of the array. If there is no non-null elements, returns 0. "

The Presto function description does not specify what to do in the case that the array itself is null (the case handled here) so I went with null in null out.
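The null handling discussed in this thread can be sketched outside Velox. The block below is a minimal standalone model, not the PR's code: `NullableArray` and `arraySumRow` are hypothetical names, a missing outer optional stands for a null array, and a missing inner optional stands for a null element.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of one input row (not a Velox type): nullopt at the
// outer level is a null array, nullopt at the inner level is a null element.
using NullableArray = std::optional<std::vector<std::optional<int64_t>>>;

// Null in, null out for the array itself; otherwise the sum of the non-null
// elements, which is 0 when there are none.
std::optional<int64_t> arraySumRow(const NullableArray& row) {
  if (!row.has_value()) {
    return std::nullopt;
  }
  int64_t sum = 0;
  for (const auto& element : *row) {
    if (element.has_value()) {
      sum += *element;
    }
  }
  return sum;
}
```

Under this model an empty array and an array containing only nulls both yield 0, while a null array yields null.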

yingsu00 · 2022-05-19T08:56:59Z

velox/functions/prestosql/ArraySum.cpp

+ if (kind == TypeKind::REAL || kind == TypeKind::DOUBLE) {
+   return std::make_shared<ArraySumFunction<IT, double>>();
+ }
+ VELOX_FAIL()


Add a message that states what kind of error this is.

yingsu00 · 2022-05-19T08:58:25Z

velox/functions/prestosql/ArraySum.cpp

+}
+
+// Define function signature.
+// array(T1) -> T2 where T must be coercible to bigint or double, and


T should be T1?

Yes, it should. I will fix.

yingsu00 · 2022-05-19T09:02:37Z

velox/functions/prestosql/tests/ArraySumTest.cpp

+} // namespace
+
+// Test integer arrays.
+TEST_F(ArraySumTest, integer64Input) {


Can you add tests for some of the types that are not coercible to double, and verify that the query fails with the expected message?

I have added tests for StringView and bool types which should (and do) fail with an exception.

aditi-pandit · 2022-05-19T14:51:59Z

velox/functions/prestosql/ArraySum.cpp

+// array(T1) -> T2 where T must be coercible to bigint or double, and
+// T2 is bigint or double
+std::vector<std::shared_ptr<exec::FunctionSignature>> signatures() {
+ return {


Just curious: does array_sum work with decimal types? If so, that could add complexity to these signatures and the implementation. We needn't work on it in this PR, but please inform Karteek and others about it.

aditi-pandit · 2022-05-19T14:55:06Z

velox/functions/prestosql/ArraySum.cpp

+     arrayType->kind(),
+     TypeKind::ARRAY,
+     "array_sum requires argument of type ARRAY");
+}


Please add a validation here that the child of the array type should be coercible to double.

I have added a validation.

aditi-pandit · 2022-05-19T14:56:11Z

velox/functions/prestosql/ArraySum.cpp

+}
+
+template <>
+void ArraySumFunction<Timestamp, int64_t>::apply(


Just curious, why are these needed?

The main apply function will not compile for Timestamp, StringView, and Date. Since these specializations are never used (due to the acceptable signatures of array_sum) I have added the specializations that don't compile as no-op functions which lets things compile.

Can't we follow the applyTyped + VELOX_DYNAMIC_SCALAR_TEMPLATE_TYPE_DISPATCH approach as in ArrayMinMax.cpp? That would clean these up.
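One way to see why the no-op specializations can go away: if a runtime switch names only the supported element types, the typed template is never instantiated for types it cannot compile with. The sketch below is a standalone analog under that assumption. `Kind`, `sumTyped`, and `dispatchSum` are hypothetical stand-ins, and the Velox macro itself is not used here.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <variant>

// Hypothetical runtime type tag; a stand-in, not Velox's TypeKind.
enum class Kind { kTinyint, kBigint, kDouble, kVarchar };

using SumResult = std::variant<int64_t, double>;

// Typed worker: instantiated only for the element types named in the switch.
template <typename IT, typename OT>
SumResult sumTyped(const void* data, std::size_t size) {
  const IT* values = static_cast<const IT*>(data);
  OT sum = 0;
  for (std::size_t i = 0; i < size; ++i) {
    sum += values[i];
  }
  return sum;
}

// Runtime dispatch: unsupported kinds throw and never instantiate sumTyped,
// so no placeholder specializations are required for them.
SumResult dispatchSum(Kind kind, const void* data, std::size_t size) {
  switch (kind) {
    case Kind::kTinyint:
      return sumTyped<int8_t, int64_t>(data, size);
    case Kind::kBigint:
      return sumTyped<int64_t, int64_t>(data, size);
    case Kind::kDouble:
      return sumTyped<double, double>(data, size);
    default:
      throw std::invalid_argument("array_sum: unsupported element type");
  }
}
```

The trade-off is that every supported type has to be listed by hand, whereas a generic dispatch macro covers all scalar kinds at once.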

aditi-pandit · 2022-05-19T14:59:06Z

velox/functions/prestosql/tests/ArraySumTest.cpp

+
+// Test floating point arrays
+TEST_F(ArraySumTest, floatInput) {
+  auto input = makeNullableArrayVector<float>({{0, 1, 2},


Add tests for the values std::numeric_limits::min(), max(), infinity() and quiet_NaN().

I have added some tests for these.
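For reference, the edge cases requested here behave as follows when float elements are accumulated into a double. This is a standalone sketch that assumes a double accumulator, as in the `ArraySumFunction<IT, double>` instantiation quoted earlier; `sumFloats` is a hypothetical helper, not part of the PR.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper: sums float elements into a double accumulator.
double sumFloats(const std::vector<float>& values) {
  double sum = 0;
  for (float value : values) {
    sum += value;
  }
  return sum;
}

// Properties worth covering in tests:
//  - infinity plus a finite value stays infinite
//  - infinity plus negative infinity is NaN
//  - NaN propagates through the sum
//  - max() + max() does not overflow, because the accumulator is a double
inline bool edgeCasesHold() {
  const float inf = std::numeric_limits<float>::infinity();
  const float nanF = std::numeric_limits<float>::quiet_NaN();
  const float maxF = std::numeric_limits<float>::max();
  return std::isinf(sumFloats({inf, 1.0f})) &&
      std::isnan(sumFloats({inf, -inf})) &&
      std::isnan(sumFloats({nanF, 1.0f})) &&
      std::isfinite(sumFloats({maxF, maxF}));
}
```

Note that for floating-point types `std::numeric_limits<T>::min()` is the smallest positive normal value; the most negative value is `lowest()`.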

majetideepak · 2022-05-26T00:53:33Z

velox/functions/prestosql/ArraySum.cpp

+// Define function signature.
+// array(T1) -> T2 where T1 must be coercible to bigint or double, and
+// T2 is bigint or double
+std::vector<std::shared_ptr<exec::FunctionSignature>> signatures() {


The signatures() approach from ArrayMinMax.cpp can be followed here as well, using an unordered_map instead of the vector.

majetideepak · 2022-05-27T19:45:21Z

velox/functions/prestosql/ArraySum.cpp

+  };
+  std::vector<std::shared_ptr<exec::FunctionSignature>> signatures;
+  signatures.reserve(s.size());
+  for (const auto& typeName : s) {


nit: for (const auto& [returnType, argType] : s) is preferred.
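A small standalone illustration of the structured-binding loop suggested in this nit. The map here is keyed by argument type with the return type as the value, which is an assumption made because several argument types share one return type; `describeSignatures` is a hypothetical helper that only builds display strings and does not use Velox's signature builder.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: renders one "array(<arg>) -> <return>" string per
// map entry. The key/value order (argument type -> return type) is an
// assumption for this sketch.
std::vector<std::string> describeSignatures(
    const std::unordered_map<std::string, std::string>& argToReturn) {
  std::vector<std::string> signatures;
  signatures.reserve(argToReturn.size());
  // C++17 structured bindings name both halves of each pair directly.
  for (const auto& [argType, returnType] : argToReturn) {
    signatures.push_back("array(" + argType + ") -> " + returnType);
  }
  return signatures;
}
```

Compared with `for (const auto& entry : s)` followed by `entry.first` and `entry.second`, the bound names document what each half of the pair means.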

majetideepak

@jwyles-ahana great to see the dummy definitions go away. Made some more comments.

majetideepak · 2022-05-31T11:35:55Z

velox/functions/prestosql/ArraySum.cpp

-      createTyped, elementType->kind(), inputArgs);
+  switch (elementType->kind()) {
+    case TypeKind::TINYINT: {
+      return std::make_shared<ArraySumFunction<int8_t, int64_t>>();


Use TypeTraits<TypeKind::TINYINT>::NativeType instead of int8_t. Same below.

Made change.
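The idea behind this suggestion is that a traits template ties each type tag to its native C++ type in one place, so call sites do not hard-code `int8_t` and similar. The following is a minimal standalone analog: `Kind` and `KindTraits` are hypothetical names used for illustration and are not Velox's definitions.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical type tag and traits, modeled loosely on the pattern named in
// the review comment.
enum class Kind { kTinyint, kSmallint, kInteger, kBigint };

template <Kind K>
struct KindTraits;

template <>
struct KindTraits<Kind::kTinyint> {
  using NativeType = int8_t;
};

template <>
struct KindTraits<Kind::kSmallint> {
  using NativeType = int16_t;
};

template <>
struct KindTraits<Kind::kInteger> {
  using NativeType = int32_t;
};

template <>
struct KindTraits<Kind::kBigint> {
  using NativeType = int64_t;
};

// Call sites refer to the tag and let the traits supply the native type.
static_assert(
    std::is_same_v<KindTraits<Kind::kTinyint>::NativeType, int8_t>);
```

If the native representation of a kind ever changes, only the traits specialization needs updating.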

majetideepak · 2022-05-31T12:44:16Z

velox/functions/prestosql/ArraySum.cpp

+    // Allocate new vector for the result
+    memory::MemoryPool* pool = context->pool();
+    auto resultVector = BaseVector::create(outputType, numRows, pool);
+    OT* resultValues = (OT*)resultVector->valuesAsVoid();


We can directly write to result and avoid a copy. Also, I feel the compiler cannot vectorize the loop, so we can use the API to set the values instead of dealing with raw values.

BaseVector::ensureWritable(rows, outputType, context->pool(), result);
auto resultValues = (*result)->asFlatVector<OT>();
...
resultValues->setNull(i, true);
....
resultValues->set(i, sum);

We have to use rows.applyToSelected to ensure only selected rows are set.
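The pattern described in this comment (write into the caller's result and touch only the selected rows) can be modeled without Velox. In the sketch below every name is a hypothetical stand-in: a `std::vector<bool>` plays the role of the selectivity vector, the free function `applyToSelected` mimics the member function mentioned above, and `result` is assumed to be pre-sized by the caller.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for iterating a selectivity vector: calls func(row)
// for each row whose flag is set.
template <typename Func>
void applyToSelected(const std::vector<bool>& selected, Func func) {
  for (std::size_t row = 0; row < selected.size(); ++row) {
    if (selected[row]) {
      func(row);
    }
  }
}

// Writes sums straight into the caller-owned result. Rows that are not
// selected keep whatever value they already had.
void sumSelected(
    const std::vector<std::optional<std::vector<int64_t>>>& input,
    const std::vector<bool>& selected,
    std::vector<std::optional<int64_t>>& result) {
  result.resize(input.size());
  applyToSelected(selected, [&](std::size_t row) {
    if (!input[row].has_value()) {
      result[row] = std::nullopt; // null array in, null out
      return;
    }
    int64_t sum = 0;
    for (int64_t value : *input[row]) {
      sum += value;
    }
    result[row] = sum;
  });
}
```

Because no intermediate vector is allocated, the final copy into the result disappears, and unselected rows are left untouched.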

aditi-pandit · 2022-05-31T17:07:53Z

velox/functions/prestosql/ArraySum.cpp

+ */
+
+#include "velox/expression/EvalCtx.h"
+#include "velox/expression/Expr.h"


You might be able to remove the first 2 includes. Just including "VectorFunction.h" is sufficient I think.

aditi-pandit · 2022-05-31T17:08:39Z

velox/functions/prestosql/ArraySum.cpp

+template <typename IT, typename OT>
+class ArraySumFunction : public exec::VectorFunction {
+ public:
+  // Execute function.


This comment can be removed.

aditi-pandit · 2022-05-31T17:09:24Z

velox/functions/prestosql/ArraySum.cpp

+namespace facebook::velox::functions {
+namespace {
+
+// See documentation at https://prestodb.io/docs/current/functions/array.html


Move this line below "Implements the array_sum function"

aditi-pandit · 2022-05-31T17:12:38Z

velox/functions/prestosql/ArraySum.cpp

+      valueTypeKind == TypeKind::SMALLINT ||
+      valueTypeKind == TypeKind::INTEGER || valueTypeKind == TypeKind::BIGINT ||
+      valueTypeKind == TypeKind::REAL || valueTypeKind == TypeKind::DOUBLE;
+  VELOX_USER_CHECK_EQ(isCoercibleToDouble, true, "Invalid value type");


Please include the invalid valueTypeKind in the error message.
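As a rough illustration of the request, a check can name the offending type in its message instead of reporting a generic "Invalid value type". The sketch below uses plain C++ exceptions rather than Velox's check macros; `Kind`, `kindName`, and `checkSummable` are hypothetical names, and the set of accepted kinds is shortened for brevity.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical type tag; a stand-in, not Velox's TypeKind.
enum class Kind { kBoolean, kBigint, kDouble, kVarchar };

// Maps a kind to a printable name for use in error messages.
std::string kindName(Kind kind) {
  switch (kind) {
    case Kind::kBoolean:
      return "BOOLEAN";
    case Kind::kBigint:
      return "BIGINT";
    case Kind::kDouble:
      return "DOUBLE";
    case Kind::kVarchar:
      return "VARCHAR";
  }
  return "UNKNOWN";
}

// Throws with the offending kind in the message when the element type
// cannot be summed.
void checkSummable(Kind kind) {
  const bool summable = kind == Kind::kBigint || kind == Kind::kDouble;
  if (!summable) {
    throw std::invalid_argument(
        "array_sum: unsupported element type " + kindName(kind));
  }
}
```

A message that names the type lets the user fix the query without reading the function's source.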

jwyles-ahana requested review from aditi-pandit, majetideepak and yingsu00 May 13, 2022 16:27

jwyles-ahana changed the title ~~[WIP] First try at implementing array_sum function~~ Implementing array_sum function May 13, 2022

yingsu00 reviewed May 19, 2022

aditi-pandit reviewed May 19, 2022

majetideepak reviewed May 26, 2022

majetideepak reviewed May 27, 2022

majetideepak reviewed May 31, 2022

aditi-pandit reviewed May 31, 2022

jwyles-ahana force-pushed the array_sum branch 7 times, most recently from 08a1e49 to aa59d73 Compare July 27, 2022 20:54

jwyles-ahana force-pushed the array_sum branch 11 times, most recently from c3dd658 to bec6db1 Compare August 2, 2022 18:22

Add array_sum Presto function

cde6b49

jwyles-ahana force-pushed the array_sum branch from bec6db1 to cde6b49 Compare August 17, 2022 17:48

pramodsatya force-pushed the main branch from a29de49 to 5d4f8b1 Compare January 26, 2023 23:10
