-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementing array_sum function #5
base: main
Are you sure you want to change the base?
Conversation
…tor#1500) Summary: Enhance printExprWithStats to identify common-sub expressions. For example, `c0 + c1` is a common sub-expression in `"(c0 + c1) % 5", " (c0 + c1) % 3"` expression set. It is evaluated only once and there is a single Expr object that represents it. That object appears in the expression tree twice. printExprWithStats does not show the runtime stats for second instance of that expression and instead annotates it with `[CSE https://github.com/facebookincubator/velox/issues/2]`, where CSE stands for common sub-expression and 2 refers to the first instance of the expression. ``` mod [cpu time: 50.49us, rows: 1024] -> BIGINT [#1] cast(plus as BIGINT) [cpu time: 68.15us, rows: 1024] -> BIGINT [#2] plus [cpu time: 51.84us, rows: 1024] -> INTEGER [#3] c0 [cpu time: 0ns, rows: 0] -> INTEGER [#4] c1 [cpu time: 0ns, rows: 0] -> INTEGER [#5] 5:BIGINT [cpu time: 0ns, rows: 0] -> BIGINT [#6] mod [cpu time: 49.29us, rows: 1024] -> BIGINT [#7] cast((plus(c0, c1)) as BIGINT) -> BIGINT [CSE #2] 3:BIGINT [cpu time: 0ns, rows: 0] -> BIGINT [#8] ``` Pull Request resolved: facebookincubator#1500 Reviewed By: Yuhta Differential Revision: D35994836 Pulled By: mbasmanova fbshipit-source-id: 6bacbbe61b68dad97ce2fd5f99610c4ad55897be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments
// Allocate new vector for the result | ||
memory::MemoryPool* pool = context->pool(); | ||
auto resultVector = BaseVector::create(outputType, numRows, pool); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove extra empty line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually you need to run the reformat code(Code-> Reformat Code). There're other lines that would fail the format check too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
// Get access to raw values for the result | ||
OT* resultValues = (OT*) resultVector->valuesAsVoid(); | ||
|
||
// Iterate over the input vector and find the sum of each array's values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is not needed because it's very obvious. The comments need to be succinct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
|
||
// Iterate over the input vector and find the sum of each array's values | ||
for (int i = 0; i < numRows; i++) { | ||
// If the whole array is null then set the row null in the output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is not needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
if (arrayVector->isNullAt(i)) { | ||
resultVector->setNull(i, true); | ||
} | ||
// If the array is not null then sum the elements and set the result to the sum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is not needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
} | ||
} | ||
|
||
// Set the value at i equal to the sum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is not needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
for (int i = 0; i < numRows; i++) { | ||
// If the whole array is null then set the row null in the output | ||
if (arrayVector->isNullAt(i)) { | ||
resultVector->setNull(i, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Presto function description says "Returns the sum of all non-null elements of the array. If there is no non-null elements, returns 0. "
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Presto function description does not specify what to do in the case that the array itself is null (the case handled here) so I went with null in null out.
if (kind == TypeKind::REAL || kind == TypeKind::DOUBLE) { | ||
return std::make_shared<ArraySumFunction<IT, double>>(); | ||
} | ||
VELOX_FAIL() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the message showing what kind of error it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
// Define function signature. | ||
// array(T1) -> T2 where T must be coercible to bigint or double, and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
T should be T1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should. I will fix.
} // namespace | ||
|
||
// Test integer arrays. | ||
TEST_F(ArraySumTest, integer64Input) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add tests on the some of the types that are not coercible to double, and expect the query fails and output expected message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added tests for StringView and bool types which should (and do) fail with an exception.
// array(T1) -> T2 where T must be coercible to bigint or double, and | ||
// T2 is bigint or double | ||
std::vector<std::shared_ptr<exec::FunctionSignature>> signatures() { | ||
return { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious does array_sum works with decimal types ? If yes, then that can be the cause of some complexity for these signatures and implementation. We needn't work on it on that PR but please inform Karteek, etc about it.
arrayType->kind(), | ||
TypeKind::ARRAY, | ||
"array_sum requires argument of type ARRAY"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a validation here that the child of the array type should be coercible to double.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a validation.
} | ||
|
||
template <> | ||
void ArraySumFunction<Timestamp, int64_t>::apply( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, why are these needed ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main apply function will not compile for Timestamp, StringView, and Date. Since these specializations are never used (due to the acceptable signatures of array_sum) I have added the specializations that don't compile as no-op functions which lets things compile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we follow the applyTyped
+ VELOX_DYNAMIC_SCALAR_TEMPLATE_TYPE_DISPATCH
approach as in ArrayMinMax.cpp
? That would clean up these?
|
||
// Test floating point arrays | ||
TEST_F(ArraySumTest, floatInput) { | ||
auto input = makeNullableArrayVector<float>({{0, 1, 2}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add tests for values std::numeric_limits::min(), max(), inifinity() and quiet_NaN().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added some tests for these.
// Define function signature. | ||
// array(T1) -> T2 where T1 must be coercible to bigint or double, and | ||
// T2 is bigint or double | ||
std::vector<std::shared_ptr<exec::FunctionSignature>> signatures() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The signatures()
approach from ArrayMinMax.cpp
can be followed here as well, using an unordered_map
instead of the vector
.
}; | ||
std::vector<std::shared_ptr<exec::FunctionSignature>> signatures; | ||
signatures.reserve(s.size()); | ||
for (const auto& typeName : s) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: for (const auto& [returnType, argType] : s)
is preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jwyles-ahana great to see the dummy definitions go away. Made some more comments.
createTyped, elementType->kind(), inputArgs); | ||
switch (elementType->kind()) { | ||
case TypeKind::TINYINT: { | ||
return std::make_shared<ArraySumFunction<int8_t, int64_t>>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use TypeTraits<TypeKind::TINYINT>::NativeType
instead of int8_t
. Same below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made change.
// Allocate new vector for the result | ||
memory::MemoryPool* pool = context->pool(); | ||
auto resultVector = BaseVector::create(outputType, numRows, pool); | ||
OT* resultValues = (OT*)resultVector->valuesAsVoid(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can directly write to result
and avoid a copy. Also, I feel the compiler cannot vectorize the loop, so we can use the API to set the values instead of dealing with raw values.
BaseVector::ensureWritable(rows, outputType, context->pool(), result);
auto resultValues = (*result)->asFlatVector<OT>();
...
resultValues->setNull(i, true);
....
resultValues->set(i, sum);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to use rows.applyToSelected
to ensure only selected rows are set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
*/ | ||
|
||
#include "velox/expression/EvalCtx.h" | ||
#include "velox/expression/Expr.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might be able to remove the first 2 includes. Just including "VectorFunction.h" is sufficient I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
template <typename IT, typename OT> | ||
class ArraySumFunction : public exec::VectorFunction { | ||
public: | ||
// Execute function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
namespace facebook::velox::functions { | ||
namespace { | ||
|
||
// See documentation at https://prestodb.io/docs/current/functions/array.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this line below "Implements the array_sum function"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
valueTypeKind == TypeKind::SMALLINT || | ||
valueTypeKind == TypeKind::INTEGER || valueTypeKind == TypeKind::BIGINT || | ||
valueTypeKind == TypeKind::REAL || valueTypeKind == TypeKind::DOUBLE; | ||
VELOX_USER_CHECK_EQ(isCoercibleToDouble, true, "Invalid value type"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add information of the invalid valueTypeKind in the error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
08a1e49
to
aa59d73
Compare
c3dd658
to
bec6db1
Compare
bec6db1
to
cde6b49
Compare
No description provided.