-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check for overflow in sum aggregate function #2010
Conversation
@bikramSingh91 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bikramSingh91 Looks good % some comments on the test.
velox/exec/tests/AggregationTest.cpp
Outdated
@@ -484,6 +485,27 @@ class AggregationTest : public OperatorTestBase { | |||
DOUBLE(), | |||
VARCHAR()})}; | |||
folly::Random::DefaultGenerator rng_; | |||
|
|||
template <typename InputType, typename ResultType> | |||
void testSumOverflow(bool expectError, const ResultType expectedResult) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this test to velox/functions/prestosql/aggregates/tests/SumTest.cpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
velox/exec/tests/AggregationTest.cpp
Outdated
testSumOverflow<int16_t, int64_t>(false, 65533); | ||
testSumOverflow<int32_t, int64_t>(false, 4294967293); | ||
testSumOverflow<int64_t, int64_t>(true, 0); | ||
// TODO: add this back once sum agg for floats is fixed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the problem here? Can it be fixed in this PR or does it require a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
created an issue and added the reference as a comment for more context on the issue
velox/exec/tests/AggregationTest.cpp
Outdated
|
||
template <typename InputType, typename ResultType> | ||
void testSumOverflow(bool expectError, const ResultType expectedResult) { | ||
auto expectedVector = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since expectedVector is used only if expectError is false, it might be more readable to move it to the 'else' branch below.
Also, I wonder if it would be cleaner to have this method test only overflow cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both cases which overflow and dont have very similar setup and code which is why I merged them. Changing the naming a bit to sound more generic. Please let me know if this helps readability or if you prefer splitting them.
velox/exec/tests/AggregationTest.cpp
Outdated
testSumOverflow<int8_t, int64_t>(false, 253); | ||
testSumOverflow<int16_t, int64_t>(false, 65533); | ||
testSumOverflow<int32_t, int64_t>(false, 4294967293); | ||
testSumOverflow<int64_t, int64_t>(true, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like overflow is tested only for 64-bit integers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
making the test a bit more generic to test limits
velox/exec/tests/AggregationTest.cpp
Outdated
// instead when floating points go over limit. | ||
testSumOverflow<int8_t, int64_t>(false, 253); | ||
testSumOverflow<int16_t, int64_t>(false, 65533); | ||
testSumOverflow<int32_t, int64_t>(false, 4294967293); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be cleaner to use std::numeric_limits instead of hard-coded constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
rebasing in next patch |
@bikramSingh91 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. @mbasmanova Do you have further suggestions?
data.push_back(makeRowVector({makeFlatVector<InputType>(input)})); | ||
|
||
// Testing these two steps provides enough coverage. Adding kfinal | ||
// involves more elaborate multi-fragment setup which would be an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not necessary to setup multi-fragment query to test final aggregation. Would you consider using AggregationTestBase::testAggregations instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pointing me to AggregationTestBase::testAggregations() this looks super useful to exhaustively test all aggregation node combinations.
For this test however it might be an overkill since we would only want to make sure that overflow gets caught for each step individually and chaining steps wont provide additional coverage.
To provide more context for my code comment above: I wanted to test the kfinal step with an input of 2 values that when added cause an overflow. (Please correct me if I am wrong) Since this is a global aggregation any previous aggregation (partial) step that I add to the same query fragment would result in hitting the overflow in that step itself. Therefore to make sure that i hit the overflow in the kFinal step the only way I could think of was to create 2 separate input fragments that both output a single value, which when aggregates at the kFinal step and hits an overflow.
Something akin to this looks promising but seems like its only run when grouping keys exist: https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/aggregates/tests/AggregationTestBase.cpp#L176-L196
For my test case specifically I can mimic the same steps above and try putting the 2 input values in 2 separate rowVectors and test whether they get plugged in through separate drivers feeding into a final agg. I'll report back if that works.
@bikramSingh91 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bikramSingh91 Looks great % one question.
std::is_same<TOutput, float>::value) { | ||
result += n * value; | ||
} else { | ||
result = functions::checkedPlus<TOutput>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this code path covered by the test? It looks like it might not be. To cover it use constant vector as input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbasmanova Thanks for catching that. I updated the test to check the duplicates path, both for overflow in the add and in the multiply. I however removed the verification on results for the cases where overflow does not occur to avoid bloating the test code as the newly added cases would have required separate calculation of expected results per test case. Please let me know if you think that works. thank you
This patch adds changes to check for overflow on every update operation in the sum aggregate function. This is only implemented for integer types as floating points get set to infinity once they overflow, which is a valid result. Test Plan: Verified that this causes no performance regression by running existing aggregation benchmark. Also added a unit test for the same.
@bikramSingh91 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@bikramSingh91 , @mbasmanova Looks like my PR #2163 is related to this and should probably land after this change. Is this going to land soon? Thanks. |
@majetideepak Yes this should land soon. Either today or tomorrow at the very latest. |
This patch adds changes to check for overflow on every update
operation in the sum aggregate function. This is only implemented
for integer types as floating points get set to infinity once they
overflow, which is a valid result.
Test Plan:
Verified that this causes no performance regression by running
existing aggregation benchmark.
Also added a unit test for the same.