Check for overflow in sum aggregate function #2010

Closed
wants to merge 1 commit into from

bikramSingh91
Contributor

This patch checks for overflow on every update operation in the sum
aggregate function. The check applies only to integer types:
floating-point sums become infinity once they overflow, which is a
valid result.

Test Plan:
Verified that this causes no performance regression by running the
existing aggregation benchmark.
Also added a unit test for the overflow check.
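
For readers without the diff at hand, the behavior described above can be sketched as follows. This is a minimal illustration, not the Velox implementation: `checkedAdd` and `updateSum` are hypothetical names, and the real patch throws a Velox error rather than `std::overflow_error`.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper, not the Velox API: adds two integers and throws
// instead of wrapping around when the result does not fit in T.
template <typename T>
T checkedAdd(T a, T b) {
  static_assert(std::is_integral_v<T>, "integer types only");
  constexpr T kMax = std::numeric_limits<T>::max();
  constexpr T kMin = std::numeric_limits<T>::min();
  if ((b > 0 && a > kMax - b) || (b < 0 && a < kMin - b)) {
    throw std::overflow_error("integer overflow in sum");
  }
  return a + b;
}

// Hypothetical accumulator update: integer accumulators are checked on
// every call; floating-point accumulators are added directly and may
// become infinity, which is treated as a valid result.
template <typename TAccumulator, typename TValue>
void updateSum(TAccumulator& sum, TValue value) {
  if constexpr (std::is_floating_point_v<TAccumulator>) {
    sum += value;
  } else {
    sum = checkedAdd<TAccumulator>(sum, static_cast<TAccumulator>(value));
  }
}
```

With this sketch, adding 1 to an `int64_t` accumulator that already holds the maximum value throws, while adding to a `double` accumulator at its maximum yields infinity.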

@bikramSingh91 bikramSingh91 requested a review from mbasmanova July 14, 2022 00:30
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 14, 2022
@facebook-github-bot
Contributor

@bikramSingh91 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

@mbasmanova mbasmanova left a comment

@bikramSingh91 Looks good % some comments on the test.

@@ -484,6 +485,27 @@ class AggregationTest : public OperatorTestBase {
DOUBLE(),
VARCHAR()})};
folly::Random::DefaultGenerator rng_;

template <typename InputType, typename ResultType>
void testSumOverflow(bool expectError, const ResultType expectedResult) {
Contributor

Let's move this test to velox/functions/prestosql/aggregates/tests/SumTest.cpp

Contributor Author

Done.

testSumOverflow<int16_t, int64_t>(false, 65533);
testSumOverflow<int32_t, int64_t>(false, 4294967293);
testSumOverflow<int64_t, int64_t>(true, 0);
// TODO: add this back once sum agg for floats is fixed.
Contributor

What is the problem here? Can it be fixed in this PR or does it require a separate PR?

Contributor Author

Created an issue and added a reference to it in a code comment for more context.


template <typename InputType, typename ResultType>
void testSumOverflow(bool expectError, const ResultType expectedResult) {
auto expectedVector =
Contributor

Since expectedVector is used only if expectError is false, it might be more readable to move it to the 'else' branch below.

Also, I wonder if it would be cleaner to have this method test only overflow cases.

Contributor Author

The overflow and non-overflow cases share very similar setup and code, which is why I merged them. I am changing the naming a bit to sound more generic. Please let me know if this helps readability or if you would prefer splitting them.
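
One way to keep both cases in a single helper while avoiding an unused expected value is sketched below. This is only an illustration of the design question in this thread, under stated assumptions: `sumOrThrow` and `sumBehavesAsExpected` are hypothetical names, the sum is a plain stand-in for the aggregation under test, and an empty `std::optional` stands for "expect an error".

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <stdexcept>
#include <vector>

// Stand-in for the aggregation under test (hypothetical, not Velox code):
// sums the input into int64_t and throws on overflow.
template <typename InputType>
int64_t sumOrThrow(const std::vector<InputType>& input) {
  constexpr int64_t kMax = std::numeric_limits<int64_t>::max();
  constexpr int64_t kMin = std::numeric_limits<int64_t>::min();
  int64_t sum = 0;
  for (InputType v : input) {
    const int64_t value = v;
    if ((value > 0 && sum > kMax - value) ||
        (value < 0 && sum < kMin - value)) {
      throw std::overflow_error("integer overflow in sum");
    }
    sum += value;
  }
  return sum;
}

// Returns true when the outcome matches the expectation: an overflow error
// if `expected` is empty, otherwise a result equal to `*expected`.
template <typename InputType>
bool sumBehavesAsExpected(
    const std::vector<InputType>& input,
    std::optional<int64_t> expected) {
  try {
    const int64_t actual = sumOrThrow(input);
    return expected.has_value() && actual == *expected;
  } catch (const std::overflow_error&) {
    return !expected.has_value();
  }
}
```

The expected value is then only constructed by callers that actually expect a result, which addresses the readability concern without splitting the helper.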

testSumOverflow<int8_t, int64_t>(false, 253);
testSumOverflow<int16_t, int64_t>(false, 65533);
testSumOverflow<int32_t, int64_t>(false, 4294967293);
testSumOverflow<int64_t, int64_t>(true, 0);
Contributor

Looks like overflow is tested only for 64-bit integers.

Contributor Author

Making the test a bit more generic so that it tests the limits of each type.

// instead when floating points go over limit.
testSumOverflow<int8_t, int64_t>(false, 253);
testSumOverflow<int16_t, int64_t>(false, 65533);
testSumOverflow<int32_t, int64_t>(false, 4294967293);
Contributor

It might be cleaner to use std::numeric_limits instead of hard-coded constants.

Contributor Author

Done.
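
A sketch of what the `std::numeric_limits` suggestion could look like. It assumes, as the constants 253, 65533 and 4294967293 suggest, that the test inputs are the largest value of the input type and that value minus one; `sumOfTwoLargest` is a hypothetical helper, not a name from the patch.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: the sum of the two largest values of InputType,
// computed in 64 bits. Assumes the inputs are max and max - 1 and that
// InputType is narrower than int64_t, so this addition cannot overflow.
template <typename InputType>
constexpr int64_t sumOfTwoLargest() {
  static_assert(sizeof(InputType) < sizeof(int64_t));
  constexpr int64_t kMax = std::numeric_limits<InputType>::max();
  return kMax + (kMax - 1);
}

// These match the hard-coded constants in the snippet above.
static_assert(sumOfTwoLargest<int8_t>() == 253);
static_assert(sumOfTwoLargest<int16_t>() == 65533);
static_assert(sumOfTwoLargest<int32_t>() == 4294967293LL);
```

Deriving the expected values this way also documents why only the 64-bit input case can overflow a 64-bit accumulator with two inputs.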

@bikramSingh91
Contributor Author

Rebasing in the next patch.

Contributor

@kagamiori kagamiori left a comment

Looks good to me. @mbasmanova Do you have further suggestions?

@kagamiori kagamiori requested a review from mbasmanova July 25, 2022 19:39
data.push_back(makeRowVector({makeFlatVector<InputType>(input)}));

// Testing these two steps provides enough coverage. Adding kfinal
// involves more elaborate multi-fragment setup which would be an
Contributor

It is not necessary to set up a multi-fragment query to test final aggregation. Would you consider using AggregationTestBase::testAggregations instead?

Contributor Author

Thank you for pointing me to AggregationTestBase::testAggregations(). It looks very useful for exhaustively testing all aggregation node combinations.
For this test, however, it might be overkill: we only want to make sure that overflow gets caught in each step individually, and chaining steps won't provide additional coverage.

To provide more context for my code comment above: I wanted to test the kFinal step with an input of two values that overflow when added. (Please correct me if I am wrong.) Since this is a global aggregation, any earlier (partial) aggregation step that I add to the same query fragment would hit the overflow in that step itself. Therefore, the only way I could think of to hit the overflow in the kFinal step was to create two separate input fragments that each output a single value, so that the overflow occurs when the kFinal step aggregates them.

Something akin to this looks promising, but it seems to run only when grouping keys exist: https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/aggregates/tests/AggregationTestBase.cpp#L176-L196
For my test case specifically, I can mimic the same steps and try putting the two input values in two separate RowVectors, then test whether they get plugged in through separate drivers feeding into a final aggregation. I'll report back if that works.
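
The scenario described in this comment can be illustrated outside Velox with a plain simulation. This is a sketch under stated assumptions: `partialSum` and `finalSum` are hypothetical stand-ins for the partial and final aggregation steps, and no Velox plan nodes or drivers are involved.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical checked sum shared by both simulated steps.
int64_t checkedSum(const std::vector<int64_t>& values) {
  constexpr int64_t kMax = std::numeric_limits<int64_t>::max();
  constexpr int64_t kMin = std::numeric_limits<int64_t>::min();
  int64_t sum = 0;
  for (int64_t v : values) {
    if ((v > 0 && sum > kMax - v) || (v < 0 && sum < kMin - v)) {
      throw std::overflow_error("integer overflow in sum");
    }
    sum += v;
  }
  return sum;
}

// Simulated partial step: sums one batch of raw input values.
int64_t partialSum(const std::vector<int64_t>& batch) {
  return checkedSum(batch);
}

// Simulated final step: merges the partial sums produced upstream.
int64_t finalSum(const std::vector<int64_t>& partials) {
  return checkedSum(partials);
}
```

If both values arrive in one batch, `partialSum({max, 1})` already throws, so the final step is never reached. If they arrive in separate batches, `partialSum({max})` and `partialSum({1})` both succeed and only `finalSum` of the two results throws, which is the case the test is trying to reach.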

@bikramSingh91 bikramSingh91 requested a review from mbasmanova July 28, 2022 02:20
Contributor

@mbasmanova mbasmanova left a comment

@bikramSingh91 Looks great % one question.

std::is_same<TOutput, float>::value) {
result += n * value;
} else {
result = functions::checkedPlus<TOutput>(
Contributor

Is this code path covered by the test? It looks like it might not be. To cover it, use a constant vector as input.

Contributor Author

@mbasmanova Thanks for catching that. I updated the test to cover the duplicates path, for overflow both in the add and in the multiply. However, I removed the result verification for the cases where overflow does not occur to avoid bloating the test code, since the newly added cases would have required a separately calculated expected result per test case. Please let me know if you think that works. Thank you.

@majetideepak
Collaborator

@bikramSingh91, @mbasmanova It looks like my PR #2163 is related to this one and should probably land after this change. Is this going to land soon? Thanks.

@bikramSingh91
Contributor Author

@majetideepak Yes, this should land soon, either today or tomorrow at the very latest.
