
Check for overflow in sum aggregate function #2010

Closed

bikramSingh91 wants to merge 1 commit into facebookincubator:main from bikramSingh91:SumAgg

Contributor

bikramSingh91 commented Jul 14, 2022

This patch adds changes to check for overflow on every update
operation in the sum aggregate function. This is only implemented
for integer types as floating points get set to infinity once they
overflow, which is a valid result.

Test Plan:
Verified that this causes no performance regression by running
existing aggregation benchmark.
Also added a unit test for the same.
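
For readers unfamiliar with the distinction drawn in the description, here is a minimal sketch of a per-update overflow check. It is not the code from this patch: `checkedAddSketch` and `updateSumSketch` are hypothetical names, and the sketch assumes a GCC or Clang toolchain (for `__builtin_add_overflow`) and C++17 (for `if constexpr`). Integer sums throw on overflow; floating-point sums use plain addition and may reach infinity.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper, not Velox's checkedPlus: adds two integers and
// throws instead of silently wrapping around.
template <typename T>
T checkedAddSketch(T a, T b) {
  static_assert(std::is_integral<T>::value, "integer types only");
  T result;
  if (__builtin_add_overflow(a, b, &result)) {
    throw std::overflow_error("integer overflow in sum");
  }
  return result;
}

// Accumulates one value into a running sum. Integer sums are checked on
// every update; floating-point sums use plain addition and may become
// infinity, which is treated as a valid result.
template <typename TSum, typename TValue>
void updateSumSketch(TSum& sum, TValue value) {
  if constexpr (std::is_floating_point<TSum>::value) {
    sum += value;
  } else {
    sum = checkedAddSketch<TSum>(sum, static_cast<TSum>(value));
  }
}
```

With this sketch, adding 1 to a running `int64_t` sum that already holds the maximum value throws `std::overflow_error`, while adding two maximal `double` values yields infinity without an error.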

bikramSingh91 requested a review from mbasmanova

July 14, 2022 00:30

facebook-github-bot added the CLA Signed label

Contributor

facebook-github-bot commented Jul 14, 2022

@bikramSingh91 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

mbasmanova reviewed


Contributor

mbasmanova left a comment

@bikramSingh91 Looks good % some comments on the test.

velox/exec/tests/AggregationTest.cpp Outdated

@@ -484,6 +485,27 @@ class AggregationTest : public OperatorTestBase {
                          DOUBLE(),
                          VARCHAR()})};
                 folly::Random::DefaultGenerator rng_;
+                template <typename InputType, typename ResultType>
+                void testSumOverflow(bool expectError, const ResultType expectedResult) {

Contributor

mbasmanova Jul 19, 2022

Let's move this test to velox/functions/prestosql/aggregates/tests/SumTest.cpp

Contributor Author

bikramSingh91 Jul 21, 2022

done

velox/exec/tests/AggregationTest.cpp Outdated

+                testSumOverflow<int16_t, int64_t>(false, 65533);
+                testSumOverflow<int32_t, int64_t>(false, 4294967293);
+                testSumOverflow<int64_t, int64_t>(true, 0);
+                // TODO: add this back once sum agg for floats is fixed.

Contributor

mbasmanova Jul 19, 2022

What is the problem here? Can it be fixed in this PR or does it require a separate PR?

Contributor Author

bikramSingh91 Jul 21, 2022

Created an issue and added a reference to it in a code comment for more context.

velox/exec/tests/AggregationTest.cpp Outdated

+                template <typename InputType, typename ResultType>
+                void testSumOverflow(bool expectError, const ResultType expectedResult) {
+                  auto expectedVector =

Contributor

mbasmanova Jul 19, 2022

Since expectedVector is used only if expectError is false, it might be more readable to move it to the 'else' branch below.

Also, I wonder if it would be cleaner to have this method test only overflow cases.

Contributor Author

bikramSingh91 Jul 21, 2022

The overflow and non-overflow cases share very similar setup and code, which is why I merged them. I'm changing the naming a bit to sound more generic. Please let me know if this helps readability or if you prefer splitting them.

velox/exec/tests/AggregationTest.cpp Outdated

+                testSumOverflow<int8_t, int64_t>(false, 253);
+                testSumOverflow<int16_t, int64_t>(false, 65533);
+                testSumOverflow<int32_t, int64_t>(false, 4294967293);
+                testSumOverflow<int64_t, int64_t>(true, 0);

Contributor

mbasmanova Jul 19, 2022

Looks like overflow is tested only for 64-bit integers.

Contributor Author

bikramSingh91 Jul 21, 2022

making the test a bit more generic to test limits

velox/exec/tests/AggregationTest.cpp Outdated

+                // instead when floating points go over limit.
+                testSumOverflow<int8_t, int64_t>(false, 253);
+                testSumOverflow<int16_t, int64_t>(false, 65533);
+                testSumOverflow<int32_t, int64_t>(false, 4294967293);

Contributor

mbasmanova Jul 19, 2022

It might be cleaner to use std::numeric_limits instead of hard-coded constants.

Contributor Author

bikramSingh91 Jul 21, 2022

done
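
As an illustration of the suggestion above, the hard-coded constants in the diff happen to equal `max + (max - 1)` for each input type when computed in `int64_t`. The thread does not state which input values the test uses, so that reading is an assumption, and `sumOfTwoLargestSketch` is a hypothetical name rather than a helper from the patch.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: max + (max - 1) for a narrow integer type, computed
// in int64_t so the sum cannot overflow for 8-, 16- and 32-bit inputs.
template <typename InputType>
constexpr int64_t sumOfTwoLargestSketch() {
  return static_cast<int64_t>(std::numeric_limits<InputType>::max()) +
      (static_cast<int64_t>(std::numeric_limits<InputType>::max()) - 1);
}

// These match the constants that were hard-coded in the original test.
static_assert(sumOfTwoLargestSketch<int8_t>() == 253, "int8_t");
static_assert(sumOfTwoLargestSketch<int16_t>() == 65533, "int16_t");
static_assert(sumOfTwoLargestSketch<int32_t>() == 4294967293LL, "int32_t");
```

Deriving the expected value from `std::numeric_limits` keeps the test correct for any input type without per-type magic numbers.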

bikramSingh91 force-pushed the SumAgg branch from bb32e7f to 54c4f11 Compare

July 21, 2022 23:36

Contributor Author

bikramSingh91 commented Jul 21, 2022

rebasing in next patch

bikramSingh91 force-pushed the SumAgg branch from 54c4f11 to 089437a Compare

July 21, 2022 23:37

Contributor

facebook-github-bot commented Jul 21, 2022

@bikramSingh91 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

kagamiori approved these changes


Contributor

kagamiori left a comment

Looks good to me. @mbasmanova Do you have further suggestions?

kagamiori requested a review from mbasmanova

July 25, 2022 19:39

mbasmanova reviewed


velox/functions/prestosql/aggregates/tests/SumTest.cpp Outdated

+                    data.push_back(makeRowVector({makeFlatVector<InputType>(input)}));
+                    // Testing these two steps provides enough coverage. Adding kfinal
+                    // involves more elaborate multi-fragment setup which would be an

Contributor

mbasmanova Jul 25, 2022

It is not necessary to set up a multi-fragment query to test final aggregation. Would you consider using AggregationTestBase::testAggregations instead?

Contributor Author

bikramSingh91 Jul 25, 2022

Thank you for pointing me to AggregationTestBase::testAggregations(). It looks very useful for exhaustively testing all aggregation node combinations.
For this test, however, it might be overkill, since we only want to make sure that overflow gets caught in each step individually, and chaining steps won't provide additional coverage.

To provide more context for my code comment above: I wanted to test the kFinal step with an input of 2 values that cause an overflow when added. (Please correct me if I am wrong.) Since this is a global aggregation, any previous (partial) aggregation step that I add to the same query fragment would hit the overflow in that step itself. Therefore, to make sure that I hit the overflow in the kFinal step, the only way I could think of was to create 2 separate input fragments that each output a single value; those values are then aggregated at the kFinal step, which hits the overflow.

Something akin to this looks promising, but it seems like it's only run when grouping keys exist: https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/aggregates/tests/AggregationTestBase.cpp#L176-L196
For my test case specifically, I can mimic the same steps and try putting the 2 input values in 2 separate rowVectors, then test whether they get plugged in through separate drivers feeding into a final agg. I'll report back if that works.
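
The scenario described in this comment can be sketched outside of Velox as follows. This is a simplified model, not the test from the patch: `partialSumSketch` and `finalSumSketch` are hypothetical names, each "fragment" is just a vector of values, and the overflow check assumes a GCC or Clang toolchain (for `__builtin_add_overflow`).

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical checked accumulation shared by both steps.
int64_t checkedSumSketch(const std::vector<int64_t>& values) {
  int64_t sum = 0;
  for (int64_t v : values) {
    if (__builtin_add_overflow(sum, v, &sum)) {
      throw std::overflow_error("integer overflow in sum");
    }
  }
  return sum;
}

// Partial step: sums the raw input values of one fragment.
int64_t partialSumSketch(const std::vector<int64_t>& input) {
  return checkedSumSketch(input);
}

// Final step: merges the partial results produced by separate fragments.
int64_t finalSumSketch(const std::vector<int64_t>& partials) {
  return checkedSumSketch(partials);
}
```

If one fragment holds only the maximum `int64_t` value and another holds only 1, each partial step succeeds on its own and the overflow surfaces only in the final step. If both values were fed to a single partial step, that step would throw first, which is the problem the comment describes.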

bikramSingh91 force-pushed the SumAgg branch from 089437a to ded9911 Compare

July 28, 2022 02:18

Contributor

facebook-github-bot commented Jul 28, 2022

@bikramSingh91 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

bikramSingh91 requested a review from mbasmanova

July 28, 2022 02:20

mbasmanova approved these changes


Contributor

mbasmanova left a comment

@bikramSingh91 Looks great % one question.

velox/functions/prestosql/aggregates/SumAggregate.h

+                      std::is_same<TOutput, float>::value) {
+                    result += n * value;
+                  } else {
+                    result = functions::checkedPlus<TOutput>(

Contributor

mbasmanova Jul 28, 2022

Is this code path covered by the test? It looks like it might not be. To cover it use constant vector as input.

Contributor Author

bikramSingh91 Jul 28, 2022

@mbasmanova Thanks for catching that. I updated the test to check the duplicates path, both for overflow in the add and for overflow in the multiply. However, I removed the result verification for the cases where overflow does not occur, to avoid bloating the test code: the newly added cases would have required a separate calculation of expected results per test case. Please let me know if you think that works. Thank you.
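
A rough sketch of the duplicates path discussed here, where a value repeated `n` times is folded into the sum as `n * value`. The diff excerpt only shows `result += n * value` for floating-point types and a call to `functions::checkedPlus<TOutput>` otherwise; everything else below is an assumption. `addDuplicatesSketch` is a hypothetical name, and the checks rely on the GCC/Clang builtins `__builtin_mul_overflow` and `__builtin_add_overflow` (C++17 for `if constexpr`).

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical sketch: folds `n` copies of `value` into `result`.
// For integer outputs both the multiply and the add are checked;
// floating-point outputs use plain arithmetic.
template <typename TOutput>
void addDuplicatesSketch(TOutput& result, TOutput value, int64_t n) {
  if constexpr (std::is_floating_point<TOutput>::value) {
    result += static_cast<TOutput>(n) * value;
  } else {
    TOutput product;
    if (__builtin_mul_overflow(n, value, &product)) {
      throw std::overflow_error("integer overflow in multiply");
    }
    TOutput sum;
    if (__builtin_add_overflow(result, product, &sum)) {
      throw std::overflow_error("integer overflow in add");
    }
    result = sum;
  }
}
```

Covering this path needs two separate inputs: one where `n * value` itself overflows, and one where the product fits but adding it to the existing sum overflows.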


          Check for overflow in sum aggregate function

a49ed74


bikramSingh91 force-pushed the SumAgg branch from ded9911 to a49ed74 Compare

July 28, 2022 19:09

Contributor

facebook-github-bot commented Jul 28, 2022

@bikramSingh91 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

bikramSingh91 requested a review from mbasmanova

August 1, 2022 15:59

mbasmanova mentioned this pull request

Add support for Decimal Type Sum Aggregation #2163

Closed

Collaborator

majetideepak commented Aug 2, 2022

@bikramSingh91, @mbasmanova It looks like my PR #2163 is related to this one and should probably land after this change. Is this going to land soon? Thanks.

Contributor Author

bikramSingh91 commented Aug 2, 2022

@majetideepak Yes this should land soon. Either today or tomorrow at the very latest.

facebook-github-bot closed this in

865d702

frankobe mentioned this pull request

Performance Regression on Arithmetic operation in velox/common/base/CheckedArithmetic.h #2684

Open

c404err mentioned this pull request

performance concerns of __builtin_add_overflow() in velox/common/base/CheckedArithmetic.h #2804

Open

marin-ma pushed a commit to marin-ma/velox-oap that referenced this pull request


          [GLUTEN-CORE][VL] Minor refactor c2r codes to improve readability (fa…

9daeafe

…cebookincubator#2010)
