Add kRange preceding/following frames in window fuzzer #10006

Closed

Collaborator

@pramodsatya pramodsatya commented Jun 2, 2024

  1. Adds support for kRange preceding/following frames in window fuzzer
  2. Adds reference query runner context for PrestoSql frame clause for kRange
    preceding/following frames.

Resolves #9572 .

@facebook-github-bot facebook-github-bot added the CLA Signed label Jun 2, 2024

netlify bot commented Jun 2, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 906d532
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6700320b91067500082ab351

@pramodsatya pramodsatya changed the title [WIP] Add reference query runner context to window fuzzer Add reference query runner context to window fuzzer Jun 11, 2024
@pramodsatya
Collaborator Author

Hi @aditi-pandit, could you please help review these changes? The second commit contains changes specific to query runner context.

Contributor

@kagamiori kagamiori left a comment

Hi @pramodsatya, thank you for putting together this draft! I left some comments. Have you checked whether the current code already works properly with PrestoQueryRunner?

Comment on lines 128 to 205
  if constexpr (std::is_same_v<T, double> || std::is_same_v<T, float>) {
    return offsetCol[idx] - offsetValue;
  } else {
    return checkedMinus<T>(offsetCol[idx], offsetValue);
  }

Contributor

This piece of code is similar to lines 139-143. Could we reuse the code?
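
One way to address the duplication, sketched under assumptions: a single templated helper that both call sites could share. The name subtractOffset is hypothetical, and checkedMinusSketch is a stand-in written for this example, not Velox's actual checkedMinus. The sketch also uses std::is_floating_point_v, which additionally matches long double, whereas the snippet above checks float and double only.

```cpp
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical stand-in for a checked integer subtraction: throws instead of
// silently overflowing. Not the Velox implementation.
template <typename T>
T checkedMinusSketch(T a, T b) {
  static_assert(std::is_integral_v<T>, "integer types only");
  // a - b overflows if it would fall below min() (b > 0) or above max() (b < 0).
  if ((b > 0 && a < std::numeric_limits<T>::min() + b) ||
      (b < 0 && a > std::numeric_limits<T>::max() + b)) {
    throw std::overflow_error("integer overflow in subtraction");
  }
  return static_cast<T>(a - b);
}

// Hypothetical shared helper: plain subtraction for floating-point types,
// checked subtraction for integer types.
template <typename T>
T subtractOffset(T value, T offset) {
  if constexpr (std::is_floating_point_v<T>) {
    return value - offset;
  } else {
    return checkedMinusSketch<T>(value, offset);
  }
}
```

Both branches of the original snippet would then reduce to a call such as subtractOffset<T>(offsetCol[idx], offsetValue).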

Collaborator Author

@pramodsatya pramodsatya left a comment

Thanks for the feedback @kagamiori. We found some result mismatches with Presto while testing with these changes and are investigating them further.

@pramodsatya pramodsatya force-pushed the query_runner_ctx branch 3 times, most recently from 93ddd89 to 09e547b Compare June 28, 2024 03:01
@pramodsatya pramodsatya requested a review from aditi-pandit June 28, 2024 03:01
@pramodsatya pramodsatya marked this pull request as ready for review June 28, 2024 03:04
Collaborator

@aditi-pandit aditi-pandit left a comment

Thanks @pramodsatya. This is beginning to look solid.

    std::is_same_v<T, Timestamp> ||
    std::is_same_v<T, UnknownValue>)) {
  auto size = vectorFuzzer_.getOptions().vectorSize;
  velox::test::VectorMaker vectorMaker{pool_.get()};

Collaborator

Do we really need vectorMaker for the rowVector at the end? It seems unnecessary.

newNames.push_back(columnName);
auto newChildren = input[i]->children();
newChildren.push_back(offsetColumn);
input[i] = vectorMaker.rowVector(newNames, newChildren);

Collaborator

We can use makeRowVector here. Do we really need vectorMaker?

Collaborator Author

makeRowVector is defined in VectorTestBase, which we don't derive from. vectorMaker is also used in AggregationFuzzerBase::generateInputData. Creating a RowVectorPtr with std::make_shared would require more changes, so vectorMaker looks more convenient here. Is this fine?

@pramodsatya
Collaborator Author

Hi @aditi-pandit, @kagamiori, could you please help review this PR?

Collaborator

@aditi-pandit aditi-pandit left a comment

Thanks @pramodsatya. Only a few minor comments left.

@aditi-pandit aditi-pandit changed the title Add reference query runner context to window fuzzer Add kRange preceding/following frames in window fuzzer Jul 23, 2024
@pramodsatya
Collaborator Author

Thanks @aditi-pandit, addressed the comments.

@aditi-pandit
Collaborator

@pramodsatya: Please rebase your code. There is a conflict.

Collaborator

@aditi-pandit aditi-pandit left a comment

Thanks @pramodsatya

@kagamiori
Contributor

Hi @pramodsatya, is that failure still reproducible? If so, could you share which command you used to reproduce it? Thanks!

@pramodsatya
Collaborator Author

Hi @kagamiori, the failure is not reproducible currently. It can be reproduced by reverting the changes to WindowPartition.cpp in this PR, and with the following command:

velox_window_fuzzer_test --enable_window_reference_verification --presto_url="http://127.0.0.1:8080" --duration_sec=3600 --logtostderr=1 --minloglevel=0 --seed=2483741532 --req_timeout_ms=5000

Contributor

@kagamiori kagamiori left a comment

Hi @pramodsatya, thank you for investigating the fuzzer failure! I can reproduce that bug, and as you explained, it was due to an incorrect use of inputMapping_ in WindowPartition::updateKRangeFrameBounds(). It's a great demonstration that this fuzzer enhancement enables us to catch a real bug.

Since this PR is already big, I wonder if we could separate the bug fix into another PR and add a unit test with that?

@facebook-github-bot
Contributor

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pramodsatya
Collaborator Author

Thanks for the feedback @kagamiori. I addressed the comments and opened another PR for the null check fix: #11075. Could you please take another look?

@facebook-github-bot
Contributor

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pramodsatya
Collaborator Author

Hi @kagamiori, after the rebase, the window fuzzer is failing with kRange frames when alternative plans are tested. I'm looking into it:

-- Window[2][STREAMING partition by [p0, p1] order by [s0 DESC NULLS FIRST] w0 := checksum(ROW["c0"]) RANGE between off0 FOLLOWING and off1 FOLLOWING] -> c0:INTERVAL DAY TO SECOND, p0:INTERVAL DAY TO SECOND, p1:BOOLEAN, s0:REAL, row_number:BIGINT, off0:REAL, k1:REAL, off1:REAL, w0:VARBINARY
  -- OrderBy[1][p0 ASC NULLS FIRST, p1 ASC NULLS FIRST, s0 DESC NULLS FIRST] -> c0:INTERVAL DAY TO SECOND, p0:INTERVAL DAY TO SECOND, p1:BOOLEAN, s0:REAL, row_number:BIGINT, off0:REAL, k1:REAL, off1:REAL
    -- Values[0][1000 rows in 10 vectors] -> c0:INTERVAL DAY TO SECOND, p0:INTERVAL DAY TO SECOND, p1:BOOLEAN, s0:REAL, row_number:BIGINT, off0:REAL, k1:REAL, off1:REAL

Expected 1000, got 1000
1 extra rows, 1 missing rows
1 of extra rows:
        2933652544787757131 | 4798976694194643045 | true | "Infinity" | 420 | "Infinity" | "NaN" | "NaN" | "Zw90+g6vltk="

1 of missing rows:
        2933652544787757131 | 4798976694194643045 | true | "Infinity" | 420 | "Infinity" | "NaN" | "NaN" | null

Contributor

@kagamiori kagamiori left a comment

LGTM. Thank you for adding this support, iterating on this PR, and fixing the fuzzer-found bug!

@pramodsatya
Collaborator Author

Hi @kagamiori, we investigated this error. It appears to be caused by the recently added changes that enable the vector fuzzer to generate NaN and Infinity values. NaN and Infinity values are now accounted for when constructing the frame offset column, and the error is fixed.
Could you please help review this fix? I will squash to a single commit if the change looks good. Thanks!
cc: @minhancao
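
A minimal sketch of what accounting for NaN and Infinity could look like when building a kRange frame-bound column from a floating-point ORDER BY key. This is an illustration only and not necessarily the approach taken in this PR. It assumes the column holds the sort key shifted by an offset k; the function name frameBoundSketch and its parameters are hypothetical. It relies on IEEE 754 behavior: NaN plus or minus anything is NaN, and Infinity minus Infinity is NaN, so non-finite keys are passed through unchanged instead of going through arithmetic.

```cpp
#include <cmath>
#include <type_traits>

// Hypothetical helper: computes the frame-bound value for one row.
// `orderByValue` is the row's ORDER BY key and `k` is the frame offset.
// `subtract` selects key - k (true) or key + k (false); which one applies
// is assumed to depend on PRECEDING/FOLLOWING and the sort order.
template <typename T>
T frameBoundSketch(T orderByValue, T k, bool subtract) {
  static_assert(std::is_floating_point_v<T>, "floating-point types only");
  // Pass NaN and +/-Infinity keys through unchanged so that cases such as
  // Infinity - Infinity do not turn an infinite key into NaN.
  if (std::isnan(orderByValue) || std::isinf(orderByValue)) {
    return orderByValue;
  }
  return subtract ? orderByValue - k : orderByValue + k;
}
```

With this sketch, a row whose key is Infinity keeps Infinity as its bound even when the offset is also Infinity, which plain subtraction would not guarantee.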

Contributor

@kagamiori kagamiori left a comment

Hi @pramodsatya, I left one comment. The other part of this fix looks good to me. Thanks!

Co-authored-by: Minhan Cao <mcao@ibm.com>
@pramodsatya
Collaborator Author

Thanks for the suggestion @kagamiori, updated accordingly. Could you please help merge this change?

@facebook-github-bot
Contributor

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kagamiori
Contributor

kagamiori commented Oct 8, 2024

Hi @pramodsatya, FYI, there was an internal linter complaint about defining a static variable in the header file VectorFuzzer.h. So I reverted the change in VectorFuzzer.h and VectorFuzzer.cpp and instead declared the function const std::vector<TypePtr>& defaultScalarTypes() in VectorFuzzer.h so that AggregationFuzzerBase.cpp can use defaultScalarTypes(). I have checked that there is no build issue on my Mac and Linux machines.
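
The pattern described above, sketched in a self-contained form: the header only declares an accessor, and the definition keeps the list in a function-local static. How the actual defaultScalarTypes() is defined is not shown in this thread, so the body below is an assumption. The Sketch-suffixed names, the std::string stand-in for TypePtr, and the listed type names are all hypothetical.

```cpp
#include <string>
#include <vector>

// Stand-in for TypePtr so the sketch compiles on its own.
using TypeNameSketch = std::string;

// What would go in the header: a declaration only, with no static data
// defined at namespace scope.
const std::vector<TypeNameSketch>& defaultScalarTypesSketch();

// What would go in the .cpp file: the list lives in a function-local static,
// which is initialized once, on the first call.
const std::vector<TypeNameSketch>& defaultScalarTypesSketch() {
  static const std::vector<TypeNameSketch> kTypes{
      "BOOLEAN", "BIGINT", "REAL", "DOUBLE", "VARCHAR"};
  return kTypes;
}
```

Every translation unit that includes the header then shares the same vector, and callers receive a const reference to it rather than a copy.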

I'm going to start the merging process now.

@pramodsatya
Collaborator Author

Thanks for the update @kagamiori, sounds good. Please let me know if any other changes are needed.

@facebook-github-bot
Contributor

@kagamiori merged this pull request in ce035c8.

Conbench analyzed the 1 benchmark run on commit ce035c87.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

@Yuhta
Contributor

Yuhta commented Oct 10, 2024

@pramodsatya Window fuzzer has been broken since this change, can you take a look?
https://github.com/facebookincubator/velox/actions/runs/11226206459/job/31207444517

@kagamiori
Contributor

@Yuhta, I'm taking a look at this failure: #11213.

athmaja-n pushed a commit to athmaja-n/velox that referenced this pull request Jan 10, 2025
…ator#10006)

Summary:
1. Adds support for kRange preceding/following frames in window fuzzer
2. Adds [reference query runner context](https://ibm.box.com/s/9tuk22hfp13imjwq9xelnz1dogsdgdh1) for PrestoSql frame clause for kRange
preceding/following frames.

Resolves facebookincubator#9572 .

Pull Request resolved: facebookincubator#10006

Reviewed By: xiaoxmeng

Differential Revision: D61882004

Pulled By: kagamiori

fbshipit-source-id: 343201fc36e5c3c01779d73ff0888cd0597ba13c

Successfully merging this pull request may close these issues.

Enhance WindowFuzzer to support k-range and k-rows frames with column boundary