Skip max_buckets test when it is flaky #58038

Merged Jun 12, 2020 (3 commits)

Conversation

nik9000
Member

@nik9000 nik9000 commented Jun 12, 2020

Before #57042 the max_buckets test would consistently pass because the
request would consistently fail. In particular, the request would fail on
the data node. After #57042 it only fails on the coordinating node. When
the max_buckets test is run in a mixed version cluster, the request
consistently fails on either the data node or the coordinating node,
except when the coordinating node is missing #43095. In that case, if one
data node has #57042 and one does not, and the one that doesn't gets the
request first, it fails the request as expected and the coordinating node
then retries the request on the node with #57042. When that happens the
request fails mysteriously with "partial shard failures" as the error
message but no partial failures reported. This is exactly the bug
fixed in #43095.

This updates the test to be skipped in mixed version clusters without
#43095 because they sometimes fail the test spuriously. The request
fails in those cases, just like we expect, but with a mysterious error
message.
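
The case analysis above can be sketched as a small truth table. This is only an illustrative model, not Elasticsearch code: the function names, the boolean parameters, and the `"expected"` / `"mysterious"` labels are all made up for this sketch, and it assumes the only thing that matters is which fixes each node has and which data node sees the request first.

```python
from itertools import product


def request_outcome(coordinator_has_43095, first_node_has_57042, retry_node_has_57042):
    """Hypothetical model of how the max_buckets request fails.

    "expected"   -> the request fails with the error the test looks for
    "mysterious" -> the request fails with "partial shard failures" but
                    no partial failures reported
    """
    # The one problematic combination described in the PR: the coordinating
    # node lacks #43095, the data node without #57042 gets the request first
    # and fails it, and the retry lands on the data node that has #57042.
    if (not coordinator_has_43095
            and not first_node_has_57042
            and retry_node_has_57042):
        return "mysterious"
    # In every other combination the request fails as expected, on either
    # the data node or the coordinating node.
    return "expected"


def is_flaky(coordinator_has_43095):
    """The test is flaky if the outcome depends on which data node is hit."""
    outcomes = {
        request_outcome(coordinator_has_43095, first, retry)
        for first, retry in product([False, True], repeat=2)
    }
    return len(outcomes) > 1
```

In this model `is_flaky` is true only when the coordinating node lacks #43095, which is the condition under which the test is now skipped.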

Closes #57657

@nik9000 nik9000 added the >test, :Analytics/Aggregations, and v7.9.0 labels Jun 12, 2020
@nik9000 nik9000 requested a review from imotov June 12, 2020 12:00
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@elasticmachine elasticmachine added the Team:Analytics label Jun 12, 2020
@nik9000
Member Author

nik9000 commented Jun 12, 2020

Note: This targets the 7.x branch because the failure only occurs there. There is no need to land this in master.

Contributor

@imotov imotov left a comment

That was quite a brain teaser. Thanks a lot for digging into it!

@nik9000
Member Author

nik9000 commented Jun 12, 2020

run elasticsearch-ci/packaging-sample-matrix-windows
