Skip max_buckets test when it is flaky #58038
Before elastic#57042 the max_buckets test would consistently pass because the request would consistently fail. In particular, the request would fail on the data node. After elastic#57042 it only fails on the coordinating node. When the max_buckets test is run in a mixed version cluster, the request consistently fails on *either* the data node or the coordinating node, except when the coordinating node is missing elastic#43095. In that case, if one data node has elastic#57042 and one does not, *and* the one that doesn't gets the request first, that node fails the request as expected and the coordinating node then retries it on the node with elastic#57042. When that happens the request fails mysteriously with "partial shard failures" as the error message but no partial failures reported. This is *exactly* the bug fixed in elastic#43095.
This updates the test to be skipped in mixed version clusters without elastic#43095 because they *sometimes* fail the test spuriously. The request fails in those cases, just like we expect, but with a mysterious error message.
Closes elastic#57657
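The condition for the spurious failure can be written out as a small truth table. This is only a toy model of the reasoning in the description, not Elasticsearch code: the function name `failure_kind` and its three boolean parameters are hypothetical, and the model assumes the request always fails somewhere, differing only in whether the error message is the expected one.

```python
from itertools import product


def failure_kind(coordinator_has_43095, first_node_has_57042, retry_node_has_57042):
    """Classify how the over-limit request fails in this toy model.

    Hypothetical helper for illustration only; the flags stand for which
    changes each node in the mixed version cluster has.
    """
    # A data node without #57042 rejects the request itself.
    first_node_rejects = not first_node_has_57042
    # The coordinating node then retries on the other data node, which only
    # behaves differently if it has #57042.
    retried_on_patched_node = first_node_rejects and retry_node_has_57042
    # Without #43095 the coordinating node reports "partial shard failures"
    # but lists no failures: the mysterious error message.
    if retried_on_patched_node and not coordinator_has_43095:
        return "mysterious"
    return "expected"


# Enumerate every combination and keep those that fail mysteriously.
spurious = [
    combo
    for combo in product([True, False], repeat=3)
    if failure_kind(*combo) == "mysterious"
]
print(spurious)
```

In this model only one of the eight combinations produces the mysterious error, which is consistent with the test failing only *sometimes* in clusters without #43095.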
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)
Note: This targets the 7.x branch because the failure only occurs there. There is no need to land this in master.
That was quite a brain teaser. Thanks a lot for digging into it!
run elasticsearch-ci/packaging-sample-matrix-windows