[CI] CoordinatorTests testUnhealthyLeaderIsReplaced failing #90158
Comments
Pinging @elastic/es-distributed (Team:Distributed)
This reminds me of #89867's error, but it's a different test.
Mute flaky test `CoordinatorTests#testUnhealthyLeaderIsReplaced` Relates to #90158
This has been showing up frequently in the last 7 days. I believe the fix for #89867, which was just merged, may solve this as well. I will monitor the related build failures over the next 7 days to see whether it is solved.
Oh, I may need to re-enable the test; I will make a PR for it.
I just tried to reproduce this, and it still fails in main. So the fix for #89867 does not seem related. I will handle this in due time.
So, I've managed to figure out which PR introduced the issue by running many iterations of the test. Specifically, it seems to be this line of this PR. If I change the line to The main difference the PR seems to have introduced is that previously we were making a copy of the Metadata, while now we may be returning the same instance of Metadata, but I am not sure if that's an issue or not. @original-brownbear since it seems you authored that PR, might you understand why it makes the test fail?
@kingherc hard to say how this got broken by my change without digging into it deeper. I can see a couple of spots where metadata instance equality is checked in
The cluster coordination consistency layer relies on a couple of fields within `Metadata` which record the last _committed_ values on each node. In contrast, the rest of the cluster state can only be changed at _accept_ time. In the past we would copy these fields over from the master on every publication, but since #90101 we don't copy anything at all if the `Metadata` is unchanged on the master. However, the master computes the diff against the last _committed_ state whereas the receiving nodes apply the diff to the last _accepted_ state, and this means if the master sends a no-op `Metadata` diff then the receiving node will revert its last-committed values to the ones included in the state it last accepted. With this commit we include the last-committed values alongside the cluster state diff so that they are always copied properly. Closes #90158
The cluster coordination consistency layer relies on a couple of fields within `Metadata` which record the last _committed_ values on each node. In contrast, the rest of the cluster state can only be changed at _accept_ time. In the past we would copy these fields over from the master on every publication, but since #90101 we don't copy anything at all if the `Metadata` is unchanged on the master. However, the master computes the diff against the last _committed_ state whereas the receiving nodes apply the diff to the last _accepted_ state, and this means if the master sends a no-op `Metadata` diff then the receiving node will revert its last-committed values to the ones included in the state it last accepted. With this commit we include the last-committed values alongside the cluster state diff so that they are always copied properly. Closes #90158 Backport of #92259 to 8.6
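To make the mechanism in the commit message above concrete, here is a minimal toy model of it. Everything in it is an assumption made for illustration: `Meta`, `MetaDiff`, `diff`, and the single `lastCommitted` counter are hypothetical stand-ins, not Elasticsearch's actual `Metadata` or diff classes, and the real fix may be structured differently. The sketch only shows why a no-op diff computed on the master against its last _committed_ state leaves a stale last-committed value on a node that applies it to its last _accepted_ state, and how carrying that value alongside the diff avoids this.

```java
import java.util.Objects;

// Toy model only: hypothetical stand-ins, not Elasticsearch's real types.
class NoOpDiffSketch {

    /** Metadata with one ordinary field and one "last committed" marker. */
    static final class Meta {
        final String settings;
        final long lastCommitted;

        Meta(String settings, long lastCommitted) {
            this.settings = settings;
            this.lastCommitted = lastCommitted;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Meta)) return false;
            Meta other = (Meta) o;
            return settings.equals(other.settings) && lastCommitted == other.lastCommitted;
        }

        @Override
        public int hashCode() {
            return Objects.hash(settings, lastCommitted);
        }
    }

    /** A diff that replaces the metadata, or is a no-op when replacement is null. */
    static final class MetaDiff {
        final Meta replacement;        // null: metadata unchanged on the master
        final Long committedOverride;  // null: models the behaviour before the fix

        MetaDiff(Meta replacement, Long committedOverride) {
            this.replacement = replacement;
            this.committedOverride = committedOverride;
        }

        /** Applied by the receiver to its last ACCEPTED metadata. */
        Meta apply(Meta receiverBase) {
            Meta result = replacement != null ? replacement : receiverBase;
            if (committedOverride != null) {
                result = new Meta(result.settings, committedOverride);
            }
            return result;
        }
    }

    /** Computed by the master against its last COMMITTED metadata. */
    static MetaDiff diff(Meta masterBase, Meta masterNew, boolean includeCommitted) {
        Meta replacement = masterNew.equals(masterBase) ? null : masterNew;
        Long override = includeCommitted ? Long.valueOf(masterNew.lastCommitted) : null;
        return new MetaDiff(replacement, override);
    }

    public static void main(String[] args) {
        Meta masterCommitted = new Meta("s", 2);   // master's last committed state
        Meta masterNew = new Meta("s", 2);         // unchanged, so the diff is a no-op
        Meta receiverAccepted = new Meta("s", 1);  // receiver's last accepted state is stale

        // Without the committed value in the diff, the stale marker survives: prints 1
        System.out.println(diff(masterCommitted, masterNew, false).apply(receiverAccepted).lastCommitted);
        // With the committed value sent alongside the diff, it is copied over: prints 2
        System.out.println(diff(masterCommitted, masterNew, true).apply(receiverAccepted).lastCommitted);
    }
}
```

In this model the master and the receiver disagree about the base of the diff, which is the mismatch the commit message describes; sending the last-committed value explicitly makes the result independent of that base.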
Build scan:
https://gradle-enterprise.elastic.co/s/j4xfvozocqmng/tests/:server:test/org.elasticsearch.cluster.coordination.CoordinatorTests/testUnhealthyLeaderIsReplaced
Reproduction line:
./gradlew ':server:test' --tests "org.elasticsearch.cluster.coordination.CoordinatorTests.testUnhealthyLeaderIsReplaced" -Dtests.seed=3E136AAE5154F966 -Dtests.locale=ro -Dtests.timezone=Asia/Aqtau -Druntime.java=17
Applicable branches:
main
Reproduces locally?:
Didn't try
Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.cluster.coordination.CoordinatorTests&tests.test=testUnhealthyLeaderIsReplaced
Failure excerpt: