-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Include last-committed data in publication #92259
Include last-committed data in publication #92259
Conversation
Alternative to #92258 that doesn't involve changing the safety algorithm. Not as bad as I thought it would be. Repeating here: I think this bug is more of a liveness concern than a safety one: it means that an election may incorrectly wait to collect votes from the previous configuration as well as the current one.
|
The cluster coordination consistency layer relies on a couple of fields within `Metadata` which record the last _committed_ values on each node. In contrast, the rest of the cluster state can only be changed at _accept_ time. In the past we would copy these fields over from the master on every publication, but since elastic#90101 we don't copy anything at all if the `Metadata` is unchanged on the master. However, the master computes the diff against the last _committed_ state whereas the receiving nodes apply the diff to the last _accepted_ state, and this means if the master sends a no-op `Metadata` diff then the receiving node will revert its last-committed values to the ones included in the state it last accepted. With this commit we include the last-committed values alongside the cluster state diff so that they are always copied properly.
173d526
to
ad1574f
Compare
Hi @DaveCTurner, I've created a changelog YAML for you. |
Pinging @elastic/es-distributed (Team:Distributed) |
We should backport this to 8.6, but I'm unsure if it'll land in 8.6.0 or 8.6.1. It's not serious enough to call it a blocker for 8.6.0 IMO. However the 8.6.1 version constant doesn't exist yet so I am going to have to set the BwC version to 8.6.0 while backporting, and then we may have to adjust this when bumping versions after the 8.6.0 release. cc @henningandersen who may have a different opinion about calling this a blocker for 8.6.0. |
Hi @DaveCTurner, I've updated the changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments, probably to be considered questions.
final var hasClusterUuid = lastAcceptedState.metadata().clusterUUID().equals(Metadata.UNKNOWN_CLUSTER_UUID) == false; | ||
assert hasClusterUuid : "received cluster state with empty cluster uuid: " + lastAcceptedState; | ||
|
||
if (hasClusterUuid && lastAcceptedState.metadata().clusterUUIDCommitted() == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels odd to protect the logging by something we assert on above?
if (hasClusterUuid && lastAcceptedState.metadata().clusterUUIDCommitted() == false) { | |
if (lastAcceptedState.metadata().clusterUUIDCommitted() == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ehh I just kept the same logic as before but I don't think I've ever seen this assertion trip. I'd be ok to drop it.
setLastAcceptedState(ClusterState.builder(lastAcceptedState).metadata(metadataBuilder).build()); | ||
|
||
final var adjustedMetadata = lastAcceptedState.metadata() | ||
.withLastCommittedValues(hasClusterUuid, lastAcceptedState.getLastAcceptedConfiguration()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we pass in hasClusterUuid || lastAcceptedState.metadata().clusterUUIDCommitted()
here? I mean, we are outside assertions already, so could also pass in true. But assuming assertions have some case we are trying to protect against, passing in the original committed value seems safer and also in line with original code.
@@ -556,27 +556,18 @@ public interface PersistedState extends Closeable { | |||
* marked as committed. | |||
*/ | |||
default void markLastAcceptedStateAsCommitted() { | |||
final ClusterState lastAcceptedState = getLastAcceptedState(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the code is cleaner, but I failed to spot the important difference, is it just the logging you wanted out of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this does anything different, there's just no need to use a Metadata.Builder
here any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. In that case, let us revert this and move this to a follow-up, where we can also assume that assertions are held (i.e., use true
instead of hasClusterUuid
later in the code).
@@ -86,6 +86,8 @@ public class PublicationTransportHandler { | |||
TransportRequestOptions.Type.STATE | |||
); | |||
|
|||
public static final Version INCLUDES_LAST_COMMITTED_DATA_VERSION = Version.V_8_7_0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DaveCTurner should this be 8.6.0 since it's intended to be backported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll adjust that while backporting. If I set this to 8.6.0 now then the BwC test suite would fail on this PR, but it's important for those tests to pass because they tell us whether there are any other more subtle BwC problems here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Joining the dots here:
- I opened a PR at Include last-committed data in publication #92277 which backports this to 8.6 with a fixed BwC version, and waited for it to pass CI.
- I merged 296eacd which disables the BwC tests in
main
. - I merged the backport PR.
- I opened another PR at Fixup BWC after backport of #92259 #92280 to fix up the BwC version in
main
and reinstate the BwC tests, which will be merged once it passes CI.
I reverted the tidy-up to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
...r/src/test/java/org/elasticsearch/cluster/coordination/PublicationTransportHandlerTests.java
Outdated
Show resolved
Hide resolved
The cluster coordination consistency layer relies on a couple of fields within `Metadata` which record the last _committed_ values on each node. In contrast, the rest of the cluster state can only be changed at _accept_ time. In the past we would copy these fields over from the master on every publication, but since elastic#90101 we don't copy anything at all if the `Metadata` is unchanged on the master. However, the master computes the diff against the last _committed_ state whereas the receiving nodes apply the diff to the last _accepted_ state, and this means if the master sends a no-op `Metadata` diff then the receiving node will revert its last-committed values to the ones included in the state it last accepted. With this commit we include the last-committed values alongside the cluster state diff so that they are always copied properly. Closes elastic#90158 Backport of elastic#92259 to 8.6
The cluster coordination consistency layer relies on a couple of fields within `Metadata` which record the last _committed_ values on each node. In contrast, the rest of the cluster state can only be changed at _accept_ time. In the past we would copy these fields over from the master on every publication, but since #90101 we don't copy anything at all if the `Metadata` is unchanged on the master. However, the master computes the diff against the last _committed_ state whereas the receiving nodes apply the diff to the last _accepted_ state, and this means if the master sends a no-op `Metadata` diff then the receiving node will revert its last-committed values to the ones included in the state it last accepted. With this commit we include the last-committed values alongside the cluster state diff so that they are always copied properly. Closes #90158 Backport of #92259 to 8.6
Adjusts the BWC version for the wire protocol and re-enables the BWC tests.
Extracted from elastic#92259
The cluster coordination consistency layer relies on a couple of fields within
Metadata
which record the last committed values on each node. In contrast, the rest of the cluster state can only be changed at accept time.In the past we would copy these fields over from the master on every publication, but since #90101 we don't copy anything at all if the
Metadata
is unchanged on the master. However, the master computes the diff against the last committed state whereas the receiving nodes apply the diff to the last accepted state, and this means if the master sends a no-opMetadata
diff then the receiving node will revert its last-committed values to the ones included in the state it last accepted.With this commit we include the last-committed values alongside the cluster state diff so that they are always copied properly.
Closes #90158