Only connect to new nodes on new cluster state #31547
Conversation
Today, when a new cluster state is committed we attempt to connect to all of its nodes as part of the application process. This is the right thing to do with new nodes, and is a no-op on any already-connected nodes, but is questionable on known nodes from which we are currently disconnected: there is a risk that we are partitioned from these nodes so that any attempt to connect to them will hang until it times out. This can dramatically slow down the application of new cluster states, which hinders the recovery of the cluster during certain kinds of partition.

If nodes are disconnected from the master then it is likely that they are to be removed as part of a subsequent cluster state update, so there's no need to try and reconnect to them like this. Moreover there is no need to attempt to reconnect to disconnected nodes as part of the cluster state application process, because we periodically try and reconnect to any disconnected nodes, and handle their disconnectedness gracefully in the meantime.

This commit alters this behaviour to avoid reconnecting to known nodes during cluster state application.

Resolves elastic#29025.
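A minimal sketch of the behavioural change described above (the class and method names here are illustrative, not the real NodeConnectionsService API):

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the change; names are illustrative, not the
// actual NodeConnectionsService code.
class ConnectOnlyToNewNodes {
    private final Map<String, Integer> knownNodes = new ConcurrentHashMap<>();

    // Called on every committed cluster state. Only nodes we have never
    // seen before get a (potentially blocking) connect attempt; known but
    // disconnected nodes are left to the periodic reconnect task.
    void applyClusterState(List<String> nodesInState) {
        for (String node : nodesInState) {
            boolean alreadyKnown = knownNodes.putIfAbsent(node, 0) != null;
            if (alreadyKnown == false) {
                connect(node); // new node: connect during state application
            }
        }
    }

    private void connect(String node) {
        System.out.println("connecting to " + node);
    }
}

The key point is that a known node never triggers a blocking connect from the applier thread; reconnection is deferred to the periodic checker.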
Pinging @elastic/es-core-infra
Pinging @elastic/es-distributed
I think this is a worthwhile improvement on the current situation, and it's a simple change, but I think it's worth revisiting in future since there's still room for improvement IMO: if the ConnectionChecker happens to be running when a partition is detected then it can block the cluster state update thread from getting hold of the nodeLocks that it needs for an extended period of time. Reducing the connection timeout (#29022) would help a bit here.
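To make that contention concrete, here is a minimal sketch (illustrative names only; not the real KeyedLock/ConnectionChecker code) of how a checker holding the per-node lock across a slow connect can stall the applier thread:

import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: one per-node lock shared by the background
// connection checker and the cluster state applier.
class NodeLockContention {
    private final ReentrantLock nodeLock = new ReentrantLock();

    // Background checker: holds the lock while a connect attempt to a
    // partitioned node waits for its (long) timeout.
    void connectionCheckerRun() {
        nodeLock.lock();
        try {
            slowConnect();
        } finally {
            nodeLock.unlock();
        }
    }

    // Applier thread: blocks here until the checker's connect times out.
    void clusterStateApplierRun() {
        nodeLock.lock();
        try {
            // apply the new cluster state for this node
        } finally {
            nodeLock.unlock();
        }
    }

    private void slowConnect() {
        try {
            Thread.sleep(30_000); // stands in for a connect to a blackholed node
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}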
LGTM
@@ -83,8 +83,7 @@ public void connectToNodes(DiscoveryNodes discoveryNodes) {
         for (final DiscoveryNode node : discoveryNodes) {
             final boolean connected;
             try (Releasable ignored = nodeLocks.acquire(node)) {
-                nodes.putIfAbsent(node, 0);
-                connected = transportService.nodeConnected(node);
+                connected = (nodes.putIfAbsent(node, 0) != null);
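For reference, Map.putIfAbsent returns the previous value, or null if the key was absent, so the new expression is true exactly when the node was already being tracked:

import java.util.HashMap;
import java.util.Map;

class PutIfAbsentSemantics {
    public static void main(String[] args) {
        Map<String, Integer> nodes = new HashMap<>();
        // first call: key absent, previous value is null -> not yet tracked
        System.out.println(nodes.putIfAbsent("node-1", 0) != null); // false
        // second call: key present, previous value returned -> already tracked
        System.out.println(nodes.putIfAbsent("node-1", 0) != null); // true
    }
}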
Can we avoid going to the management thread if we are truly connected? This will be relevant on the master, where the node is connected to during join validation. Also, I think the name connected for the variable is a bit misleading now; maybe flip it and call it shouldConnect? It would also be good to have a comment explaining the rationale for this check.

PS - as discussed, it would be good to explore using this component on non-elected-master nodes, which would make my condition tweak obsolete.
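A hedged sketch of what that suggestion might look like (a reviewer proposal, not code from the PR; the surrounding names follow the diff above):

// Sketch of the review suggestion, not actual PR code: flip the flag,
// rename it, and skip the management thread when already connected.
final boolean shouldConnect;
try (Releasable ignored = nodeLocks.acquire(node)) {
    // Connect only if this node is newly tracked and not already
    // connected (on the master the node is connected to during join
    // validation, so no further work is needed there).
    shouldConnect = nodes.putIfAbsent(node, 0) == null
        && transportService.nodeConnected(node) == false;
}
if (shouldConnect) {
    // ... dispatch the (potentially blocking) connect attempt ...
}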
I'm a bit on the fence about this change. Yes, it will help in the situation described, but I wonder what adverse effects it will have after a network partition (we work around these adverse effects in our tests by explicitly reconnecting the nodes). I wonder if we should force reconnectToKnownNodes whenever the node applies a cluster state from a fresh master. Yes, this would slow down publishing a little, but only for the first cluster state received from the new master, not subsequent ones.
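A rough sketch of that idea (hypothetical wiring; reconnectToKnownNodes is the method named above, and hooking it to the master-change check in the applied event is an assumption):

// Hypothetical sketch: on applying the first state from a fresh master,
// force a one-off reconnect to known-but-disconnected nodes instead of
// waiting for the periodic check.
if (event.nodesDelta().masterNodeChanged()) {
    nodeConnectionsService.reconnectToKnownNodes();
}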
@@ -207,7 +208,7 @@ public void testAckedIndexing() throws Exception {
                     assertTrue("doc [" + id + "] indexed via node [" + ackedDocs.get(id) + "] not found",
                         client(node).prepareGet("test", "type", id).setPreference("_local").get().isExists());
                 }
-            } catch (AssertionError | NoShardAvailableActionException e) {
+            } catch (AssertionError | NoShardAvailableActionException | NodeNotConnectedException e) {
Is this change still required?
I've decided to close this. It mostly helps, but with this approach there's still a chance of blocking a cluster state update on a connection to a blackhole because of the node locks, and I think it'd be better for future debugging/analysis if this wasn't based on luck.
Today, when applying new cluster state we attempt to connect to all of its nodes as a blocking part of the application process. This is the right thing to do with new nodes, and is a no-op on any already-connected nodes, but is questionable on known nodes from which we are currently disconnected: there is a risk that we are partitioned from these nodes so that any attempt to connect to them will hang until it times out. This can dramatically slow down the application of new cluster states, which hinders the recovery of the cluster during certain kinds of partition.

If nodes are disconnected from the master then it is likely that they are to be removed as part of a subsequent cluster state update, so there's no need to try and reconnect to them like this. Moreover there is no need to attempt to reconnect to disconnected nodes as part of the cluster state application process, because we periodically try and reconnect to any disconnected nodes, and handle their disconnectedness reasonably gracefully in the meantime.

This commit alters this behaviour to avoid reconnecting to known nodes during cluster state application.

Resolves elastic#29025. Supersedes elastic#31547.