Improve zen1 support with StatefulSets #1281
A suggestion from @ywelsch on Slack:
Following #1262, I think we need to:
Actually, maybe there is a small improvement possible, depending on the current number of nodes. For example: if you have 3 nodes and the user wants to move to 5 (or more) nodes, then you can directly add the 2 new nodes, because the new minimum_master_nodes …
True.
@DaveCTurner what's the recommended approach when dealing with a mixed v6/v7 cluster? Say I'm moving from 3 v6.x nodes to 5 v7.x nodes. One way of avoiding this problem might be to make sure we don't perform (up|down)scales and rolling upgrades at the same time.
This will fail because the 6.x master node does not have the 7.x voting exclusion API.
@DaveCTurner is on vacation (for the next 2 weeks) so I'm answering here. Zen2 nodes always prefer to elect Zen1 nodes as master as long as there is a master-eligible Zen1 node in the cluster. Voting configurations only make sense once the cluster has been Zen2-bootstrapped. In a rolling upgrade, this bootstrapping automatically happens once the Zen2 nodes notice that all Zen1 master-eligible nodes have left the cluster. Only after bootstrapping (which establishes the first voting configuration) can a Zen2 node become master. This means that in a rolling upgrade with mixed Zen1/Zen2 master-eligible nodes, as long as there is a Zen1 master, minimum_master_nodes should be adjusted, as there are no voting configurations (and therefore also no exclusions). If you're expanding a 3-node cluster to 5 nodes during a 6.x to 7.x upgrade with a grow-and-shrink, it should look as follows:
It's nonsensical because there are no voting configurations at that point.
It's nonsensical to do because once you have a Zen2 master, only voting configurations matter.
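To make the "adjust minimum_master_nodes while a Zen1 master is still in charge" step concrete, here is a minimal Go sketch, not taken from the operator's code: the Elasticsearch URL, the bare net/http wiring and the error handling are all assumptions; only the cluster settings endpoint and the discovery.zen.minimum_master_nodes setting name come from Elasticsearch itself.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// setMinimumMasterNodes updates discovery.zen.minimum_master_nodes through the
// cluster settings API. This only matters while a Zen1 (6.x) node is master;
// once a Zen2 (7.x) node has been elected, voting configurations take over and
// this setting is ignored.
func setMinimumMasterNodes(esURL string, n int) error {
	body := fmt.Sprintf(`{"persistent":{"discovery.zen.minimum_master_nodes":%d}}`, n)
	req, err := http.NewRequest(http.MethodPut, esURL+"/_cluster/settings", bytes.NewBufferString(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("updating minimum_master_nodes: unexpected status %s", resp.Status)
	}
	return nil
}

func main() {
	// Grow-and-shrink from 3 to 5 master nodes: the quorum for 5 masters is 5/2+1 = 3.
	if err := setMinimumMasterNodes("http://localhost:9200", 5/2+1); err != nil {
		fmt.Println(err)
	}
}
```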
Thanks @ywelsch!
Yes, makes sense.
@ywelsch one tricky bit raised in the PR for the zen1 2->1 master special case is about the availability of the master nodes. We can be in the following situation:
Even though we downscale, in theory, from 3 to 2 nodes as seen from K8s, we effectively downscale from 2 to 1 nodes as seen from ES. If we decide to update the zen1 minimum_master_nodes setting … This is made a bit harder since ECK is working with an out-of-date view of the nodes' state (it can see a node as unavailable that would, in real-time reality (😄), be available). We could also argue that the master to remove should be the unavailable one, but the user is allowed to express the master nodes specification in different groups of nodes (different config, availability zones, etc.), so they may explicitly request the removal of one of the 2 running masters. Any thoughts?
As you noticed, this type of down-scaling is completely unsafe, and the quickest way to get data loss. As @pebrc commented on the PR, you can't just take the currently available nodes into account to determine minimum_master_nodes.
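The point about deriving the value from the expected topology rather than from whatever nodes happen to be reachable can be illustrated with a small sketch; the function and the numbers below are illustrative, not operator code, but the quorum formula (a strict majority, n/2 + 1) is the standard zen1 one.

```go
package main

import "fmt"

// quorum returns the zen1 minimum_master_nodes value for a given number of
// master-eligible nodes: a strict majority, n/2 + 1.
func quorum(masterEligibleNodes int) int {
	return masterEligibleNodes/2 + 1
}

func main() {
	// Scenario from the discussion above: the spec keeps 2 master nodes, but
	// only 1 of them is currently reachable from the operator's point of view.
	specMasters := 2
	observedMasters := 1

	fmt.Println("from spec:    ", quorum(specMasters))     // 2 -- safe
	fmt.Println("from observed:", quorum(observedMasters)) // 1 -- allows a split brain
}
```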
We could decide to always make sure the expected masters are running before updating minimum_master_nodes. I'm worried about cases where the user would be stuck with masters that will never be running. For example:
If we refuse to perform any downscale at this point, doesn't that mean the user cannot remove those misconfigured StatefulSets?
Could we detect some instances where such a node would be OK to remove? For example, a bootlooping node (due to misconfiguration) would be detectable. I know it adds complexity, but it might help us get out of that situation.
👍 this makes sense to me. We know how many masters we expect to be running and can check how many are actually members of the cluster. Personally I'm not too worried about handling a boot-looping pod within the operator right now; that definitely seems like an optimization for later. For now some manual work seems acceptable. We might need to expand the troubleshooting documentation as we run into more cases like that, and then we can think about automating them.
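A rough sketch of that check follows, assuming the operator knows the expected master pod names and can reach the _cat/nodes API; the pod names, endpoint wiring and function names here are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

type catNode struct {
	Name string `json:"name"`
}

// clusterMembers lists the names of the nodes that are currently part of the
// cluster, as reported by the elected master.
func clusterMembers(esURL string) (map[string]bool, error) {
	resp, err := http.Get(esURL + "/_cat/nodes?format=json&h=name")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var nodes []catNode
	if err := json.NewDecoder(resp.Body).Decode(&nodes); err != nil {
		return nil, err
	}
	members := make(map[string]bool, len(nodes))
	for _, n := range nodes {
		members[n.Name] = true
	}
	return members, nil
}

// allExpectedMastersJoined reports whether every master node we expect (from
// the StatefulSet spec) has actually joined the cluster. Downscales and
// minimum_master_nodes updates would be deferred until this returns true.
func allExpectedMastersJoined(expected []string, members map[string]bool) bool {
	for _, name := range expected {
		if !members[name] {
			return false
		}
	}
	return true
}

func main() {
	members, err := clusterMembers("http://localhost:9200")
	if err != nil {
		fmt.Println(err)
		return
	}
	// "masters-0" etc. are hypothetical pod names.
	fmt.Println(allExpectedMastersJoined([]string{"masters-0", "masters-1", "masters-2"}, members))
}
```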
* Only update zen2 settings if there is no v6 master node

  When doing any zen2 API call (voting config exclusions), we checked whether there is at least one v7+ master in the cluster before proceeding. This is wrong because if the current master is a v6 node, the call will fail. In a cluster with mixed v6/v7 versions, the current master is expected to be a v6 node. Zen2 APIs can only be called once there are no more v6 nodes in the cluster, at which point the current master is a v7 node. For more context, see #1281 (comment)

* Rename functions to respect the IsCompatibleWith convention
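A minimal sketch of the corrected condition ("no v6 master-eligible node left" rather than "at least one v7 master") is shown below; the node type and the version values are illustrative, not the operator's actual data structures.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a simplified view of a cluster node: its version and whether it is
// master-eligible.
type node struct {
	version        string // e.g. "6.8.1" or "7.2.0"
	masterEligible bool
}

// zen2APIsUsable reports whether it is safe to call Zen2 APIs such as the
// voting config exclusions endpoint. The check is not "is there at least one
// v7 master-eligible node" but "is there no v6 master-eligible node left":
// while any 6.x master-eligible node remains, the elected master is expected
// to be a 6.x node, which does not know about those APIs.
func zen2APIsUsable(nodes []node) bool {
	for _, n := range nodes {
		if n.masterEligible && strings.HasPrefix(n.version, "6.") {
			return false
		}
	}
	return true
}

func main() {
	mixed := []node{{"6.8.1", true}, {"7.2.0", true}, {"7.2.0", true}}
	upgraded := []node{{"7.2.0", true}, {"7.2.0", true}, {"7.2.0", true}}
	fmt.Println(zen2APIsUsable(mixed))    // false
	fmt.Println(zen2APIsUsable(upgraded)) // true
}
```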
I think it's safe to close this now.
This is not quite true. It should be:
The normal flow for a rolling upgrade from 6.x to 7.x is as Yannick describes, but there is a failure case that isn't covered. If more than half of the master-eligible nodes are upgraded to 7.x and the remaining 6.x master-eligible nodes are then temporarily disconnected, the 7.x nodes will assume the 6.x master-eligible nodes have been shut down and will automatically bootstrap a 7.x cluster. When the 6.x nodes reconnect they will discover the 7.x cluster and join it. You will now be in a state where the process described in this comment doesn't do the right thing, because it will assume this is still a 6.x cluster and keep manipulating minimum_master_nodes.

As a more concrete example, imagine you are upgrading a 3-master-node cluster and get to the point of having two 7.x master nodes and one 6.x master node. If the 6.x master node is briefly disconnected then the 7.x master nodes will bootstrap a cluster and elect a master, and the voting configuration will be the elected master alone. If you now shut down the elected master without using a voting config exclusion then the cluster will fall apart, even if the other two master nodes are fine.

I raise this because I'm not sure what else might happen mid-upgrade. If an upgrade gets stuck for some reason and you need to restart a 7.x node to fix it then you may need to handle the cluster being unavailable for a while. There's also a mechanism for preventing 6.x nodes from even joining a cluster that consists entirely of 7.x nodes, which might happen if the 6.x nodes are all temporarily disconnected.
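For the concrete example above, the safe way to retire the elected 7.x master is to register a voting config exclusion first. Here is a hedged sketch, assuming the query-parameter form of the endpoint documented for recent 7.x releases (early 7.x used a path parameter instead, so the exact URL may need adjusting per version); the node name and cluster URL are placeholders.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// excludeFromVoting asks the cluster to remove the given node from the voting
// configuration before it is shut down, so the configuration can move to the
// remaining master-eligible nodes.
func excludeFromVoting(esURL, nodeName string) error {
	u := esURL + "/_cluster/voting_config_exclusions?node_names=" + url.QueryEscape(nodeName)
	req, err := http.NewRequest(http.MethodPost, u, nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("voting config exclusion for %s: unexpected status %s", nodeName, resp.Status)
	}
	return nil
}

func main() {
	// "masters-1" is a hypothetical name for the elected 7.x master.
	if err := excludeFromVoting("http://localhost:9200", "masters-1"); err != nil {
		fmt.Println(err)
	}
}
```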
@DaveCTurner thanks for the details. |
If the cluster state has a nonempty voting configuration then it has definitely bootstrapped. However, if you observe an empty voting configuration then we don't know: bootstrapping can technically happen at any time, so you may be looking at stale information. The best way to handle this is, I think, to make sure that you can still escape from this kind of intermediate state by pushing forwards with the upgrade despite failures.
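A sketch of that observation follows, assuming the 7.x cluster state layout with metadata.cluster_coordination.last_committed_config; the struct names and endpoint wiring are illustrative. Note that, as said above, an empty result only means "not known to have bootstrapped yet", never "safe to keep treating as Zen1".

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// clusterState mirrors just the part of the 7.x cluster state response we
// care about: the last committed voting configuration.
type clusterState struct {
	Metadata struct {
		ClusterCoordination struct {
			LastCommittedConfig []string `json:"last_committed_config"`
		} `json:"cluster_coordination"`
	} `json:"metadata"`
}

// hasBootstrapped returns true only when the observed cluster state carries a
// non-empty voting configuration. A false result is NOT proof that the
// cluster has not bootstrapped: the information may already be stale.
func hasBootstrapped(esURL string) (bool, error) {
	resp, err := http.Get(esURL + "/_cluster/state/metadata")
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	var state clusterState
	if err := json.NewDecoder(resp.Body).Decode(&state); err != nil {
		return false, err
	}
	return len(state.Metadata.ClusterCoordination.LastCommittedConfig) > 0, nil
}

func main() {
	bootstrapped, err := hasBootstrapped("http://localhost:9200")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("definitely bootstrapped:", bootstrapped)
}
```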
Closing this big issue in favor of keeping #1792 open to fix the corner case we discovered in the last few posts. Other than that, everything looks fine with zen1 so far. We'll open other issues if we discover new bugs/corner cases.
How should we deal with zen1 in ECK reconciliation loops?
I have in mind the following, open for discussion:
Thoughts would be appreciated @ywelsch @DaveCTurner :)