Nodes try to rejoin cluster when first listed node is down #347
The call to `joined_cluster?` was passing in the `to_join` node name instead of the current node name. This resulted in the nodes trying to rejoin whenever the `to_join` node was offline. Resolves rabbitmq#347
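For reference, a minimal sketch of the check being described, using the names from this thread (`joined_cluster?`, `var_node_name`, `var_node_name_to_join`, `cluster_status`); the helper body and the example node names are assumptions, not the cookbook's actual `providers/cluster.rb` code:

```ruby
# Illustrative sketch only, not the cookbook's real provider code.
# Assume cluster_status is the list of node names parsed from the
# running_nodes section of `rabbitmqctl cluster_status`.
def joined_cluster?(node_name, cluster_status)
  cluster_status.include?(node_name)
end

var_node_name         = 'rabbit@node2'                    # node Chef is converging
var_node_name_to_join = 'rabbit@node1'                    # first entry of cluster_nodes
cluster_status        = ['rabbit@node2', 'rabbit@node3']  # node1 is powered off

# Buggy guard: node1 is absent from cluster_status, so this reports "not joined"
# and the provider tries to rejoin a dead node on every chef-client run.
puts joined_cluster?(var_node_name_to_join, cluster_status)  # => false

# Fixed guard: the current node is already clustered, so nothing to do.
puts joined_cluster?(var_node_name, cluster_status)          # => true
```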
+1
This fix breaks new cluster builds. When a new node comes up with clustering enabled, it is initially a one-node cluster of itself, so the check always returns true and the node is never allowed to join another one. When I run `rabbitmqctl cluster_status` on a new node I get this:
The fix is simple: if the node is down and has been removed from the Chef server, update the `node['rabbitmq']['clustering']['cluster_nodes']` attribute accordingly.
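To illustrate the "new cluster builds" concern above, a sketch under the same assumptions as the earlier one: a freshly provisioned node is a one-node cluster of itself, so checking the current node's name against its own `cluster_status` always reports it as already joined.

```ruby
# Same hypothetical helper as in the sketch above.
def joined_cluster?(node_name, cluster_status)
  cluster_status.include?(node_name)
end

# A brand-new node is a cluster of itself before it joins anything.
var_node_name  = 'rabbit@node2'
cluster_status = ['rabbit@node2']   # only the local node is "running"

# With the fix applied, the guard sees the current node in cluster_status,
# concludes it has already joined, and never attempts to join node1.
puts joined_cluster?(var_node_name, cluster_status)  # => true
```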
Regarding the line https://github.com/jjasghar/rabbitmq/blob/master/providers/cluster.rb#L202, it looks as if the `elsif` statement is checking whether `var_node_name_to_join` is part of `cluster_status`, but I think it should be checking `var_node_name`. `var_node_name_to_join` is just set to the first node name in the array passed into the LWRP.

I discovered that when powering down or stopping the first listed node in `node['rabbitmq']['clustering']['cluster_nodes']`, the other nodes worked fine until chef-client ran. They would then attempt to rejoin the cluster, because the first node was no longer listed in the `running_nodes` output of `rabbitmqctl cluster_status`, and to rejoin they would try to connect again to the first node, which would fail as it was turned off. This would result in the whole cluster coming down.

Let me know if you'd like more info, or a PR for this.
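A condensed, hypothetical reconstruction of the scenario just described; the attribute name, `var_node_name_to_join`, and `cluster_status` come from the report, while the per-entry shape of `cluster_nodes` and the example node names are assumptions:

```ruby
# Condensed, hypothetical reconstruction of the reported scenario
# (not the cookbook's actual providers/cluster.rb).

# The clustering attribute drives everything; the join target is just its
# first entry (entry shape assumed here).
cluster_nodes = [
  { 'name' => 'rabbit@node1' },   # powered off in this scenario
  { 'name' => 'rabbit@node2' },
  { 'name' => 'rabbit@node3' }
]
var_node_name_to_join = cluster_nodes.first['name']   # => "rabbit@node1"

# running_nodes as `rabbitmqctl cluster_status` reports it on node2 while
# node1 is shut down.
cluster_status = ['rabbit@node2', 'rabbit@node3']

# The elsif around cluster.rb#L202 effectively asks "is the join target in
# cluster_status?" -- false here, so node2 tries to rejoin the dead node1 on
# every chef-client run, the join fails, and the cluster unravels.
needs_join = !cluster_status.include?(var_node_name_to_join)
puts needs_join   # => true, even though node2 is already clustered
```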