nodes not joining cluster #125
Comments
I think the problem is that, when the package gets installed, the service starts up right away, before Chef has put the config and cookie into place. The docs say that automatically joining the cluster only happens the first time it starts, and only if the config is set up before it starts. After that, it requires "rabbitmqctl reset" to clear out the state and try to join the cluster. |
I apologize, I was using an old version of the cookbook, let me try out the new one that includes the reset step. |
I have this issue as well on cookbook 3.2.2. Is there a way to delay the service start until after the configs are in place? Otherwise I'm already writing a wrapper cookbook b/c our firm needs a more custom rabbitmq.config, so I'm rewinding this cookbook via the wrapper. I can include a reset call in my wrapper if need be I suppose. |
Thanks for the reply. Indeed, there's something odd in the way the nodes join the cluster. When I spin up one node, wait for it to come up, and then launch another one, both nodes see each other without any issue. I didn't do any "rabbitmqctl reset", so I guess the issue is just timing. But I'm eager to see your results with the reset step. Cheers |
In my case, after I updated the cookbook to the version that includes the node reset, I discovered that some of my nodes were running different versions, which prevented the automatic clustering after the node reset. I had installed some of them from the Ubuntu repo, and the later one from the RabbitMQ repo. |
The docs say that automatically joining the cluster only happens the first time it starts, and only if the config is set up before it starts. After that, it requires "rabbitmqctl reset" to clear out the state and try to join the cluster. You have to add some logic for each node to join the cluster, with rabbitmqctl join_cluster. I have done something like this in my own cookbook, which wraps this one. Please let me know how I can provide that piece. I don't want to do a pull request, since it's not an elegant solution and I would love some feedback. |
To resolve this issue, I wrote a cookbook that runs rabbitmqctl join_cluster. |
@sunggun-yu any chance we can roll this up into the main cookbook? it seems odd to call a wrapper cookbook to do something the main one should take care of. |
+1 |
+1 |
@sunggun-yu any thoughts? |
I can work on integrating this @jjasghar |
@jjasghar |
Working on it now. Also, I'm changing some concepts in the rabbitmq cookbook. I'll report (or document) the changes later. |
@jjasghar , @cmluciano
|
Sorry for the confusion in my previous comment.
The code is committed at https://github.com/sunggun-yu/rabbitmq/tree/feature/cluster Thank you. |
What is the targeted behaviour when the first node is not working (outage or maintenance)? In my wrapper I used the following resource:
rabbitmq_cluster "rabbit@#{node['rabbitmq']['cluster_disk_nodes'][0].split('@')[1]}" do
  node_type 'slave'
  cluster_node_type 'disc'
  action :join
end
There's no |
I've just tested this and it worked perfectly :
["rabbit@rabbit2"] So we can say that "rabbit2" should be the "master" at this point ...
["rabbit@rabbit2", "rabbit@rabbit3"]
Rabbit2 is still the RMQ master and my array is now : ["rabbit@rabbit1", "rabbit@rabbit2", "rabbit@rabbit3"]
Conclusion:
Edit : the array changed after adding node1, so the next time rabbit2 and rabbit3 converge they will trigger some resources and maybe restart / reload ... I know :) |
Last post : 3bis) reconverge 2 & 3 nodes
It fails :'( because rabbit1 is the first node of the array and is currently down. Edit : for now I think I'll test the RMQ port of each node in my array to determine which one can be joined, before trying to join the cluster. |
The solution would probably be to pass an array of cluster nodes to the LWRP and let it choose which node to join (the LWRP would test and validate the node)? This is by far the smartest solution IMO. @sunggun-yu : What do you think about that strategy? |
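The "pass an array and let the LWRP pick a reachable node" idea could be sketched in plain Ruby as below. This is only an illustration, not code from the cookbook: `port_open?` and `pick_join_target` are hypothetical helper names, and probing port 5672 (the default AMQP port) is an assumption about what "test the RMQ port" would mean in practice.

```ruby
require 'socket'

# Hypothetical helper: true if a TCP connection to host:port succeeds in time.
def port_open?(host, port, timeout_s = 2)
  Socket.tcp(host, port, connect_timeout: timeout_s) { true }
rescue StandardError
  false
end

# Hypothetical helper: returns the first node in the list (other than the
# local node) whose host answers on the given port, or nil if none does.
# Node names are expected in the "rabbit@hostname" form used in this thread.
# The port default (5672) is an assumption, not something the thread specifies.
def pick_join_target(cluster_nodes, self_node, port: 5672, checker: method(:port_open?))
  cluster_nodes.find do |name|
    next false if name == self_node

    host = name.split('@', 2)[1]
    host && checker.call(host, port)
  end
end
```

With rabbit1 down, `pick_join_target(['rabbit@rabbit1', 'rabbit@rabbit2', 'rabbit@rabbit3'], 'rabbit@rabbit3')` would skip rabbit1 and return `'rabbit@rabbit2'` if that host answers. The `checker` argument exists so the probe can be replaced in tests.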
Thank you for your testing and the details. I'll test your scenario and describe the scenarios I currently cover, along with my opinion. |
Using an array for the node list is a good idea. However, I think there are pros and cons. Also, I can't agree with choosing one of the nodes in the list when the first node is not responding.
Pros
We can use the "first node" concept instead of master/slave. We can also reduce the number of attributes in the cluster cookbook. The array of cluster nodes clearly says which node is the master, and the cluster name as well.
Cons
We need to keep migrating the cluster_nodes list. For example, when we want to add rabbit4, rabbit[1-3] must update their cluster_nodes value. We could use a data_bag for this, but we cannot force people to use one.
Reason why I don't agree with choosing one of the nodes in the list
It may cause unexpected behavior in the cluster.
Scenario 1
Scenario 2
Scenario 3
Appendix : current logic of joining cluster |
I would like to go with |
@jjasghar can we redefine the |
@sunggun-yu : After reading these posts, I thought that it might be better to store the cluster name (static) and the cluster nodes array (which changes over time) independently. So the cluster name would be initialized with the very first cluster node and then would never change. Maybe it's too "stupid simple" an approach but I like it :-) edit : I'm testing it |
OPTION 1 : Joining cluster with rabbitmqctl join_cluster
We start with a 2-node cluster :
root@rabbit1:~# rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]}]},
{running_nodes,[rabbit@rabbit2,rabbit@rabbit1]},
{cluster_name,<<"rabbit@rabbit2.labs.acme.com">>},
 {partitions,[]}]
root@rabbit2:~# rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]}]},
{running_nodes,[rabbit@rabbit1,rabbit@rabbit2]},
{cluster_name,<<"rabbit@rabbit2.labs.acme.com">>},
 {partitions,[]}]
Let's make rabbit4 join the party :
root@rabbit4:~# dpkg -i rabbitmq-server_3.4.4-1_all.deb
[...]
root@rabbit4:~# rabbitmqctl stop_app
Stopping node rabbit@rabbit4 ...
root@rabbit4:~# echo "<my_cookie>" > /var/lib/rabbitmq/.erlang.cookie
root@rabbit4:~# rabbitmqctl join_cluster rabbit@rabbit1
Clustering node rabbit@rabbit4 with rabbit@rabbit1 ...
root@rabbit4:~# rabbitmqctl start_app
Starting node rabbit@rabbit4 ...
root@rabbit1:~# rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit4]}]},
{running_nodes,[rabbit@rabbit4,rabbit@rabbit2,rabbit@rabbit1]},
{cluster_name,<<"rabbit@rabbit2.labs.acme.com">>},
{partitions,[]}]
root@rabbit2:~# rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit4]}]},
{running_nodes,[rabbit@rabbit4,rabbit@rabbit1,rabbit@rabbit2]},
{cluster_name,<<"rabbit@rabbit2.labs.acme.com">>},
{partitions,[]}]
root@rabbit4:~# rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit4 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit4]}]},
{running_nodes,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit4]},
{cluster_name,<<"rabbit@rabbit2.labs.acme.com">>},
 {partitions,[]}]
Conclusion : We don't care about the cluster name. We only need to point to a single running node of the cluster. |
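The manual join in option 1 boils down to three rabbitmqctl calls run in order on the new node. A minimal Ruby sketch of that sequence follows; `join_commands` and `join_cluster!` are hypothetical names made up for illustration, and it assumes the Erlang cookie is already in place and `rabbitmqctl` is on the PATH.

```ruby
# Hypothetical helper: the command sequence from the transcript above,
# as argument arrays (stop the app, join via one running node, start the app).
def join_commands(target_node)
  [
    %w[rabbitmqctl stop_app],
    ['rabbitmqctl', 'join_cluster', target_node],
    %w[rabbitmqctl start_app]
  ]
end

# Hypothetical helper: runs each command in order and stops at the first
# failure. The runner is injectable so the sequence can be exercised
# without a RabbitMQ installation; by default it shells out with system.
def join_cluster!(target_node, runner: ->(cmd) { system(*cmd) })
  join_commands(target_node).each do |cmd|
    raise "command failed: #{cmd.join(' ')}" unless runner.call(cmd)
  end
  true
end
```

Calling `join_cluster!('rabbit@rabbit1')` on rabbit4 would mirror the session above. Any running cluster member can be passed as the target, which matches the conclusion that the cluster name does not matter for joining.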
OPTION 2 : Joining cluster with RabbitMQ autocluster feature
We start (again) with a 2 nodes cluster :
root@rabbit1:~# rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]}]},
{running_nodes,[rabbit@rabbit2,rabbit@rabbit1]},
{cluster_name,<<"rabbit@rabbit2.labs.acme.com">>},
 {partitions,[]}]
root@rabbit2:~# rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]}]},
{running_nodes,[rabbit@rabbit1,rabbit@rabbit2]},
{cluster_name,<<"rabbit@rabbit2.labs.acme.com">>},
 {partitions,[]}]
Let's make rabbit4 join the party :
root@rabbit4:~# dpkg -i rabbitmq-server_3.4.4-1_all.deb
[...]
root@rabbit4:~# rabbitmqctl stop_app
Stopping node rabbit@rabbit4 ...
root@rabbit4:~# echo "<my_cookie>" > /var/lib/rabbitmq/.erlang.cookie
root@rabbit4:~# rabbitmqctl reset
Resetting node rabbit@rabbit4 ...
Edit /etc/rabbitmq/rabbitmq.config file :
Restart rabbit4 :
After that :
root@rabbit1:~# rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit4]}]},
{running_nodes,[rabbit@rabbit4,rabbit@rabbit2,rabbit@rabbit1]},
{cluster_name,<<"rabbit@rabbit2.labs.acme.com">>},
{partitions,[]}]
root@rabbit2:~# rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2 ...
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit4]}]},
{running_nodes,[rabbit@rabbit4,rabbit@rabbit1,rabbit@rabbit2]},
{cluster_name,<<"rabbit@rabbit2.labs.acme.com">>},
 {partitions,[]}]
Conclusion : We can totally get rid of all these considerations and let RMQ do all the work for us at the very first start of the RMQ service on a new node. And (again) the cluster name is absolutely -not- important. Edit : RabbitMQ4 starting log :
|
There are no master nodes in RabbitMQ clusters. Individual queues have masters (originally the node to which the client declaring it was connected). When a cluster restarts the last node to shut down is "special" in that it should be started first. Otherwise, all nodes are equal. You can cluster with any existing cluster node as long as your node has the same Erlang cookie. Until the new declarative clustering plugin is released to the public (no promises or dates), it makes sense for this cookbook to list cluster nodes in the RabbitMQ config (option 2 in the comment above) and delay RabbitMQ service [re-]start until the cookie is in place. If there's anything the RabbitMQ team can do to make this easier in our packages, let us know on rabbitmq-users. |
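Listing cluster nodes in the RabbitMQ config could look roughly like the sketch below, which renders the `cluster_nodes` entry in the classic Erlang-term `rabbitmq.config` format. The function name `cluster_nodes_config` is hypothetical, and this is not the template the cookbook actually ships; it only illustrates the shape of the setting.

```ruby
# Hypothetical helper: renders a minimal classic-format rabbitmq.config
# containing only the cluster_nodes setting. node_type is 'disc' or 'ram'.
def cluster_nodes_config(nodes, node_type = 'disc')
  raise ArgumentError, 'node_type must be disc or ram' unless %w[disc ram].include?(node_type)

  # Node names are Erlang atoms containing "@", so they are single-quoted.
  quoted = nodes.map { |n| "'#{n}'" }.join(', ')
  <<~CONFIG
    [
      {rabbit, [
        {cluster_nodes, {[#{quoted}], #{node_type}}}
      ]}
    ].
  CONFIG
end

puts cluster_nodes_config(['rabbit@rabbit1', 'rabbit@rabbit2'])
```

As noted in the comment above, the file has to be written, together with the cookie, before the service's first start for the node to pick it up.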
@michaelklishin : thanks for this comment. The question for us is "When and how do we trigger a reset / auto-join for a node?". I vote for :
|
You don't want to reset nodes that are already cluster members in a cookbook. Newly added nodes don't need a reset (there is nothing to reset on them) until their first start (in which a new database, which assumes no clustering, is initialised). So, resetting after the package was installed and the service was started sounds reasonable to me. |
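One way to avoid resetting nodes that are already cluster members is to look at `rabbitmqctl cluster_status` first. The sketch below parses the Erlang-term output format shown earlier in this thread (the 3.4-era format; other versions may print something different, so treat the regex as an assumption). `cluster_members` and `needs_join?` are hypothetical names.

```ruby
# Hypothetical helper: extracts node names from the {nodes,[...]} section
# of cluster_status output in the format shown earlier in this thread.
def cluster_members(status_output)
  section = status_output[/\{nodes,\[(.*?)\]\}\]\}/m, 1]
  return [] unless section

  section.scan(/[\w.-]+@[\w.-]+/).uniq
end

# Hypothetical helper: a node only needs a reset/join when its own status
# lists no cluster member other than itself.
def needs_join?(status_output, self_node)
  (cluster_members(status_output) - [self_node]).empty?
end
```

A recipe could run the reset and join steps only when `needs_join?` is true, so a reconverge on an existing member leaves its state alone.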
By the way, docs on auto-clustering. The more I think about this, the more I am convinced that the right thing to do is
In that case newly added nodes should join the cluster fine as long as one of the nodes listed in
FTR, in the alternative solution we have at Pivotal, nodes wait for a seed node to become available, which makes things a lot easier to automate. |
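The "wait for a seed node" behaviour can be approximated with a simple retry loop. The sketch below is not the Pivotal implementation mentioned above, just an illustration of the idea; `wait_for_seed`, its parameters and its defaults are all hypothetical.

```ruby
# Hypothetical helper: polls the seed nodes until one is reported available
# by the checker, or gives up after the given number of attempts.
# checker is any callable taking a node name and returning true/false;
# sleeper is injectable so the loop can be tested without real delays.
def wait_for_seed(seed_nodes, attempts: 10, delay: 5, checker:, sleeper: ->(s) { sleep(s) })
  attempts.times do |i|
    found = seed_nodes.find { |n| checker.call(n) }
    return found if found

    # Do not sleep after the final attempt.
    sleeper.call(delay) if i < attempts - 1
  end
  nil
end
```

The returned node name (or `nil` on timeout) could then be handed to whatever performs the join.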
Thank you for the replies guys.
Also, @BarthV, I'm against using cluster_name. People may get confused by it. Actually, I got confused myself :-) |
Let me know if you need some help to write recipe / helpers / LWRP or even to test your work. |
@sunggun-yu the provided attributes seem to have everything you need to use auto-clustering. Cluster name is orthogonal. It is certainly helpful for those who use federation and shovel plugins, or manage multiple clusters. |
@BarthV Thank you!!! 👍 @jjasghar I tried some different approaches and have almost decided to go with the one below.
I've tried this way as well. I believe it is a lighter and more efficient approach from the LWRP perspective.
|
Also, I just added the set_cluster_name feature for manual clustering. Requirements :
It was easy to implement since we have the cluster_nodes attribute and select the first node for the join action. |
The latest changes are committed : https://github.com/sunggun-yu/rabbitmq/tree/feature/cluster You can also test with the test Vagrant cluster. FYI : moving on to auto clustering. |
I added a set_cluster_name action. It is merged into the join action and is executed when the node is the first node. However, it would be clearer to have a separate action. Actually, any node in the cluster can set the cluster name in RMQ. |
Vagrant test box for auto clustering : https://github.com/sunggun-yu/vagrant-chef-rabbitmq-cluster/tree/test/auto_clustering Vagrant test box for manual clustering : https://github.com/sunggun-yu/vagrant-chef-rabbitmq-cluster/tree/test/manual_clustering |
This should be resolved via #238, if not please reopen. |
Hey there,
I'm using this Chef recipe with Amazon OpsWorks to create a cluster there. Everything seems to work perfectly except that once the nodes are provisioned and all the configuration is in place, none of the nodes join each other as a cluster.
I have to go on each node and join them manually using rabbitmqctl. The configuration generated by the recipe seems correct, but I still need to join them together manually, e.g.
What am I doing wrong? Is this a known issue?