Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cluster join never happens for manual clustering #380

Merged
merged 2 commits into from
Sep 15, 2016

Conversation

akadoya
Copy link
Contributor

@akadoya akadoya commented Jun 11, 2016

When using manual clustering in this cookbook, rabbitmq(e.g. rabbit1-ubuntu-1404) starts as a member of a single node cluster named as its own hostname.
And the current join action will never happen by just skipping with the warning message below.

[rabbitmq_cluster] Node is already member of rabbit@rabbit1-ubuntu-1404. Joining cluster will be skipped.

This is because the current join action process doesn't consider the current cluster name and only checks if the node is a part of whatever cluster or not.

This PR is to check if the node is a part of the desired cluster defined through the attribute ( node['rabbitmq']['clustering']['cluster_name'] ) so that the join action will happen if the node is a part of the wrong/(in my case, default=hostname) cluster.

@satyabhan
Copy link

I ran into the same issue today while testing. Since the node is already part of its own cluster at the start joined_cluster always returns true and doesn't actually attempt to join the desired cluster. The proposed fix should work.

@satyabhan
Copy link

Added proof that the fix is working and creating the manual cluster correctly

image

@jjasghar
Copy link
Contributor

Can you add some unit tests around this? I know my future-self will probably break this if it's not protected via a test.

@akadoya
Copy link
Contributor Author

akadoya commented Sep 1, 2016

Hi @jjasghar
I've been thinking about the best way to create a unit test for this logic but actually it's really difficult to make it.
when you test this recipe by building a single node cluster, it'd not hit join action and it's what it's supposed to be. (because this node tries to join itself)

The issue of the current logic was because it checks only if the node is a part of some cluster or not and skipped join process - so the missing test point here is to verify if the node is a part of the cluster with the target peers as well as being a member of the cluster named after the attribute.
What I could think of to test this point is ;

  1. add a new kitchen configuration to test cluster recipe with multi-nodes <- this is how I noticed this issue by the way
  2. add an option attribute in rabbitmq_cluster provider to verify if node became a part of the desired cluster or not after join action and raise an error if not.

By both ways, you could notice in case clustering logic is not working as it's expected.
What would you think?

@richardnaik
Copy link

Is there any word on when this will be merged into master? I am still encountering this problem as of 4.9.0.

@michaelklishin
Copy link
Member

@jjasghar I can see how testing this can be a real pain. Can we consider merging this as is?

@jjasghar
Copy link
Contributor

Yeah i'm good with this. I'll merge now, and get a release pushed to the supermarket here in the next day or so.

@jjasghar jjasghar merged commit a60b7a8 into rabbitmq:master Sep 15, 2016
@richardnaik
Copy link

Thanks!

@mikeprigodich
Copy link

mikeprigodich commented May 18, 2017

There still seems to be a bug in this fix. I had to make the following change to get the second node to join the first node because the second node is already in a cluster with itself:

<   elsif joined_cluster?(var_node_name, var_cluster_status) && current_cluster_name(var_cluster_status) == var_cluster_name
<     Chef::Log.warn("[rabbitmq_cluster] Node is already member of your desired cluster #{current_cluster_name(var_cluster_status)}. Joining cluster will be skipped.")
---
>   elsif joined_cluster?(var_node_name, var_cluster_status) &&
>         joined_cluster?(var_node_name_to_join, var_cluster_status) &&
>         current_cluster_name(var_cluster_status) == var_cluster_name
>     Chef::Log.warn('[rabbitmq_cluster] Node is already member of your desired cluster ' \
>                    "#{current_cluster_name(var_cluster_status)}. Joining cluster will be skipped.")```

@scalp42
Copy link
Contributor

scalp42 commented May 26, 2017

@mikeprigodich could you open a PR by any chance?

@akadoya
Copy link
Contributor Author

akadoya commented May 26, 2017

@michaelklishin Maybe you're setting cluster name to the second node before joining the node to the first node?

jjasghar pushed a commit that referenced this pull request Jun 16, 2017
* adding cluster name check for cluster join action

* fix typo and rubocop offences
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants