
Add support for Consul 0.5.0 and Atlas auto-join #135

Merged
merged 1 commit into from
Jun 16, 2015

Conversation

shanesveller
Contributor

Commit adfa120 can probably be ignored from mainline. I took a glance at rebasing this off of the develop branch,
but it seems like it's in the middle of a sizeable rewrite.

Please keep or discard as much as you, the maintainers, find useful, in light of #126 etc.

@coveralls

Coverage Status

Coverage remained the same at 100.0% when pulling adfa120 on shanesveller:consul-0.5.0 into db93cb6 on johnbellone:master.

atlas_cluster: <%= ENV.fetch('ATLAS_CLUSTER', 'example/cluster') %>
atlas_token: <%= ENV.fetch('ATLAS_TOKEN', 'NOT_REAL') %>
excludes:
- centos-7.0


@donaldguy

Ideally this should support auto-bootstrapping the cluster with bootstrap_expect, but presently the logic in _service.rb countermands that.

I have prototyped that here: https://github.com/donaldguy/consul-cookbook/commit/7cf273aaffc220af6dc42cd74ddb76baba168015

It worked for me: starting from a set of servers that were up and associated with Atlas but unclustered, I brought them all down, deleted the contents of the data dir (I probably could have just deleted /raft and/or /serf), and brought them up one at a time.
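For context, the core of the prototype is roughly this (method and attribute names here are illustrative, not the cookbook's actual internals):

```ruby
# Rough sketch of the idea in the linked commit: pass bootstrap_expect
# through to the generated Consul JSON whenever the node is a server,
# so Atlas auto-join can actually elect a leader.
def build_service_config(attrs)
  config = { 'server' => true }
  # Consul treats the presence of bootstrap_expect as a request to
  # auto-bootstrap once that many servers have joined.
  expect = attrs['bootstrap_expect']
  config['bootstrap_expect'] = Integer(expect) if expect
  config
end
```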

@johnbellone
Contributor

@shanesveller Please rebase!

@johnbellone johnbellone added this to the 0.9.0 release milestone Mar 14, 2015
@shanesveller
Contributor Author

@johnbellone I've rebased my branch, but it looks like CentOS Kitchen tests are currently failing - maybe on master too. Will investigate more tomorrow.

@coveralls

Coverage Status

Coverage remained the same at 100.0% when pulling 39dd110 on shanesveller:consul-0.5.0 into 0a8b2f2 on johnbellone:master.

@donaldguy

Also, please make sure something with bootstrap_expect support lands. I also really don't think you want the retry_join or start_join server lists if you are working with atlas_join.

I have all that on the branch(es) here: https://github.com/donaldguy/consul-cookbook/commit/7cf273aaffc220af6dc42cd74ddb76baba168015

I haven't tried the full test suite, but at least on Ubuntu 14.04 it's working great in my production env.

@shanesveller
Contributor Author

@johnbellone Rebased one more time - CentOS tests fail on master too, and my company doesn't use it so I'm afraid I can't really invest any time digging further.

@donaldguy I'm not clear on what your concerns are; can you elaborate here in the comments? My company tested this code as-is with an EC2 auto-scaling group created from Packer-built AMIs and had no trouble scaling the cluster from 0 to 3 and up and down from there. The Atlas guide and Configure documentation don't seem to mention retry_join/start_join in conjunction with Atlas at all.

@coveralls

Coverage Status

Coverage remained the same at 100.0% when pulling 0ef6ae9 on shanesveller:consul-0.5.0 into fca59ae on johnbellone:master.

@donaldguy

So upon further review, this is a case of user error, but I suspect I won't be the only one.

The TL;DR is that I didn't use the service_mode value cluster because I wasn't clear I was supposed to: I didn't see the Getting Started section all the way at the bottom of the README (I saw, and didn't want to use, the chef-provisioning script under Usage, and I didn't think to look for more setup instructions after all the LWRP examples).

Maybe the fix is just putting it higher in the README or linking to it under Usage. It may also be worth making the cluster and server values synonymous, because I am unconvinced server in its current incarnation is ever useful.

(EDIT: I guess if you are still using the legacy manual bootstrap then you'd still want server without bootstrap_expect ... but Hashicorp no longer recommends this, and as it stands the recipe will silently drop a manually set bootstrap_expect when service_mode == 'server'; it should probably issue a warning at least. I opened a new issue to discuss this: #149)


While I noticed cluster as a value checked for service_mode in _service.rb, because there was nothing in the Consul docs about a cluster mode, I assumed it was just for the chef-provisioning recipe and/or something legacy. I understood the one-only semantics of bootstrap, so server struck me as the correct value for that attribute.

But what happens with this value and Atlas is confusing. When I first spun up, consul members showed my three servers and things looked good, but curl localhost:8500/v1/status/leader returned "" and the Atlas Consul UI would not update. (In an email exchange with Hashicorp folks, they said they'll probably add a "no leader" warning to explain that situation, because I was just seeing "Your infrastructure is healthy; 0 nodes".)

So the problems were/are three fold:

  1. I wasn't initially aware of the necessary bootstrap_expect parameter; this is on me.
  2. Using a service_mode of server populates neither the default nor a custom bootstrap_expect parameter through to the underlying config, so no auto-clustering ever occurs. I am not sure why this would ever be desirable.
  3. There is, however, still a start_join or retry_join block written into the config. Since I didn't use the chef-provisioning script or manually populate servers, this is blank. At minimum this seems aesthetically redundant when using the Atlas integration, and at worst a blank or outdated value here could prevent auto-clustering from working in the event of a leader failure. I'd need to do more testing to determine which, conclusively.

So @shanesveller, you can probably safely ignore my branch, but I will still suggest that, since Atlas seems to make the retry_join/start_join configs unnecessary, perhaps lines of the form

service_config[join_mode] = node['consul']['servers']

should be

service_config[join_mode] = node['consul']['servers'] unless node.consul.atlas_autojoin
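A minimal sketch of that guard, with hypothetical names (this is the shape of the change, not the cookbook's exact code):

```ruby
# Hypothetical sketch of the suggested guard: skip writing the
# retry_join/start_join key entirely when Atlas auto-join is enabled,
# so the config never carries an empty or stale server list.
def apply_join_config(service_config, join_mode, servers, atlas_autojoin)
  service_config[join_mode] = servers unless atlas_autojoin
  service_config
end
```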

@jjhuff

jjhuff commented Apr 16, 2015

Hey, right now, atlas_cluster is required, but consul doesn't seem to require it.

It seems that auto-join and the SCADA/infrastructure stuff are two separate things (both require atlas_token). It'd be nice to be able to enable auto_join without SCADA.
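If I'm reading the 0.5.0 agent options right, the config file keys involved are roughly these (values are placeholders), and auto-join is a separate toggle from the infrastructure/SCADA association even though both ride on the same token:

```json
{
  "atlas_infrastructure": "example/cluster",
  "atlas_token": "NOT_REAL",
  "atlas_join": true
}
```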

@johnbellone johnbellone removed this from the 0.9.0 release milestone May 1, 2015
@johnbellone
Contributor

#126 is about to land on master. Are these attributes exposed via configuration?

johnbellone added a commit that referenced this pull request Jun 16, 2015
Add support for Consul 0.5.0 and Atlas auto-join
@johnbellone johnbellone merged commit 10d582a into sous-chefs:master Jun 16, 2015
@lock

lock bot commented Apr 25, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Apr 25, 2020