Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add health checks for rabbitmq #1345

Merged
merged 10 commits into from
Aug 7, 2017
Merged

Conversation

jaym
Copy link
Contributor

@jaym jaym commented Jul 31, 2017

This PR adds health checks for rabbitmq for modules that make use of rabbitmq.

rabbitmq ->
chef_index_queue:ping(envy:get(chef_index, rabbitmq_vhost, binary));
_ ->
ping
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be pong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

@@ -218,6 +218,13 @@
retries 10
end

execute "#{rmq_ctl} set_permissions -p #{rabbitmq['vhost']} #{rabbitmq['management_user']} \".*\" \".*\" \".*\"" do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These calls are expensive, any way we could combine this with the one below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the docs, it does not look like i can pass a regex for vhost

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:( OK.

@jaym jaym force-pushed the jdm/SUSTAIN-672-improved-healthcheck branch from d6342cd to d5a9fc4 Compare July 31, 2017 14:50
@@ -227,6 +231,24 @@
{max_age, <%= @solr_http_max_age %>},
{max_connection_duration, <%= @solr_http_max_connection_duration %>},
{ibrowse_options, <%= @solr_ibrowse_options %>}
]},
{rabbitmq_index_management_service, [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit of a shame that in the common case (actions & index rabbitmq being the same) that we are going to have 2 pools of HTTP connections. However, since thy can be different, I think this is cleanest for now.

@jaym jaym force-pushed the jdm/SUSTAIN-672-improved-healthcheck branch 2 times, most recently from 7c703f5 to ba36e6c Compare August 1, 2017 00:26
@@ -28,7 +28,8 @@
%% API
-export([
server_for_vhost/1,
start_link/0
start_link/0,
maybe_rabbitmq_monitoring/0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove from exports

[
{"it's alive",
fun() ->
meck:new(oc_httpc),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can put this in the setup fun (the first fun after forearch) if you call it in each case.

@jaym jaym force-pushed the jdm/SUSTAIN-672-improved-healthcheck branch 3 times, most recently from 09f17b7 to db17116 Compare August 1, 2017 20:18
Copy link
Contributor

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good. Thanks for working on it! I've left a few minor comments. I agree with your TODO comment, but I'd also be fine merging this and doing the refactors in another PR to keep everything reviewable.

It might be nice if you squashed some of those commits and had a least 1 commit that describes the new feature in a few sentences.

We probably also want to put this in the RELEASE_NOTES?


-spec ping(binary()) -> pong | pang.
ping(VHost) ->
% TODO: chef_wm_rabbitmq_management needs to be moved to a shared app
Copy link
Contributor

@stevendanna stevendanna Aug 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] The TODO is fine but we are trying to use TODO(ssd) 2017-08-02: (i.e. TODO(initials) DATE:) so we know how long they've been there and who added them without crawling through git history.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated, added a little more detail as well

]).

-define(SERVER, ?MODULE).
-define(POOLNAME, rabbitmq_management_service).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] Perhaps rabbitmq_actions_management_service to match rabbitmq_index_management_service

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -91,10 +92,11 @@ maybe_start_action(false, Workers) ->
Workers.


check_queue_at_capacity(Vhost, Queue) ->
check_actions_queue_at_capacity(Vhost, Queue) ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] Perhaps pass the pool name since we've already looked it up at the only call site of this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -302,3 +241,14 @@ parse_integer(Val) when is_list(Val) ->
{Int, _Rest} -> Int
end;
parse_integer(_) -> undefined.

-spec check_aliveness(atom(), string()) -> true | false.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] true | false can be boolean(). You can find the various built-in typespecs here: http://erlang.org/doc/reference_manual/typespec.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Queue,
Dropped),
chef_wm_rabbitmq_management:check_current_queue_state(
oc_chef_action_queue_config:get_rabbit_management_pool_name(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] Perhaps we can throw the poolname in the queue_monitor_state in init() so we don't have to look it up every time. This lookup is cheap so it isn't the end of the world to look it up, but since we already have the Vhost and Queue there it would be nice to be consistent (and some microseconds faster I suppose).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@jaym
Copy link
Contributor Author

jaym commented Aug 2, 2017

Let me do some rebasing to tell the story i want

@jaym jaym force-pushed the jdm/SUSTAIN-672-improved-healthcheck branch 4 times, most recently from 84694ef to e647298 Compare August 2, 2017 16:55
Copy link
Contributor

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work Jay!

Jay Mundrawala and others added 10 commits August 4, 2017 08:12
Signed-off-by: Jay Mundrawala <jmundrawala@chef.io>
Signed-off-by: Jay Mundrawala <jmundrawala@chef.io>
… vips

The current chef server configuration allows people to point actions at
a rabbitmq that is not the same as the chef_index one. This change will
allow us to check the health of each one.

Signed-off-by: Jay Mundrawala <jmundrawala@chef.io>
Signed-off-by: Jay Mundrawala <jmundrawala@chef.io>
Signed-off-by: Jay Mundrawala <jmundrawala@chef.io>
Signed-off-by: Jay Mundrawala <jmundrawala@chef.io>
Signed-off-by: Jay Mundrawala <jmundrawala@chef.io>
Saves us on having to do a look up every call

Signed-off-by: Jay Mundrawala <jmundrawala@chef.io>
…vice

We have a management service for the internal queue as well now, so
this will specifically describe it without additional context needed

Signed-off-by: Jay Mundrawala <jmundrawala@chef.io>
This feature adds checks for RabbiMQ vhosts. These are checked when the
`_status` endpoint is called, and will use the RabbitMQ management plugin
to check the health of the vhosts we use.

Signed-off-by: Jay Mundrawala <jmundrawala@chef.io>
@jaym jaym force-pushed the jdm/SUSTAIN-672-improved-healthcheck branch from 1de5992 to 96b687d Compare August 4, 2017 15:13
@jaym jaym requested a review from a team August 4, 2017 15:13
@jaym jaym merged commit b8b147e into master Aug 7, 2017
@jaym jaym deleted the jdm/SUSTAIN-672-improved-healthcheck branch August 7, 2017 16:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants