Add health checks for rabbitmq #1345

Merged: 10 commits, Aug 7, 2017
6 changes: 6 additions & 0 deletions RELEASE_NOTES.md
@@ -50,6 +50,12 @@ For prior releases, see [PRIOR\_RELEASE\_NOTES.md](PRIOR_RELEASE_NOTES.md).

The preflight check that was in place to catch these situations has been removed.

* [RabbitMQ health check in status endpoint](https://github.com/chef/chef-server/pull/1345): Chef Server's
`_status` endpoint now checks the health of the analytics and internal RabbitMQ vhosts. For these checks
to work, the RabbitMQ management plugin must be installed. If it is not, the checks are not done. If
Chef Server is configured not to use Actions, a check will not be performed against the Actions vhost.
If an indexing queue is not used, the `chef_index` RabbitMQ vhost will not be checked.
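
The rules in this release note can be read as a small decision function. The Ruby sketch below is illustrative only: the method name and keyword arguments are hypothetical and do not correspond to any Chef Server code. It only restates which RabbitMQ vhosts the note says get checked.

```ruby
# Illustrative only: the method name and keyword arguments are hypothetical,
# not part of Chef Server.
def rabbitmq_vhost_checks(management_enabled:, actions_enabled:, index_queue_used:)
  # Without the management plugin, no RabbitMQ checks are performed.
  return [] unless management_enabled

  checks = []
  checks << :actions if actions_enabled       # Actions (analytics) vhost
  checks << :chef_index if index_queue_used   # indexing queue vhost
  checks
end

# Management plugin present, Actions disabled, indexing queue in use:
p rabbitmq_vhost_checks(management_enabled: true, actions_enabled: false, index_queue_used: true)
# => [:chef_index]
```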

## 12.15.8 (2017-06-20)

* [Stricter validation of non-functional user record fields](https://github.com/chef/chef-server/pull/1294),
6 changes: 5 additions & 1 deletion dev/Vagrantfile
@@ -452,7 +452,11 @@ def host_timezone
end

def host_timezone_linux
-File.read("/etc/timezone").chomp
+if File.exists?("/etc/timezone")
+File.read("/etc/timezone").chomp
+else
+"UTC"
+end
end

def host_timezone_osx
@@ -218,6 +218,13 @@
retries 10
end

execute "#{rmq_ctl} set_permissions -p #{rabbitmq['vhost']} #{rabbitmq['management_user']} \".*\" \".*\" \".*\"" do
Contributor:
These calls are expensive. Is there any way we could combine this with the one below?

Contributor Author:
I checked the docs, and it does not look like I can pass a regex for the vhost.

Contributor:
:( OK.

environment (rabbitmq_env)
user opc_username
not_if "#{rmq_ctl_chpst} list_user_permissions #{rabbitmq['management_user']}|grep #{rabbitmq['vhost']}", :environment => rabbitmq_env, :user => "root"
retries 10
end
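
As the review thread on this hunk notes, `rabbitmqctl set_permissions` takes a single vhost through `-p`, so one call per vhost is needed. A minimal Ruby sketch of generating those calls from a list follows. The helper name and the example vhost and user values are hypothetical; this is not the recipe's actual code.

```ruby
require "shellwords"

# Hypothetical helper: builds one `set_permissions` command line per vhost,
# granting configure/write/read on everything (".*") to the given user.
def set_permissions_commands(rmq_ctl, user, vhosts)
  vhosts.map do |vhost|
    Shellwords.join([rmq_ctl, "set_permissions", "-p", vhost, user, ".*", ".*", ".*"])
  end
end

# Example values are assumptions for illustration.
set_permissions_commands("rabbitmqctl", "rabbitmgmt", ["/chef", "/"]).each do |cmd|
  puts cmd
end
```

This still issues one command per vhost, so it does not reduce the cost raised in the review; it only removes the duplication in the recipe text.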

execute "#{rmq_ctl} set_permissions -p / #{rabbitmq['management_user']} \".*\" \".*\" \".*\"" do
environment (rabbitmq_env)
user opc_username
@@ -127,6 +127,10 @@
data_collector,
<% end %>
oc_chef_authz,
<% if node['private_chef']['dark_launch']['actions'] && node['private_chef']['rabbitmq']['management_enabled'] %>
oc_chef_action,
<% end %>
chef_index,
chef_sql,
chef_<%= node['private_chef']['opscode-erchef']['search_provider'] %>
]},
@@ -139,7 +143,7 @@
{user, "<%= @node['private_chef']['rabbitmq']['management_user'] %>"},
{port, <%= @node['private_chef']['rabbitmq']['management_port'] %>},
% rabbitmq management http connection pool
-{rabbitmq_management_service, [
+{rabbitmq_actions_management_service, [
<% if node['private_chef']['fips_enabled'] -%>
%% See note about Bookshelf
{root_url, "http://<%= @actions_vip %>:<%= @node['private_chef']['rabbitmq']['management_port'] %>/api"},
@@ -227,6 +231,25 @@
{max_age, <%= @solr_http_max_age %>},
{max_connection_duration, <%= @solr_http_max_connection_duration %>},
{ibrowse_options, <%= @solr_ibrowse_options %>}
]},
{rabbitmq_index_management_service, [
Contributor:
It is a bit of a shame that in the common case (actions and index RabbitMQ being the same) we are going to have two pools of HTTP connections. However, since they can be different, I think this is cleanest for now.


{enabled, <%= @node['private_chef']['rabbitmq']['management_enabled'] %>},
{user, "<%= @node['private_chef']['rabbitmq']['management_user'] %>"},
<% if node['private_chef']['fips_enabled'] -%>
{root_url, "http://<%= node['private_chef']['rabbitmq']['vip'] %>:<%= @node['private_chef']['rabbitmq']['management_port'] %>/api"},
<% else -%>
{root_url, "https://<%= node['private_chef']['rabbitmq']['vip'] %>:<%= @node['private_chef']['rabbitmq']['management_port'] %>/api"},
<% end %>
{timeout, <%= @node['private_chef']['rabbitmq']['rabbit_mgmt_timeout'] %>},
{init_count, <%= @node['private_chef']['rabbitmq']['rabbit_mgmt_http_init_count'] %>},
{max_count, <%= @node['private_chef']['rabbitmq']['rabbit_mgmt_http_max_count'] %>},
{cull_interval, {<%= @node['private_chef']['rabbitmq']['rabbit_mgmt_http_cull_interval'] %>, sec}},
{max_age, {<%= @node['private_chef']['rabbitmq']['rabbit_mgmt_http_max_age'] %>, sec}},
{max_connection_duration, {<%= @node['private_chef']['rabbitmq']['rabbit_mgmt_http_max_connection_duration'] %>, sec}},

{ibrowse_options, [
<%= @node['private_chef']['rabbitmq']['rabbit_mgmt_ibrowse_options'] %>
]}
]}
]},
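
The `root_url` branch in the template above picks `http` when FIPS is enabled and `https` otherwise. A small Ruby sketch of that selection, with a hypothetical helper name and example values, may make the rendered result easier to picture:

```ruby
# Hypothetical helper mirroring the template's FIPS branch; not part of the cookbook.
def management_root_url(vip, port, fips_enabled)
  scheme = fips_enabled ? "http" : "https"
  "#{scheme}://#{vip}:#{port}/api"
end

# Example values are assumptions for illustration.
puts management_root_url("127.0.0.1", 15672, false)
# prints "https://127.0.0.1:15672/api"
```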

2 changes: 1 addition & 1 deletion src/oc_erchef/Makefile
@@ -74,7 +74,7 @@ bundle:
@cd apps/chef_objects/priv/depselector_rb; bundle install --deployment --path .bundle


-CHEFDK_GECODE_PATH:=/opt/chefdk/embedded/lib/ruby/gems/2.3.0/gems/dep-selector-libgecode-1.3.1/lib/dep-selector-libgecode/vendored-gecode
+CHEFDK_GECODE_PATH:=/opt/chefdk/embedded/lib/ruby/gems/2.4.0/gems/dep-selector-libgecode-1.3.1/lib/dep-selector-libgecode/vendored-gecode
travis_env:
@echo export TRAVIS=1
@echo export USE_SYSTEM_GECODE=1
3 changes: 2 additions & 1 deletion src/oc_erchef/apps/chef_index/src/chef_index.app.src
@@ -24,7 +24,8 @@
kernel,
stdlib,
gen_bunny,
-ibrowse
+ibrowse,
+chef_secrets
]},
{env, []}
]}.
18 changes: 17 additions & 1 deletion src/oc_erchef/apps/chef_index/src/chef_index.erl
@@ -29,7 +29,8 @@
delete/4,
add/5,
add_batch/1,
-search_provider/0
+search_provider/0,
+ping/0
]).

-include("chef_solr.hrl").
@@ -187,3 +188,18 @@ send_to_solr(batch, Doc) ->
chef_index_batch:add_item(Doc);
send_to_solr(inline, Doc) ->
chef_index_expand:send_item(Doc).

ping() ->
case queue_mode() of
rabbitmq ->
Config = envy:get(chef_index, rabbitmq_index_management_service, [], any),
Enabled = proplists:get_value(enabled, Config),
case Enabled of
true ->
chef_index_queue:ping(envy:get(chef_index, rabbitmq_vhost, binary));
_ ->
pong
end;
_ ->
pong
end.
28 changes: 27 additions & 1 deletion src/oc_erchef/apps/chef_index/src/chef_index_queue.erl
@@ -21,7 +21,9 @@
delete/4,
delete/5,
set/5,
-set/6
+set/6,
+create_management_pool/3,
+ping/1
]).

-ifdef(TEST).
@@ -36,6 +38,8 @@
-type solr_url() :: [byte()] | binary() | undefined.
-type vhost() :: binary().

-define(POOLNAME, rabbitmq_index_management_service).

%%%%
%% Public API
%%%%
@@ -103,10 +107,31 @@ package_for_delete(Type, ID, DatabaseName, SolrUrl) ->
InnerEnvelope = inner_envelope(Type, ID, DatabaseName, {[]}, SolrUrl),
{[{action, delete}, {payload, InnerEnvelope}]}.


create_management_pool(Username, Password, Config) ->
chef_wm_rabbitmq_management:create_pool(?POOLNAME, add_basic_auth(Username, Password, Config)).

-spec ping(binary()) -> pong | pang.
ping(VHost) ->
% TODO(jaym) 2017-08-02: chef_wm_rabbitmq_management should be moved to a shared app.
% The reason for this is because referencing chef_wm_rabbitmq_management from here
% creates a 2 way dependency between chef_index and oc_chef_wm.
case chef_wm_rabbitmq_management:check_aliveness(
?POOLNAME, binary_to_list(VHost)) of
true -> pong;
_ -> pang
end.

%%%%
%% Internal
%%%%

add_basic_auth(Username, Password, Config) ->
IbrowseOptions = proplists:get_value(ibrowse_options, Config),
Config1 = proplists:delete(ibrowse_options, Config),
IbrowseOptions1 = [{basic_auth, {Username, erlang:binary_to_list(Password)}} | IbrowseOptions],
[{ibrowse_options, IbrowseOptions1} | Config1].

-spec inner_envelope(chef_indexable_type(), uuid_binary(), chef_db_name(), ejson(), solr_url()) -> ejson().
inner_envelope(Type, ID, DatabaseName, Item, SolrUrl) ->
%% SAMPLE ENVELOPE:
@@ -157,3 +182,4 @@ object_id_to_i(UUID) ->
unix_time() ->
{MS, S, _US} = os:timestamp(),
(1000000 * MS) + S.

19 changes: 19 additions & 0 deletions src/oc_erchef/apps/chef_index/src/chef_index_sup.erl
@@ -49,9 +49,28 @@ init([]) ->
error_logger:info_msg("Starting chef_index_sup.~n", []),
error_logger:info_msg("Creating HTTP pool for Solr.~n"),
chef_index_http:create_pool(),
maybe_rabbitmq_monitoring(),
Children = child_spec(),
{ok, {{one_for_one, 60, 10}, Children}}.

maybe_rabbitmq_monitoring() ->
case envy:get(chef_index, search_queue_mode, rabbitmq, envy:one_of([rabbitmq, batch, inline])) of
rabbitmq ->
Config = envy:get(chef_index, rabbitmq_index_management_service, [], any),
{LocalConfig, HttpConfig} = proplists:split(Config, [enabled, user]),
case LocalConfig of
[[{enabled, true}], [{user, Username}]] ->
{ok, Password} = chef_secrets:get(<<"rabbitmq">>, <<"management_password">>),
chef_index_queue:create_management_pool(Username, Password, HttpConfig);
_ ->
error_logger:info_msg("Rabbitmq monitoring is disabled. "
"chef_index will not check rabbitmq health.~n"),
ok
end;
_ ->
ok
end.
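
The pattern `[[{enabled, true}], [{user, Username}]]` above relies on `proplists:split/2` returning one sublist per requested key, in key order, plus the remaining entries. A rough Ruby analogue is sketched below to show the shape of that result. The helper name and sample values are hypothetical, and the sketch ignores Erlang's bare-atom shorthand for `{Atom, true}`.

```ruby
# Hypothetical analogue of proplists:split/2 over an array of [key, value] pairs.
# Returns [lists, rest]: one sublist per key (in the order given), then the leftovers.
def split_by_keys(pairs, keys)
  lists = keys.map { |wanted| pairs.select { |key, _value| key == wanted } }
  rest = pairs.reject { |key, _value| keys.include?(key) }
  [lists, rest]
end

# Sample config values are assumptions for illustration.
config = [[:enabled, true], [:user, "rabbitmgmt"], [:timeout, 30000]]
local_config, http_config = split_by_keys(config, [:enabled, :user])
p local_config  # => [[[:enabled, true]], [[:user, "rabbitmgmt"]]]
p http_config   # => [[:timeout, 30000]]
```

If either key is missing, its sublist is empty, so the two-element match fails and the code falls through to the "monitoring is disabled" branch.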

%% Return a spec for a bunnyc gen_server or the chef_index_batch gen_server based on the
%% search_queue_mode configuration.
%%
60 changes: 60 additions & 0 deletions src/oc_erchef/apps/chef_index/test/chef_index_tests.erl
@@ -108,3 +108,63 @@ delete_assertion() ->
meck:expect(chef_index_expand, send_delete, fun(?EXPECTED_DELETE_DOC) -> ok end),
chef_index:delete(role, <<"a1">>, <<"db1">>, <<"abcd">>),
?assert(meck:validate(chef_index_expand)).

ping_test_() ->
{foreach,
fun() ->
application:set_env(chef_index, rabbitmq_vhost, <<"/testvhost">>),
application:set_env(chef_index, search_queue_mode, rabbitmq),
application:set_env(chef_index, rabbitmq_index_management_service, [{enabled, true}]),
meck:new(chef_wm_rabbitmq_management)
end,
fun(_) ->
meck:unload(chef_wm_rabbitmq_management)
end,
[{"When rabbitmq is not used, the check returns pong",
% This is not a requirement, there's just nothing else that
% is currently checked.
fun() ->
application:set_env(chef_index, search_queue_mode, batch),
meck:expect(chef_wm_rabbitmq_management, check_aliveness,
fun(_, "/testvhost") ->
throw(shouldnt_be_called)
end),
Status = chef_index:ping(),
?assertEqual(pong, Status)
end
},
{"When rabbitmq is enabled but management is not, the check returns pong",
% This is not a requirement, there's just nothing else that
% is currently checked.
fun() ->
application:set_env(chef_index, rabbitmq_index_management_service, [{enabled, false}]),
meck:expect(chef_wm_rabbitmq_management, check_aliveness,
fun(_, "/testvhost") ->
throw(shouldnt_be_called)
end),
Status = chef_index:ping(),
?assertEqual(pong, Status)
end
},
{"When rabbitmq and management are enabled, and rabbit is alive, returns pong",
fun() ->
meck:expect(chef_wm_rabbitmq_management, check_aliveness,
fun(_, "/testvhost") ->
true
end),
Status = chef_index:ping(),
?assertEqual(pong, Status)
end
},
{"When rabbitmq and management are enabled, and rabbit is not alive, returns pang",
fun() ->
meck:expect(chef_wm_rabbitmq_management, check_aliveness,
fun(_, "/testvhost") ->
false
end),
Status = chef_index:ping(),
?assertEqual(pang, Status)
end
}
]}.

@@ -35,7 +35,7 @@
-define(ACTIONS_EXCHANGE, <<"actions">>).

-define(Q_SETTING(K, V),
-chef_wm_rabbitmq_management:set_rabbit_queue_monitor_setting(K,V)).
+oc_chef_action_queue_config:set_rabbit_queue_monitor_setting(K,V)).

init_per_suite(InitialConfig) ->
UseFakeRabbit =
@@ -134,7 +134,7 @@ basic_queue_monitor(Config) ->


% don't drop messages from now on
-chef_wm_rabbitmq_management:set_rabbit_queue_monitor_setting(drop_on_full_capacity, false),
+oc_chef_action_queue_config:set_rabbit_queue_monitor_setting(drop_on_full_capacity, false),

make_data_bag(?CLIENT_NAME, 13),
make_data_bag(?CLIENT_NAME, 14),
@@ -172,7 +172,7 @@ queue_full_dont_start(Config) ->
case ?config(use_fake_rabbit, Config) of
true ->
meck:expect(chef_wm_rabbitmq_management, sync_check_queue_at_capacity,
-fun(_Vhost, _Queue) ->
+fun(_PoolNameAtom, _Vhost, _Queue) ->
{MaxLength, MaxLength, true}
end);
false -> ok
@@ -243,7 +243,7 @@ default_rabbit_config(Config) ->
{port, 15672},
{password, BinPassword},
% rabbitmq management http connection pool
-{rabbitmq_management_service,
+{rabbitmq_actions_management_service,
[{root_url, "http://127.0.0.1:15672/api"},
{timeout, 30000},
{init_count, 25},
@@ -347,12 +347,12 @@ setup_rabbit_fake(Config) ->


meck:expect(chef_wm_rabbitmq_management, get_max_length,
-fun(_Vhost) ->
+fun(_PoolNameAtom, _Vhost) ->
?config(max_length, Config)
end),

meck:expect(chef_wm_rabbitmq_management, get_current_length,
-fun(_Vhost, _Queue) ->
+fun(_PoolNameAtom, _Vhost, _Queue) ->
fake_rabbit_current_length()
end),
ok.
@@ -88,7 +88,7 @@
-define(QUEUE, <<"alaska">>).
-define(GEN_SERVER_TIMEOUT_MILLIS, 5000).
-define(QUEUE_MONITOR_RUN_EVERY_MILLIS, 60000).
--define(QUEUE_MONITOR_SETTING(Key, Default), chef_wm_rabbitmq_management:get_rabbit_queue_monitor_setting(Key, Default)).
+-define(QUEUE_MONITOR_SETTING(Key, Default), oc_chef_action_queue_config:get_rabbit_queue_monitor_setting(Key, Default)).

-record(queue_monitor_state, {
% name of the rabbitmq vhost to monitor, as a string
@@ -134,7 +137,10 @@
sync_response_process = undefined,

% total # of checks, even if unsuccessful
-check_count = 0
+check_count = 0,
+
+% name of the http pool to use
+pool_name = undefined
}).


@@ -198,9 +201,11 @@ init([Vhost, Queue, MaxLength, CurrentLength]) ->
% used to catch worker msgs
process_flag(trap_exit, true),
TRef = start_update_timer(),
PoolNameAtom = oc_chef_action_queue_config:get_rabbit_management_pool_name(),
{ok, #queue_monitor_state{timer=TRef,
queue_at_capacity = MaxLength == CurrentLength andalso MaxLength > 0,
worker_process = undefined,
pool_name = PoolNameAtom,
vhost_to_monitor = Vhost,
queue_to_monitor = Queue,
max_length = MaxLength,
@@ -300,15 +305,18 @@ handle_info({MaxLength, N, AtCap}, State) ->
handle_info(status_ping, #queue_monitor_state{
worker_process = undefined,
dropped_since_last_check = Dropped,
pool_name = PoolNameAtom,
vhost_to_monitor = Vhost,
queue_to_monitor = Queue} = State) ->
ParentPid = self(),
Pid = spawn_link(
fun () ->
Result =
-chef_wm_rabbitmq_management:check_current_queue_state(Vhost,
-Queue,
-Dropped),
+chef_wm_rabbitmq_management:check_current_queue_state(
+PoolNameAtom,
+Vhost,
+Queue,
+Dropped),
ParentPid ! Result
end),
{noreply, State#queue_monitor_state{worker_process=Pid}};