This repository holds legacy code related to The Marionette Collective project. That project has been deprecated by Puppet Inc and the code donated to the Choria Project.
Please review the Choria Project Website and specifically the MCollective Deprecation Notice for further information and details about the future of the MCollective project.
Often after just doing a change on servers you want to just be sure that they’re all going to pass a certain nagios check.
Say you’ve deployed some new code and restarted tomcat, there might be a time while you will experience high loads, you can very quickly determine when all your machines are back to normal using this. It can take nagios many minutes to check all your machines which is a totally unneeded delay in your deployment process.
If you put your nagios checks on your servers using the common Nagios NRPE then this agent can be used to quickly check it across your entire estate.
I wrote a blog post on using this plugin to aggregate checks for Nagios: Aggregating Nagios Checks With MCollective
This agent makes an assumption or two about how you set up NRPE, in your nrpe.cfg add the following:
include_dir=/etc/nagios/nrpe.d/
You should now set your commands up one file per check command, for example /etc/nagios/nrpe.d/check_load.cfg:
command[check_load]=/usr/lib64/nagios/plugins/check_load -w 1.5,1.5,1.5 -c 2,2,2
With this setup the agent will now be able to find your check_load command. I’ve added a Puppet define and template to help you create checks like this on GitHub
Follow the basic plugin install guide
You can set the directory where the NRPE cfg files live using plugin.nrpe.conf_dir
You can use the normal mco rpc script to run the agent:
% mco rpc nrpe runcommand command=check_load
Discovering hosts using the mongo method .... 27
* [ ============================================================> ] 27 / 27
dev1.example.com Request Aborted
UNKNOWN
Exit Code: 3
Output: No such command: check_load
Performance Data:
Summary of Exit Code:
OK : 26
UNKNOWN : 1
WARNING : 0
CRITICAL : 0
Finished processing 27 / 27 hosts in 380.57 ms
Or we provide a client specifically for this agent that is a bit more appropriate for the purpose:
The client by default only shows problems:
% mco nrpe -W /dev_server/ check_load
* [ ============================================================> ] 19 / 19
dev1.example.com status=UNKNOWN
No such command: check_load
Summary of Exit Code:
OK : 18
UNKNOWN : 1
WARNING : 0
CRITICAL : 0
Finished processing 19 / 19 hosts in 216.59 ms
To see all the statusses:
% mco nrpe -W /dev_server/ check_load -v
Discovering hosts using the mongo method .... 3
* [ ============================================================> ] 6 / 6
dev1.example.com status=UNKNOWN
No such command: check_load
dev9.example.com status=OK
OK - load average: 0.00, 0.00, 0.00
dev7.example.com status=OK
OK - load average: 0.00, 0.00, 0.00
Summary of Exit Code:
OK : 2
UNKNOWN : 1
CRITICAL : 0
WARNING : 0
---- check_load NRPE results ----
Nodes: 3 / 3
Pass / Fail: 2 / 1
Start Time: Fri Dec 14 11:21:58 +0000 2012
Discovery Time: 50.86ms
Agent Time: 212.90ms
Total Time: 263.76ms
The NRPE Agent ships with a data plugin that will enable you to filter discovery on the results of NRPE commands.
% mco rpc rpcutil ping -S "Nrpe('check_disk1').exitcode=0"
Discovering hosts using the mc method for 3 second(s) .... 1
* [ ============================================================> ] 1 / 1
dev2.example.com
Timestamp: 1355484245
Finished processing 1 / 1 hosts in 138.15 ms