Add cassandra_nodetool check #511
Merged
17 commits
ac0045b [cassandra_check] Add cassandra_check (zippolyte)
49395f4 Rework tests (zippolyte)
639923f disable pylint checks (zippolyte)
e2c69b6 Some linting (zippolyte)
b87ab6a Merge branch 'master' into hippo/cassandra_check (zippolyte)
edd33ac Rework check. Use nodetool instead of python driver (zippolyte)
8df8f16 Merge branch 'hippo/cassandra_check' of github.com:DataDog/integratio… (zippolyte)
a78758b Merge branch master into hippo/cassandra_check (zippolyte)
1c634a8 Add integration test (zippolyte)
cb28feb Fix tests by allowing to specify a docker command for nodetool (zippolyte)
5fe6264 Fix mock test (zippolyte)
5110f7f Address review comments (zippolyte)
db4aa03 More addressing (zippolyte)
50fe2cd Merge branch 'master' into hippo/cassandra_check (zippolyte)
4ac71c7 Add service check section in readme (zippolyte)
1031262 Merge branch 'hippo/cassandra_check' of github.com:DataDog/integratio… (zippolyte)
9e53421 Continue loop if error on calling nodetool (zippolyte)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
# CHANGELOG - Cassandra Nodetool Check | ||
|
||
0.1.0 / Unreleased | ||
================== | ||
|
||
### Changes | ||
|
||
* [FEATURE] adds cassandra_nodetool integration. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
# Agent Check: Cassandra Nodetool | ||
|
||
# Overview | ||
|
||
This check collects metrics for your Cassandra cluster that are not available through the [JMX integration](https://github.com/DataDog/integrations-core/tree/master/cassandra). | ||
It uses the `nodetool` utility to collect them. | ||
|
||
# Installation | ||
|
||
The cassandra_nodetool check is packaged with the Agent, so simply [install the Agent](https://app.datadoghq.com/account/settings#agent) on your Cassandra nodes. | ||
If you need the newest version of the check, install the `dd-check-cassandra_nodetool` package. | ||
|
||
# Configuration | ||
|
||
Create a file `cassandra_nodetool.yaml` in the Agent's `conf.d` directory: | ||
``` | ||
init_config: | ||
# command or path to nodetool (e.g. /usr/bin/nodetool or docker exec container nodetool) | ||
# can be overridden per instance | ||
# nodetool: /usr/bin/nodetool | ||
|
||
instances: | ||
|
||
# the list of keyspaces to monitor | ||
- keyspaces: [] | ||
|
||
# host that nodetool will connect to. | ||
# host: localhost | ||
|
||
# the port JMX is listening on for connections. | ||
# port: 7199 | ||
|
||
# a set of credentials to connect to the host. These are the credentials for the JMX server. | ||
# For the check to work, this user must have read/write access so that nodetool can execute the `status` command | ||
# username: | ||
# password: | ||
|
||
# a list of additional tags to be sent with the metrics | ||
# tags: [] | ||
``` | ||
|
||
# Validation | ||
|
||
When you run `datadog-agent info` you should see something like the following: | ||
|
||
Checks | ||
====== | ||
|
||
cassandra_nodetool | ||
----------- | ||
- instance #0 [OK] | ||
- Collected 39 metrics, 0 events & 7 service checks | ||
|
||
# Compatibility | ||
|
||
The `cassandra_nodetool` check is compatible with all major platforms. | ||
|
||
# Service Checks | ||
|
||
**cassandra.nodetool.node_up**: | ||
|
||
The agent sends this service check for each node of the monitored cluster. Returns CRITICAL if the node is down, otherwise OK. |
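A minimal sketch of how the configuration options above could translate into a `nodetool status` invocation, following the logic of the check in this PR. The helper name `build_nodetool_cmd` is hypothetical and not part of the check; the `str()` cast on the port is an added assumption, since YAML parses `port: 7199` as an integer.

```python
import shlex

DEFAULT_HOST = 'localhost'
DEFAULT_PORT = '7199'


def build_nodetool_cmd(init_config, instance, keyspace):
    """Hypothetical helper: assemble the argv list for `nodetool status`."""
    # The instance-level `nodetool` setting overrides the init_config one.
    base = instance.get('nodetool',
                        init_config.get('nodetool', '/usr/bin/nodetool'))
    # shlex.split lets `nodetool` be a full command such as
    # `docker exec container nodetool`.
    cmd = shlex.split(base)
    # Assumption: cast the port to str so an integer from YAML is accepted.
    cmd += ['-h', instance.get('host', DEFAULT_HOST),
            '-p', str(instance.get('port', DEFAULT_PORT))]
    username = instance.get('username', '')
    password = instance.get('password', '')
    # Credentials are only passed when both are set.
    if username and password:
        cmd += ['-u', username, '-pw', password]
    # `--` keeps the keyspace name from being parsed as an option.
    cmd += ['status', '--', keyspace]
    return cmd


if __name__ == '__main__':
    instance = {'nodetool': 'docker exec dd-test-cassandra nodetool',
                'port': 7199, 'username': 'controlRole', 'password': 'QED'}
    print(build_nodetool_cmd({}, instance, 'test'))
```

The check runs one such command per configured keyspace, so an empty `keyspaces` list results in no metrics being collected.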
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,136 @@ | ||
# (C) Datadog, Inc. 2010-2016 | ||
# All rights reserved | ||
# Licensed under Simplified BSD License (see LICENSE) | ||
|
||
# stdlib | ||
from collections import defaultdict | ||
import re | ||
import shlex | ||
|
||
# project | ||
from checks import AgentCheck | ||
from utils.subprocess_output import get_subprocess_output | ||
|
||
EVENT_TYPE = SOURCE_TYPE_NAME = 'cassandra_nodetool' | ||
DEFAULT_HOST = 'localhost' | ||
DEFAULT_PORT = '7199' | ||
TO_BYTES = { | ||
'B': 1, | ||
'KB': 1e3, | ||
'MB': 1e6, | ||
'GB': 1e9, | ||
'TB': 1e12, | ||
} | ||
|
||
class CassandraNodetoolCheck(AgentCheck): | ||
|
||
datacenter_name_re = re.compile(r'^Datacenter: (.*)') | ||
node_status_re = re.compile(r'^(?P<status>[UD])[NLJM] +(?P<address>\d+\.\d+\.\d+\.\d+) +' | ||
r'(?P<load>\d+\.\d*) (?P<load_unit>(K|M|G|T)?B) +\d+ +' | ||
r'(?P<owns>(\d+\.\d+)|\?)%? +(?P<id>[a-fA-F0-9-]*) +(?P<rack>.*)') | ||
|
||
def __init__(self, name, init_config, agentConfig, instances=None): | ||
AgentCheck.__init__(self, name, init_config, agentConfig, instances) | ||
self.nodetool_cmd = init_config.get("nodetool", "/usr/bin/nodetool") | ||
|
||
def check(self, instance): | ||
# Allow specifying a complete command for nodetool, such as `docker exec container nodetool` | ||
nodetool_cmd = shlex.split(instance.get("nodetool", self.nodetool_cmd)) | ||
host = instance.get("host", DEFAULT_HOST) | ||
port = str(instance.get("port", DEFAULT_PORT))  # YAML may yield an int; the command needs a string | ||
keyspaces = instance.get("keyspaces", []) | ||
username = instance.get("username", "") | ||
password = instance.get("password", "") | ||
tags = instance.get("tags", []) | ||
|
||
# Flag to send service checks only once and not for every keyspace | ||
send_service_checks = True | ||
|
||
for keyspace in keyspaces: | ||
# Build the nodetool command | ||
cmd = nodetool_cmd + ['-h', host, '-p', port] | ||
if username and password: | ||
cmd += ['-u', username, '-pw', password] | ||
cmd += ['status', '--', keyspace] | ||
|
||
# Execute the command | ||
out, err, _ = get_subprocess_output(cmd, self.log, False) | ||
if err or 'Error:' in out: | ||
self.log.error('Error executing nodetool status: %s', err or out) | ||
continue | ||
nodes = self._process_nodetool_output(out) | ||
|
||
percent_up_by_dc = defaultdict(float) | ||
percent_total_by_dc = defaultdict(float) | ||
# Send the stats per node and compute the stats per datacenter | ||
for node in nodes: | ||
|
||
node_tags = ['node_address:%s' % node['address'], | ||
'node_id:%s' % node['id'], | ||
'datacenter:%s' % node['datacenter'], | ||
'rack:%s' % node['rack']] | ||
|
||
# nodetool prints `?` when it can't compute the value of `owns` for certain keyspaces (e.g. system) | ||
# don't send metric in this case | ||
if node['owns'] != '?': | ||
owns = float(node['owns']) | ||
if node['status'] == 'U': | ||
percent_up_by_dc[node['datacenter']] += owns | ||
percent_total_by_dc[node['datacenter']] += owns | ||
self.gauge('cassandra.nodetool.status.owns', owns, | ||
tags=tags + node_tags + ['keyspace:%s' % keyspace]) | ||
|
||
# Send service check only once for each node | ||
if send_service_checks: | ||
status = AgentCheck.OK if node['status'] == 'U' else AgentCheck.CRITICAL | ||
self.service_check('cassandra.nodetool.node_up', status, tags + node_tags) | ||
|
||
self.gauge('cassandra.nodetool.status.status', 1 if node['status'] == 'U' else 0, | ||
tags=tags + node_tags) | ||
self.gauge('cassandra.nodetool.status.load', float(node['load']) * TO_BYTES[node['load_unit']], | ||
tags=tags + node_tags) | ||
|
||
# All service checks have been sent, don't resend | ||
send_service_checks = False | ||
|
||
# Send the stats per datacenter | ||
for datacenter, percent_up in percent_up_by_dc.items(): | ||
self.gauge('cassandra.nodetool.status.replication_availability', percent_up, | ||
tags=tags + ['keyspace:%s' % keyspace, 'datacenter:%s' % datacenter]) | ||
for datacenter, percent_total in percent_total_by_dc.items(): | ||
self.gauge('cassandra.nodetool.status.replication_factor', int(round(percent_total / 100)), | ||
tags=tags + ['keyspace:%s' % keyspace, 'datacenter:%s' % datacenter]) | ||
|
||
def _process_nodetool_output(self, output): | ||
nodes = [] | ||
datacenter_name = "" | ||
for line in output.splitlines(): | ||
# Output of nodetool | ||
# Datacenter: dc1 | ||
# =============== | ||
# Status=Up/Down | ||
# |/ State=Normal/Leaving/Joining/Moving | ||
# -- Address Load Tokens Owns (effective) Host ID Rack | ||
# UN 172.21.0.3 184.8 KB 256 38.4% 7501ef03-eb63-4db0-95e6-20bfeb7cdd87 RAC1 | ||
# UN 172.21.0.4 223.34 KB 256 39.5% e521a2a4-39d3-4311-a195-667bf56450f4 RAC1 | ||
|
||
match = self.datacenter_name_re.search(line) | ||
if match: | ||
datacenter_name = match.group(1) | ||
continue | ||
|
||
match = self.node_status_re.search(line) | ||
if match: | ||
node = { | ||
'status': match.group('status'), | ||
'address': match.group('address'), | ||
'load': match.group('load'), | ||
'load_unit': match.group('load_unit'), | ||
'owns': match.group('owns'), | ||
'id': match.group('id'), | ||
'rack': match.group('rack'), | ||
'datacenter': datacenter_name | ||
} | ||
nodes.append(node) | ||
|
||
return nodes |
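To make the parsing step easier to review in isolation, here is a standalone sketch that applies the same two regular expressions to a short sample. The function name `parse_nodetool_status` and the `SAMPLE` text are illustrative assumptions (the second node line uses `?` for `owns` to exercise that branch); the patterns themselves are the ones from the check above, written as raw strings.

```python
import re

DATACENTER_RE = re.compile(r'^Datacenter: (.*)')
NODE_STATUS_RE = re.compile(
    r'^(?P<status>[UD])[NLJM] +(?P<address>\d+\.\d+\.\d+\.\d+) +'
    r'(?P<load>\d+\.\d*) (?P<load_unit>(K|M|G|T)?B) +\d+ +'
    r'(?P<owns>(\d+\.\d+)|\?)%? +(?P<id>[a-fA-F0-9-]*) +(?P<rack>.*)'
)

# Illustrative sample, not captured from a real cluster.
SAMPLE = """\
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.21.0.3 184.8 KB 256 38.4% 7501ef03-eb63-4db0-95e6-20bfeb7cdd87 RAC1
DN 172.21.0.4 223.34 KB 256 ? e521a2a4-39d3-4311-a195-667bf56450f4 RAC1
"""


def parse_nodetool_status(output):
    """Return one dict per node line, tagged with the enclosing datacenter."""
    nodes = []
    datacenter = ''
    for line in output.splitlines():
        match = DATACENTER_RE.search(line)
        if match:
            # Every node line that follows belongs to this datacenter.
            datacenter = match.group(1)
            continue
        match = NODE_STATUS_RE.search(line)
        if match:
            # groupdict() only returns the named groups.
            node = match.groupdict()
            node['datacenter'] = datacenter
            nodes.append(node)
    return nodes


if __name__ == '__main__':
    for node in parse_nodetool_status(SAMPLE):
        print(node)
```

Header and separator lines match neither pattern and are skipped. Note that the `load` pattern requires a decimal point, so a load printed without one would not match.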
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,94 @@ | ||
require 'ci/common' | ||
|
||
def cassandra_nodetool_version | ||
ENV['FLAVOR_VERSION'] || '2.1.14' # '2.0.17' | ||
end | ||
|
||
container_name = 'dd-test-cassandra' | ||
container_name2 = 'dd-test-cassandra2' | ||
|
||
container_port = 7199 | ||
cassandra_jmx_options = "-Dcom.sun.management.jmxremote.port=#{container_port} | ||
-Dcom.sun.management.jmxremote.rmi.port=#{container_port} | ||
-Dcom.sun.management.jmxremote.ssl=false | ||
-Dcom.sun.management.jmxremote.authenticate=true | ||
-Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password | ||
-Djava.rmi.server.hostname=localhost" | ||
|
||
namespace :ci do | ||
namespace :cassandra_nodetool do |flavor| | ||
task before_install: ['ci:common:before_install'] do | ||
sh %(docker kill #{container_name} 2>/dev/null || true) | ||
sh %(docker rm #{container_name} 2>/dev/null || true) | ||
sh %(docker kill #{container_name2} 2>/dev/null || true) | ||
sh %(docker rm #{container_name2} 2>/dev/null || true) | ||
sh %(rm -f #{__dir__}/jmxremote.password.tmp) | ||
end | ||
|
||
task :install do | ||
Rake::Task['ci:common:install'].invoke('cassandra_nodetool') | ||
sh %(docker create --expose #{container_port} \ | ||
-p #{container_port}:#{container_port} -e JMX_PORT=#{container_port} \ | ||
-e LOCAL_JMX=no -e JVM_EXTRA_OPTS="#{cassandra_jmx_options}" --name #{container_name} cassandra:#{cassandra_nodetool_version}) | ||
sh %(cp #{__dir__}/jmxremote.password #{__dir__}/jmxremote.password.tmp) | ||
sh %(chmod 400 #{__dir__}/jmxremote.password.tmp) | ||
sh %(docker cp #{__dir__}/jmxremote.password.tmp #{container_name}:/etc/cassandra/jmxremote.password) | ||
sh %(rm -f #{__dir__}/jmxremote.password.tmp) | ||
sh %(docker start #{container_name}) | ||
|
||
sh %(docker create --name #{container_name2} \ | ||
-e CASSANDRA_SEEDS="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' #{container_name})" \ | ||
cassandra:#{cassandra_nodetool_version}) | ||
sh %(docker start #{container_name2}) | ||
end | ||
|
||
task before_script: ['ci:common:before_script'] do | ||
# Wait.for container_port | ||
wait_on_docker_logs(container_name, 20, 'Listening for thrift clients', "Created default superuser role 'cassandra'") | ||
wait_on_docker_logs(container_name2, 40, 'Listening for thrift clients', 'Not starting RPC server as requested') | ||
sh %(docker exec #{container_name} cqlsh -e "CREATE KEYSPACE test WITH REPLICATION={'class':'SimpleStrategy', 'replication_factor':2}") | ||
end | ||
|
||
task script: ['ci:common:script'] do | ||
this_provides = [ | ||
'cassandra_nodetool' | ||
] | ||
Rake::Task['ci:common:run_tests'].invoke(this_provides) | ||
end | ||
|
||
task before_cache: ['ci:common:before_cache'] | ||
|
||
task cleanup: ['ci:common:cleanup'] do | ||
sh %(docker kill #{container_name} 2>/dev/null || true) | ||
sh %(docker rm #{container_name} 2>/dev/null || true) | ||
sh %(docker kill #{container_name2} 2>/dev/null || true) | ||
sh %(docker rm #{container_name2} 2>/dev/null || true) | ||
sh %(rm -f #{__dir__}/jmxremote.password.tmp) | ||
end | ||
|
||
task :execute do | ||
exception = nil | ||
begin | ||
%w(before_install install before_script).each do |u| | ||
Rake::Task["#{flavor.scope.path}:#{u}"].invoke | ||
end | ||
if !ENV['SKIP_TEST'] | ||
Rake::Task["#{flavor.scope.path}:script"].invoke | ||
else | ||
puts 'Skipping tests'.yellow | ||
end | ||
Rake::Task["#{flavor.scope.path}:before_cache"].invoke | ||
rescue => e | ||
exception = e | ||
puts "Failed task: #{e.class} #{e.message}".red | ||
end | ||
if ENV['SKIP_CLEANUP'] | ||
puts 'Skipping cleanup, disposable environments are great'.yellow | ||
else | ||
puts 'Cleaning up' | ||
Rake::Task["#{flavor.scope.path}:cleanup"].invoke | ||
end | ||
raise exception if exception | ||
end | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
Datacenter: dc1 | ||
=============== | ||
Status=Up/Down | ||
|/ State=Normal/Leaving/Joining/Moving | ||
-- Address Load Tokens Owns (effective) Host ID Rack | ||
DN 172.21.0.6 178.43 KB 256 35.4% f86d2d7a-e5c7-4c46-b36e-df08c565171a rack1 | ||
UN 172.21.0.3 184.8 KB 256 31.0% 7501ef03-eb63-4db0-95e6-20bfeb7cdd87 RAC1 | ||
UN 172.21.0.2 182.05 KB 256 33.5% fa859fcc-5e76-44ce-9609-1f314bdf21c1 RAC1 | ||
Datacenter: dc2 | ||
=============== | ||
Status=Up/Down | ||
|/ State=Normal/Leaving/Joining/Moving | ||
-- Address Load Tokens Owns (effective) Host ID Rack | ||
UN 172.21.0.5 216.75 KB 256 100.0% 2250363b-7453-48f2-b6cb-ef79cad0612b RAC1 | ||
UN 172.21.0.4 223.34 KB 256 100.0% e521a2a4-39d3-4311-a195-667bf56450f4 RAC1 |
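As a worked example of the per-datacenter aggregation in the check, the sketch below feeds the `owns` values from this fixture through the same arithmetic. The function name `summarize_by_datacenter` and the tuple layout are assumptions made for illustration. One deliberate difference: this sketch reports an availability of 0.0 for a datacenter with no node up, whereas the check as written emits no availability metric in that case.

```python
from collections import defaultdict

# (status, datacenter, owns) triples copied from the fixture above.
NODES = [
    ('D', 'dc1', 35.4),
    ('U', 'dc1', 31.0),
    ('U', 'dc1', 33.5),
    ('U', 'dc2', 100.0),
    ('U', 'dc2', 100.0),
]


def summarize_by_datacenter(nodes):
    """Sum `owns` per datacenter, for up nodes and for all nodes."""
    percent_up = defaultdict(float)
    percent_total = defaultdict(float)
    for status, datacenter, owns in nodes:
        if status == 'U':
            percent_up[datacenter] += owns
        percent_total[datacenter] += owns
    return {
        dc: {
            # Ownership held by nodes that are up.
            'replication_availability': percent_up[dc],
            # Total ownership is roughly 100% times the replication factor.
            'replication_factor': int(round(percent_total[dc] / 100)),
        }
        for dc in percent_total
    }


if __name__ == '__main__':
    for dc, stats in sorted(summarize_by_datacenter(NODES).items()):
        print(dc, stats)
```

With these values, dc1 sums to 99.9% total ownership (replication factor 1) with 64.5% held by up nodes, and dc2 sums to 200% (replication factor 2) with all of it available.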
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
controlRole QED |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
init_config: | ||
# command or path to nodetool (e.g. /usr/bin/nodetool or docker exec container nodetool) | ||
# can be overridden per instance | ||
# nodetool: /usr/bin/nodetool | ||
|
||
instances: | ||
|
||
# the list of keyspaces to monitor | ||
- keyspaces: [] | ||
|
||
# host that nodetool will connect to. | ||
# host: localhost | ||
|
||
# the port JMX is listening on for connections. | ||
# port: 7199 | ||
|
||
# a set of credentials to connect to the host. These are the credentials for the JMX server. | ||
# For the check to work, this user must have read/write access so that nodetool can execute the `status` command | ||
# username: | ||
# password: | ||
|
||
# a list of additional tags to be sent with the metrics | ||
# tags: [] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
{ | ||
"maintainer": "help@datadoghq.com", | ||
"manifest_version": "0.1.0", | ||
"max_agent_version": "6.0.0", | ||
"min_agent_version": "5.6.3", | ||
"name": "cassandra_nodetool", | ||
"short_description": "monitor cassandra using the nodetool utility", | ||
"guid": "00e4a8bd-8ec2-4bb4-b725-6aaa91618d13", | ||
"support": "contrib", | ||
"supported_os": ["linux","mac_os","windows"], | ||
"version": "0.1.0" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
metric_name,metric_type,interval,unit_name,per_unit_name,description,orientation,integration,short_name | ||
cassandra.nodetool.status.replication_availability,gauge,,percent,,Percentage of data available per keyspace times replication factor,1,cassandra_nodetool,available data | ||
cassandra.nodetool.status.replication_factor,gauge,,,,Replication factor per keyspace,0,cassandra_nodetool,replication factor | ||
cassandra.nodetool.status.status,gauge,,,,Node status: up (1) or down (0),1,cassandra_nodetool,node status | ||
cassandra.nodetool.status.owns,gauge,,percent,,Percentage of the data owned by the node per datacenter times the replication factor,0,cassandra_nodetool,owns | ||
cassandra.nodetool.status.load,gauge,,byte,,Amount of file system data under the cassandra data directory without snapshot content,0,cassandra_nodetool,load |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# integration pip requirements |
Can you mention the `cassandra.nodetool.node_up` service check?