Commit
Merge branch 'master' into quentin/windows-omnibus-5.0
* master: (53 commits)
  [nginx] Update example config
  [service_discovery] Add a Zookeeper service discovery implementation.
  [aggregator] if sample rate is bad, fix it but still parse tags. (#3073)
  [yarn] whitelist authorized application_tags
  Alex poe/update jmx with refresh beans (#3068)
  [config] Fix `_is_affirmative` when passed argument is `None` (#3063)
  Send all configured tags with process checks. (#2976)
  fix flake8 errors
  [flare] ignore whitespace before proxy credentials
  [core] add a switch to disable profiling, but still use developer mode (#2898)
  [tests] allow tests to use the additional_checksd parameter (#3056)
  [service_discovery][jmx] trying to pick-up JMX changes with SD. (#3010)
  [install_script] Make `dd-agent` group of `datadog.conf` (#3036)
  [postgres] Allow disable postgresql.database_size (#3035)
  [core] Fixes IndexError for process lookup (#3043)
  remove warning message leaking password strings (#3053)
  trap psutil.NoSuchProcess exception (#3052)
  Fix grammar and casing in exception text (#3050)
  allow override of kubelet host with KUBERNETES_KUBELET_HOST env var
  [service discovery] properly handle config reload for removed containers
  ...
degemer committed Dec 21, 2016
2 parents bc82757 + 2a98280 commit d0f4ca8
Showing 70 changed files with 4,562 additions and 762 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -36,3 +36,6 @@ embedded/*
dump.rdb
tests/core/fixtures/flare/dd*
.python-version
.ropeproject
.bundle
tags
3 changes: 3 additions & 0 deletions .rubocop.yml
@@ -23,3 +23,6 @@ Style/Documentation:
# Configuration parameters: Methods.
Style/SingleLineBlockParams:
Enabled: false

BlockLength:
Max: 110
40 changes: 40 additions & 0 deletions CHANGELOG.md
@@ -1,6 +1,33 @@
Changes
=======

# 5.10.1 / 11-21-2016
**Linux, Windows, Docker and Source Install**

### Details
https://github.com/DataDog/dd-agent/compare/5.10.0...5.10.1

### Updated Integrations

* RiakCS
* Mongo

### Changes
* [IMPROVEMENT] Core/Forwarder: stop flushing after 10s. See [#3018][].
* [IMPROVEMENT] Core: isolate system checks. See [#3011][].
* [IMPROVEMENT] RiakCS: support Riak CS 2.1+ stats format. See [#2920][]. (Thanks [@millerdev][])
* [IMPROVEMENT] Status: Silence requests exception. See [#3023][].
* [IMPROVEMENT] SpooledTemporaryFile for subprocess output. See [#3002][].

* [BUGFIX] Core: fix unintended subprocess_output empty output errors. See [#3024][].
* [BUGFIX] Core/Multiple Checks: Only set `psutil.PROCFS_PATH` once in the collector. See [#3013][].
* [BUGFIX] Core: use proxy for API key status check in info page. See [#3012][]. (Thanks [@2rs2ts][])
* [BUGFIX] Mongo: use db.current_op instead of manually querying. See [#3016][] (Thanks [@ebroder][])
* [BUGFIX] Mongo: use `currentOp` for MongoDB 3.1+. See [#3015][] (Thanks [@lattwood][])

* [DEPRECATE] Process: `procfs_path` is now deprecated, should be set in `datadog.conf`. See [#3013][].


# 5.10.0 / 11-09-2016
**Linux, Windows, Docker and Source Install**

@@ -3609,6 +3636,7 @@ https://github.com/DataDog/dd-agent/compare/2.2.9...2.2.10
[#2908]: https://github.com/DataDog/dd-agent/issues/2908
[#2910]: https://github.com/DataDog/dd-agent/issues/2910
[#2915]: https://github.com/DataDog/dd-agent/issues/2915
[#2920]: https://github.com/DataDog/dd-agent/issues/2920
[#2921]: https://github.com/DataDog/dd-agent/issues/2921
[#2926]: https://github.com/DataDog/dd-agent/issues/2926
[#2928]: https://github.com/DataDog/dd-agent/issues/2928
@@ -3644,8 +3672,18 @@ https://github.com/DataDog/dd-agent/compare/2.2.9...2.2.10
[#2984]: https://github.com/DataDog/dd-agent/issues/2984
[#2989]: https://github.com/DataDog/dd-agent/issues/2989
[#2991]: https://github.com/DataDog/dd-agent/issues/2991
[#3002]: https://github.com/DataDog/dd-agent/issues/3002
[#3006]: https://github.com/DataDog/dd-agent/issues/3006
[#3011]: https://github.com/DataDog/dd-agent/issues/3011
[#3012]: https://github.com/DataDog/dd-agent/issues/3012
[#3013]: https://github.com/DataDog/dd-agent/issues/3013
[#3015]: https://github.com/DataDog/dd-agent/issues/3015
[#3016]: https://github.com/DataDog/dd-agent/issues/3016
[#3018]: https://github.com/DataDog/dd-agent/issues/3018
[#3023]: https://github.com/DataDog/dd-agent/issues/3023
[#3024]: https://github.com/DataDog/dd-agent/issues/3024
[#3399]: https://github.com/DataDog/dd-agent/issues/3399
[@2rs2ts]: https://github.com/2rs2ts
[@AirbornePorcine]: https://github.com/AirbornePorcine
[@AntoCard]: https://github.com/AntoCard
[@CaptTofu]: https://github.com/CaptTofu
@@ -3736,6 +3774,7 @@ https://github.com/DataDog/dd-agent/compare/2.2.9...2.2.10
[@jslatts]: https://github.com/jslatts
[@jzoldak]: https://github.com/jzoldak
[@kzw]: https://github.com/kzw
[@lattwood]: https://github.com/lattwood
[@leifwalsh]: https://github.com/leifwalsh
[@leucos]: https://github.com/leucos
[@loris]: https://github.com/loris
@@ -3751,6 +3790,7 @@ https://github.com/DataDog/dd-agent/compare/2.2.9...2.2.10
[@micktwomey]: https://github.com/micktwomey
[@mike-lerch]: https://github.com/mike-lerch
[@mikekap]: https://github.com/mikekap
[@millerdev]: https://github.com/millerdev
[@mms-gianni]: https://github.com/mms-gianni
[@mooney6023]: https://github.com/mooney6023
[@morskoyzmey]: https://github.com/morskoyzmey
106 changes: 103 additions & 3 deletions agent.py
@@ -19,6 +19,8 @@
import signal
import sys
import time
import supervisor.xmlrpc
import xmlrpclib
from copy import copy

# For pickle & PID files, see issue 293
@@ -29,13 +31,19 @@
from checks.collector import Collector
from config import (
get_config,
get_jmx_pipe_path,
get_parsed_args,
get_system_stats,
load_check_directory,
load_check
load_check,
generate_jmx_configs,
_is_affirmative,
SD_PIPE_NAME

)
from daemon import AgentSupervisor, Daemon
from emitter import http_emitter
from utils.platform import Platform

# utils
from utils.cloud_metadata import EC2
@@ -51,11 +59,18 @@
from utils.watchdog import new_watchdog

# Constants
from jmxfetch import JMX_CHECKS
PID_NAME = "dd-agent"
PID_DIR = None
WATCHDOG_MULTIPLIER = 10
RESTART_INTERVAL = 4 * 24 * 60 * 60 # Defaults to 4 days

JMX_SUPERVISOR_ENTRY = 'datadog-agent:jmxfetch'
JMX_GRACE_SECS = 2
SERVICE_DISCOVERY_PREFIX = 'SD-'
SD_CONFIG_SEP = "#### SERVICE-DISCOVERY ####\n"

DEFAULT_SUPERVISOR_SOCKET = '/opt/datadog-agent/run/datadog-supervisor.sock'
DEFAULT_COLLECTOR_PROFILE_INTERVAL = 20

# Globals
@@ -80,6 +95,9 @@ def __init__(self, pidfile, autorestart, start_event=True, in_developer_mode=Fal
# this flag can be set to True, False, or a list of checks (for partial reload)
self.reload_configs_flag = False
self.sd_backend = None
self.supervisor_proxy = None
self.sd_pipe = None


def _handle_sigterm(self, signum, frame):
"""Handles SIGTERM and SIGINT, which gracefully stops the agent."""
@@ -105,6 +123,7 @@ def reload_configs(self, checks_to_reload=set()):
Can also reload only an explicit set of checks."""
log.info("Attempting a configuration reload...")
hostname = get_hostname(self._agentConfig)
jmx_sd_configs = None

# if no check was given, reload them all
if not checks_to_reload:
@@ -114,13 +133,23 @@ def reload_configs(self, checks_to_reload=set()):
check.stop()

self._checksd = load_check_directory(self._agentConfig, hostname)
if self._jmx_service_discovery_enabled:
jmx_sd_configs = generate_jmx_configs(self._agentConfig, hostname)
else:
new_checksd = copy(self._checksd)

self.refresh_specific_checks(hostname, new_checksd, checks_to_reload)
jmx_checks = [check for check in checks_to_reload if check in JMX_CHECKS]
py_checks = set(checks_to_reload) - set(jmx_checks)
self.refresh_specific_checks(hostname, new_checksd, py_checks)
if self._jmx_service_discovery_enabled:
jmx_sd_configs = generate_jmx_configs(self._agentConfig, hostname, jmx_checks)

# once the reload is done, replace existing checks with the new ones
self._checksd = new_checksd

if jmx_sd_configs:
self._submit_jmx_service_discovery(jmx_sd_configs)

# Logging
num_checks = len(self._checksd['initialized_checks'])
if num_checks > 0:
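
The partial-reload branch above separates the requested checks into JMX checks (handed to service discovery) and regular Python checks (refreshed in place). A minimal sketch of that split, written for Python 3; the contents of `JMX_CHECKS` and the helper name `split_checks` are illustrative assumptions, not taken from the commit:

```python
# Illustrative values only; the real list is imported from the jmxfetch module.
JMX_CHECKS = ['activemq', 'cassandra', 'jmx', 'solr', 'tomcat']


def split_checks(checks_to_reload, jmx_checks=JMX_CHECKS):
    """Return (jmx, py): checks handled by JMXFetch and checks handled in-process."""
    # Keep the JMX checks in the order they were requested.
    jmx = [check for check in checks_to_reload if check in jmx_checks]
    # Everything else is a regular Python check.
    py = set(checks_to_reload) - set(jmx)
    return jmx, py


jmx, py = split_checks(['nginx', 'tomcat', 'redisdb'])
```

Here `jmx` would be passed on for JMX config generation and `py` to the in-process refresh, mirroring the two calls in the hunk.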
@@ -218,9 +247,32 @@ def run(self, config=None):
if self._agentConfig.get('service_discovery'):
self.sd_backend = get_sd_backend(self._agentConfig)

if _is_affirmative(self._agentConfig.get('sd_jmx_enable')):
pipe_path = get_jmx_pipe_path()
if Platform.is_windows():
pipe_name = pipe_path.format(pipename=SD_PIPE_NAME)
else:
pipe_name = os.path.join(pipe_path, SD_PIPE_NAME)

if os.access(pipe_path, os.W_OK):
if not os.path.exists(pipe_name):
os.mkfifo(pipe_name)
self.sd_pipe = os.open(pipe_name, os.O_RDWR) # RW to avoid blocking (will only W)

# Initialize Supervisor proxy
self.supervisor_proxy = self._get_supervisor_socket(self._agentConfig)
else:
log.debug('Unable to create pipe in temporary directory. JMX service discovery disabled.')

# Load the checks.d checks
self._checksd = load_check_directory(self._agentConfig, hostname)

# Load JMX configs if available
if self._jmx_service_discovery_enabled:
jmx_sd_configs = generate_jmx_configs(self._agentConfig, hostname)
if jmx_sd_configs:
self._submit_jmx_service_discovery(jmx_sd_configs)

# Initialize the Collector
self.collector = Collector(self._agentConfig, emitters, systemStats, hostname)
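
The pipe setup in this hunk opens the FIFO with `os.O_RDWR` so that the open call does not block while no reader is attached. A minimal sketch of the same idea, assuming a Linux host (POSIX leaves `O_RDWR` on a FIFO unspecified) and Python 3; the helper name `open_sd_pipe` and the pipe name used in the usage lines are hypothetical:

```python
import os
import tempfile


def open_sd_pipe(pipe_dir, pipe_name):
    """Create the FIFO if needed and return a file descriptor, or None if not writable."""
    if not os.access(pipe_dir, os.W_OK):
        return None
    path = os.path.join(pipe_dir, pipe_name)
    if not os.path.exists(path):
        os.mkfifo(path)
    # Opening write-only would block until a reader shows up;
    # read-write returns immediately on Linux.
    return os.open(path, os.O_RDWR)


# Usage: write a small payload and read it back through the same descriptor.
workdir = tempfile.mkdtemp()
fd = open_sd_pipe(workdir, "sd-example.pipe")  # hypothetical pipe name
os.write(fd, b"hello")
echoed = os.read(fd, 5)
os.close(fd)
```

The agent itself only writes to the descriptor; the read in the usage lines is there just to show the data went through.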

@@ -241,13 +293,15 @@ def run(self, config=None):
self.restart_interval = int(self._agentConfig.get('restart_interval', RESTART_INTERVAL))
self.agent_start = time.time()

self.allow_profiling = self._agentConfig.get('allow_profiling', True)

profiled = False
collector_profiled_runs = 0

# Run the main loop.
while self.run_forever:
# Setup profiling if necessary
if self.in_developer_mode and not profiled:
if self.allow_profiling and self.in_developer_mode and not profiled:
try:
profiler = AgentProfiler()
profiler.enable_profiling()
@@ -344,6 +398,52 @@ def _set_agent_config_hostname(self, agentConfig):
log.info('Not running on EC2, using hostname to identify this server')
return agentConfig

    def _get_supervisor_socket(self, agentConfig):
        if Platform.is_windows():
            return None

        sockfile = agentConfig.get('supervisor_socket', DEFAULT_SUPERVISOR_SOCKET)
        supervisor_proxy = xmlrpclib.ServerProxy(
            'http://127.0.0.1',
            transport=supervisor.xmlrpc.SupervisorTransport(
                None, None, serverurl="unix://{socket}".format(socket=sockfile))
        )

        return supervisor_proxy

    @property
    def _jmx_service_discovery_enabled(self):
        return self.sd_pipe is not None

    def _submit_jmx_service_discovery(self, jmx_sd_configs):

        if not jmx_sd_configs or not self.sd_pipe:
            return

        if self.supervisor_proxy is not None:
            jmx_state = self.supervisor_proxy.supervisor.getProcessInfo(JMX_SUPERVISOR_ENTRY)
            log.debug("Current JMX check state: %s", jmx_state['statename'])
            # restart jmx if stopped
            if jmx_state['statename'] in ['STOPPED', 'EXITED', 'FATAL'] and self._agentConfig.get('sd_jmx_enable'):
                self.supervisor_proxy.supervisor.startProcess(JMX_SUPERVISOR_ENTRY)
                time.sleep(JMX_GRACE_SECS)
        else:
            log.debug("Unable to automatically start jmxfetch on Windows via supervisor.")

        buffer = ""
        for name, yaml in jmx_sd_configs.iteritems():
            try:
                buffer += SD_CONFIG_SEP
                buffer += "# {}\n".format(name)
                buffer += yaml
            except Exception as e:
                log.exception("unable to submit YAML via RPC: %s", e)
            else:
                log.info("JMX SD config %s submitted via named pipe successfully.", name)

        if buffer:
            os.write(self.sd_pipe, buffer)

def _should_restart(self):
if time.time() - self.agent_start > self.restart_interval:
return True
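
The payload written to the pipe in `_submit_jmx_service_discovery` is a concatenation of one block per config: the separator line, a comment carrying the config name, then the YAML text. A minimal Python 3 sketch of that framing; the helper name `build_sd_payload`, the sorted iteration order, and the example config name are assumptions made for illustration:

```python
SD_CONFIG_SEP = "#### SERVICE-DISCOVERY ####\n"  # same separator as in the diff


def build_sd_payload(jmx_sd_configs):
    """Frame each {name: yaml_text} entry as separator + '# name' + YAML."""
    parts = []
    # Sorting is an addition here so the output is deterministic.
    for name, yaml_text in sorted(jmx_sd_configs.items()):
        parts.append(SD_CONFIG_SEP)
        parts.append("# {}\n".format(name))
        parts.append(yaml_text)
    return "".join(parts)


payload = build_sd_payload({"SD-jmx_0": "init_config:\n"})  # hypothetical config name
```

Under Python 3 the result would need to be encoded (for example `payload.encode("utf-8")`) before being passed to `os.write`, which accepts bytes only.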
7 changes: 4 additions & 3 deletions aggregator.py
@@ -492,13 +492,14 @@ def parse_metric_packet(self, packet):
# Parse the sample rate
if m[0] == '@':
sample_rate = float(m[1:])
assert 0 <= sample_rate <= 1
# in case it's in a bad state
sample_rate = 1 if sample_rate < 0 or sample_rate > 1 else sample_rate
elif m[0] == '#':
tags = tuple(sorted(m[1:].split(',')))
except (IndexError, AssertionError):
except IndexError:
log.warning(u'Incorrect metric metadata: metric_name:%s, metadata:%s',
name, u' '.join(value_and_metadata[2:]))
sample_rate = 1 # In case it's in a bad state

parsed_packets.append((name, value, metric_type, tags, sample_rate))

return parsed_packets
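
The change to `parse_metric_packet` replaces the assertion with a clamp: an out-of-range sample rate is reset to 1, and parsing continues so that any tags following it are still read. A standalone sketch of that behaviour in Python 3; the function name `parse_metadata` and the skipping of empty fields are assumptions for illustration, not the aggregator's actual code:

```python
def parse_metadata(fields):
    """Parse optional '@<rate>' and '#<tags>' sections of a metric packet."""
    sample_rate = 1
    tags = None
    for m in fields:
        if not m:
            # Assumption: skip empty sections instead of raising IndexError.
            continue
        if m[0] == '@':
            rate = float(m[1:])
            # Out-of-range rates fall back to 1 instead of aborting the parse.
            sample_rate = rate if 0 <= rate <= 1 else 1
        elif m[0] == '#':
            tags = tuple(sorted(m[1:].split(',')))
    return sample_rate, tags


bad_rate = parse_metadata(['@1.5', '#b,a'])
good_rate = parse_metadata(['@0.5'])
```

With the earlier assertion, the `@1.5` case would have jumped to the exception handler before the `#b,a` section was reached, so the tags were lost.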