[redis] new post 2.8 replication metrics #1350

hkaj · 2015-02-05T20:02:58Z

[redis] add new replication metrics for >= 2.8 instances

New metrics:

* `redis.replication.backlog_histlen`
  shows the current replication backlog size in bytes used from the
  allocated `repl-backlog-size` in the config.
* `redis.replication.master_repl_offset`
  this metric is available on masters, it shows the current offset of
  the replication process on the master in bytes
* `redis.replication.slave_repl_offset`
  idem for the slaves.
* `redis.replication.delay`
  on masters only, measures the difference between the last offset
  reported by the slave and the current replication offset, which
  indicates how far your slave is behind.
  This metric is tagged by `slave_ip`, `slave_port` and `slave_id`
  (internal ID in the master).

Also added a replication config on Travis between the two instances we
have, to assert the presence of these new metrics.
Bonus: the redis instances launched by the CI are now cleaned up
properly by the cleanup task.

see issue #915
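
For readers who want to see how `redis.replication.delay` could be derived, here is a minimal sketch. It assumes the parsed INFO output is a dict in which each `slaveN` entry is itself a dict with `ip`, `port` and `offset` fields; the function name `replication_delays`, the sample values and the exact `slave_id` tag format are hypothetical and are not taken from the check itself.

```python
import re

# Matches "slave0", "slave1", ... but not "slave_repl_offset" or "connected_slaves".
SLAVE_KEY = re.compile(r"^slave\d+$")


def replication_delays(info, base_tags=None):
    """Return (delay_in_bytes, tags) pairs, one per slave found in `info`."""
    base_tags = list(base_tags or [])
    master_offset = info.get("master_repl_offset")
    results = []
    if master_offset is None:
        # Not a master, or a Redis version older than 2.8.
        return results
    for key in info:
        if not SLAVE_KEY.match(key):
            continue
        slave = info[key]
        if not isinstance(slave, dict) or "offset" not in slave:
            continue
        delay = master_offset - slave["offset"]
        if delay < 0:
            # A negative delay is not meaningful, so skip it.
            continue
        # Assumed tag format: the numeric suffix of the key is the slave ID.
        tags = base_tags + ["slave_id:%s" % key[len("slave"):]]
        for field in ("ip", "port"):
            if field in slave:
                tags.append("slave_%s:%s" % (field, slave[field]))
        results.append((delay, tags))
    return results


# Made-up sample data for illustration only.
sample_info = {
    "master_repl_offset": 1500,
    "connected_slaves": 1,
    "slave0": {"ip": "10.0.0.2", "port": 6379, "offset": 1200},
}
delays = replication_delays(sample_info, ["redis_role:master"])
```

Each pair returned would then be reported as one gauge value with its tags, which is what allows the metric to be broken down per slave.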

remh · 2015-02-05T20:12:43Z

checks.d/redisdb.py

@@ -215,6 +218,16 @@ def _check_db(self, instance, custom_tags=None):
                            self.warning("{0} key not found in redis".format(key))
                        self.gauge('redis.key.length', 0, tags=key_tags)

+        # Save the replication delay for each slave
+        for key in info.keys():


You shouldn't use .keys() here, as it creates a copy of the keys.

You should just do:

for key in info:
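
To illustrate the point, a minimal sketch follows. The `info` dict is a made-up stand-in for what the check receives from Redis, and the `startswith` filter is only there for illustration. Iterating over a dict yields its keys directly; on Python 2, `.keys()` first builds a separate list.

```python
# Hypothetical stand-in for the parsed output of the Redis INFO command.
info = {
    "slave0": {"ip": "10.0.0.2", "port": 6379, "offset": 1200},
    "master_repl_offset": 1500,
    "connected_slaves": 1,
}

# Iterating the dict itself walks the keys without building an extra list.
slave_keys = [key for key in info if key.startswith("slave")]

# Same result through .keys(); on Python 2 this allocates a full list first.
slave_keys_via_keys = [key for key in info.keys() if key.startswith("slave")]
```

Both forms select the same keys, so dropping `.keys()` changes nothing except the extra allocation.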

remh · 2015-02-05T20:37:50Z

Looks great apart from the nitpicks and the comments I gave you offline! Thanks a lot!
It looks like your tests are failing on Travis, can you have a look?

LeoCavaille · 2015-02-05T22:06:02Z

checks.d/redisdb.py

+                master_offset = info.get('master_repl_offset')
+                if slave_offset and master_offset and master_offset - slave_offset >= 0:
+                    delay = master_offset - slave_offset
+                    slave_tags = tags + ['slave_ip:%s' % info[key]['ip']] if 'ip' in info[key] else tags


It might be more readable this way:

# copy the list so the shared tags are not modified
slave_tags = list(tags)
for slave_tag in ('ip', 'port'):
    if slave_tag in info[key]:
        slave_tags.append('slave_{0}:{1}'.format(slave_tag, info[key][slave_tag]))
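
The same loop can be sketched as a small helper. This assumes the shared `tags` list has to stay untouched from one slave to the next, so it is copied before anything is appended; `build_slave_tags` and the sample values are hypothetical names used only for illustration.

```python
def build_slave_tags(tags, slave_info):
    """Return a new tag list extended with slave_ip / slave_port when present."""
    # Copy first: appending to `tags` itself would leak one slave's tags
    # into every metric sent afterwards with the same list.
    slave_tags = list(tags)
    for slave_tag in ("ip", "port"):
        if slave_tag in slave_info:
            slave_tags.append("slave_{0}:{1}".format(slave_tag, slave_info[slave_tag]))
    return slave_tags


# Made-up sample values.
base_tags = ["env:ci"]
full_tags = build_slave_tags(base_tags, {"ip": "127.0.0.1", "port": 6380})
ip_only_tags = build_slave_tags(base_tags, {"ip": "127.0.0.1"})
```

Compared with the conditional expression in the diff, the loop handles each field independently, so a slave entry that has a port but no IP still gets its `slave_port` tag.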

LeoCavaille · 2015-02-05T22:08:38Z

@hkaj thanks for taking a stab at it.
A few remarks:

* the metric should be named `redis.replication.delay`: drop the slave0 suffix, because we can get this information by breaking down by slave ID.
* can you tag this metric by `slave_port` too? (people usually run many synced redis instances on their master and slave servers)
* you set up replication on the redis CI instances, but can you add some integration tests to check that this metric is actually reported (for the proper versions of redis)? See tests/test_redis.py
* a better commit message following our guidelines would be something like:

[redis] add new replication metrics for >= 2.8 instances

New metrics:
* `redis.replication.backlog_histlen`
  shows the current replication backlog size in bytes used from the
  allocated `repl-backlog-size` in the config.
* `redis.replication.master_repl_offset`  
  this metric is available on masters, it shows the current offset of
  the replication process on the master in bytes
* `redis.replication.slave_repl_offset`
  idem for the slaves.
* `redis.replication.delay`
  on masters only, measures the difference between the last offset
  reported by the slave and the current replication offset, which
  indicates how far your slave is behind.
  This metric is tagged by `slave_ip`, `slave_port` and `slave_id`
  (internal ID in the master).

Also added on Travis a replication config between the two instances we
have to assert the presence of this new metric.
Bonus: now the redis instances that were launched by the CI are cleaned
properly by the `cleanup` task.

added a few comments in the code too

Thanks 👍

hkaj · 2015-02-18T18:21:05Z

@LeoCavaille Sorry for the delay. I think the last commits address your comments, please let me know if anything else is missing.


LeoCavaille · 2015-02-24T21:04:44Z

checks.d/redisdb.py

    def check(self, instance):
-        if (not "host" in instance or not "port" in instance) and not "unix_socket_path" in instance:
+        if ("host" not in instance or "port" not in instance) and "unix_socket_path" not in instance:


Not sure that's exactly what we want, see https://gist.github.com/LeoCavaille/801f5bb9288a94ca8d94
Especially the 4th example?

hkaj · 2015-02-24

@LeoCavaille I only moved the "not" keywords to the right place; that didn't change the behavior, which I think is what we want.

As long as there's a socket_path we're good; if there is no socket path, there must be a host and a port. We probably don't want to raise an exception if a user switches from a host:port pair to a socket and forgets to remove the host from the config.

See https://github.com/DataDog/dd-agent/blob/haissam/redis-repl-metrics/checks.d/redisdb.py#L103-L107 for the use case.
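
The equivalence discussed here can be checked with a short sketch. The three helper functions and the sample configs below are hypothetical and written only for this illustration; the two condition expressions are copied from the diff.

```python
def has_connection_info(instance):
    """True when the config gives a socket path, or both a host and a port."""
    return "unix_socket_path" in instance or (
        "host" in instance and "port" in instance
    )


def old_condition(instance):
    # Expression before the change; `not "x" in d` parses as `not ("x" in d)`.
    return (not "host" in instance or not "port" in instance) and not "unix_socket_path" in instance


def new_condition(instance):
    # Expression after the change, using the `not in` operator.
    return ("host" not in instance or "port" not in instance) and "unix_socket_path" not in instance


# Made-up instance configs covering the interesting combinations.
CASES = [
    {},
    {"host": "localhost"},
    {"host": "localhost", "port": 6379},
    {"unix_socket_path": "/tmp/redis.sock"},
    {"host": "localhost", "unix_socket_path": "/tmp/redis.sock"},
]

for instance in CASES:
    # Both spellings agree, and both flag exactly the configs that lack
    # usable connection info.
    assert old_condition(instance) == new_condition(instance)
    assert new_condition(instance) == (not has_connection_info(instance))
```

The last case is the one described above: a leftover `host` next to a `unix_socket_path` is accepted, because the socket path alone is enough to connect.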

LeoCavaille · 2015-02-24T21:06:47Z

Apart from the last small nitpick on this config check, it looks good. Just waiting to hear back from you on that and then I'll merge it! Thanks

LeoCavaille · 2015-02-24T22:36:15Z

👍

remh reviewed Feb 5, 2015

remh added this to the 5.3.0 milestone Feb 5, 2015

remh added feature 2 - Working labels Feb 5, 2015

hkaj force-pushed the haissam/redis-repl-metrics branch from 5a93dd3 to 52d68fa Compare February 5, 2015 20:55

LeoCavaille self-assigned this Feb 5, 2015

LeoCavaille reviewed Feb 5, 2015

LeoCavaille added checks and removed 2 - Working labels Feb 5, 2015

hkaj force-pushed the haissam/redis-repl-metrics branch from d395c61 to 8cc0cb7 Compare February 23, 2015 19:20

LeoCavaille reviewed Feb 24, 2015

LeoCavaille changed the title ~~Haissam/redis repl metrics~~ [redis] new post 2.8 replication metrics Feb 24, 2015

LeoCavaille added a commit that referenced this pull request Feb 24, 2015

Merge pull request #1350 from DataDog/haissam/redis-repl-metrics

92470f0

[redis] new post 2.8 replication metrics

LeoCavaille merged commit 92470f0 into master Feb 24, 2015

LeoCavaille deleted the haissam/redis-repl-metrics branch February 24, 2015 22:36

remh mentioned this pull request Feb 27, 2015

Redis: fetch additional replication metrics available in 2.8 #915

Closed
