Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[redis] new post 2.8 replication metrics #1350

Merged
merged 1 commit into from
Feb 24, 2015

Conversation

hkaj
Copy link
Member

@hkaj hkaj commented Feb 5, 2015

[redis] add new replication metrics for >= 2.8 instances

New metrics:

  • redis.replication.backlog_histlen
    shows the current replication backlog size in bytes used from the
    allocated repl-backlog-size in the config.
  • redis.replication.master_repl_offset
    this metric is available on masters, it shows the current offset of
    the replication process on the master in bytes
  • redis.replication.slave_repl_offset
    idem for the slaves.
  • redis.replication.delay
    on masters only, measures the difference between the last offset
    reported by the slave and the current replication offset, which
    indicates how far your slave is behind.
    This metric is tagged by slave_ip, slave_port and slave_id
    (internal ID in the master).

Also added on Travis a replication config between the two instances we
have to assert the presence of this new metric.
Bonus: now the redis instances that were launched by the CI are cleaned
properly by the cleanup task.

see issue #915

@@ -215,6 +218,16 @@ def _check_db(self, instance, custom_tags=None):
self.warning("{0} key not found in redis".format(key))
self.gauge('redis.key.length', 0, tags=key_tags)

# Save the replication delay for each slave
for key in info.keys():
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you shouldn't use .keys () as it would create a copy of the keys.

You should just do:

for key in info

@remh remh added this to the 5.3.0 milestone Feb 5, 2015
@remh
Copy link

remh commented Feb 5, 2015

Looks great besides the nitpicks and the comments i told you offline! Thanks a lot!
It looks like your tests are failing on travis, can you have a look ?

@hkaj hkaj force-pushed the haissam/redis-repl-metrics branch from 5a93dd3 to 52d68fa Compare February 5, 2015 20:55
@LeoCavaille LeoCavaille self-assigned this Feb 5, 2015
master_offset = info.get('master_repl_offset')
if slave_offset and master_offset and master_offset - slave_offset >= 0:
delay = master_offset - slave_offset
slave_tags = tags + ['slave_ip:%s' % info[key]['ip']] if 'ip' in info[key] else tags
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be more readable this way:

slave_tags = tags
for slave_tag in ('ip', 'port'):
    if slave_tag in info[key]:
        slave_tags.append('slave_{0}:{1}'.format(slave_tag, info[key][slave_tag]))

@LeoCavaille
Copy link
Member

@hkaj thanks for taking a stab at it.
A few remarks:

  • the metric should be named redis.replication.delay : lose the slave0 suffix because we can have this information by breaking down by slave ID.
  • can you tag this metric by slave_port too (usually people will run many synced redis instances on their master and slaves servers)
  • you installed replication on the redis CI instances but can you add some integration tests to see if this metric is actually here (for the proper versions of redis)! see tests/test_redis.py
  • a better commit message following our guidelines would be something like:
[redis] add new replication metrics for >= 2.8 instances

New metrics:
* `redis.replication.backlog_histlen`
  shows the current replication backlog size in bytes used from the
  allocated `repl-backlog-size` in the config.
* `redis.replication.master_repl_offset`  
  this metric is available on masters, it shows the current offset of
  the replication process on the master in bytes
* `redis.replication.slave_repl_offset`
  idem for the slaves.
* `redis.replication.delay`
  on masters only, measures the difference between the last offset
  reported by the slave and the current replication offset, which
  indicates how far your slave is behind.
  This metric is tagged by `slave_ip`, `slave_port` and `slave_id`
  (internal ID in the master).

Also added on Travis a replication config between the two instances we
have to assert the presence of this new metric.
Bonus: now the redis instances that were launched by the CI are cleaned
properly by the `cleanup` task.
  • added a few comments in the code too

Thanks 👍

@hkaj
Copy link
Member Author

hkaj commented Feb 18, 2015

@LeoCavaille Sorry for the delay. I think the last commits address your comments, please let me know if anything else is missing.

New metrics:

    redis.replication.backlog_histlen shows the current replication backlog size in bytes used from the allocated repl-backlog-size in the config.
    redis.replication.master_repl_offset
    this metric is available on masters, it shows the current offset of the replication process on the master in bytes
    redis.replication.slave_repl_offset idem for the slaves.
    redis.replication.delay on masters only, measures the difference between the last offset reported by the slave and the current replication offset, which indicates how far your slave is behind. This metric is tagged by slave_ip, slave_port and slave_id (internal ID in the master).

Also added on Travis a replication config between the two instances we
have to assert the presence of this new metric.
Bonus: now the redis instances that were launched by the CI are cleaned
properly by the cleanup task.

see issue #915
@hkaj hkaj force-pushed the haissam/redis-repl-metrics branch from d395c61 to 8cc0cb7 Compare February 23, 2015 19:20
def check(self, instance):
if (not "host" in instance or not "port" in instance) and not "unix_socket_path" in instance:
if ("host" not in instance or "port" not in instance) and "unix_socket_path" not in instance:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure that's exactly what we want, see https://gist.github.com/LeoCavaille/801f5bb9288a94ca8d94
Especially the 4th example?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LeoCavaille I only moved the "not" keywords to the right place, that didn't change the behavior, which I think is what we want.

As long as there's a socket_path we're good, if there is no socket path, there must be a host and a port. We probably don't want to raise an exception if a user switch from a host:port couple to a socket and forget to remove the host from the config.

See https://github.com/DataDog/dd-agent/blob/haissam/redis-repl-metrics/checks.d/redisdb.py#L103-L107 for the use case.

@LeoCavaille
Copy link
Member

Except the last small nitpick on this config check, it looks good. Just waiting to hear back from you on that and then I'll merge it! Thanks

@LeoCavaille LeoCavaille changed the title Haissam/redis repl metrics [redis] new post 2.8 replication metrics Feb 24, 2015
@LeoCavaille
Copy link
Member

👍

LeoCavaille added a commit that referenced this pull request Feb 24, 2015
[redis] new post 2.8 replication metrics
@LeoCavaille LeoCavaille merged commit 92470f0 into master Feb 24, 2015
@LeoCavaille LeoCavaille deleted the haissam/redis-repl-metrics branch February 24, 2015 22:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants