[autothrottle] path optimal throttle calculations #293

Merged: 43 commits, merged Mar 4, 2020


@jamiealquiza (Collaborator) commented Feb 27, 2020

Overview

Currently, autothrottle mimics the out-of-the-box reassign-partitions throttle parameter: a single value is assigned as both the inbound and outbound throttle rate for every broker participating in a reassignment. That rate is determined at each check interval by observing metrics data and making a calculation based on the most loaded broker.
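For contrast with the per-path approach below, here is a minimal sketch of the single-rate behavior, assuming the "calculation based on the most loaded broker" means taking the free capacity left on the busiest participant and applying a fixed fraction of it. The function `legacyRate` and its parameters `capacityMBps` and `freeRatio` are hypothetical names for illustration, not autothrottle's actual code or flags, and the utilization figures are made up.

```go
package main

import "fmt"

// legacyRate sketches the single-rate approach: find the most loaded
// participating broker and derive one throttle that is applied to every
// broker, inbound and outbound alike. capacityMBps and freeRatio are
// hypothetical parameters, not autothrottle's actual flags.
func legacyRate(utilMBps map[int]float64, capacityMBps, freeRatio float64) float64 {
	var highest float64
	for _, u := range utilMBps {
		if u > highest {
			highest = u
		}
	}
	free := capacityMBps - highest
	if free < 0 {
		free = 0 // a saturated broker leaves nothing for replication
	}
	return free * freeRatio
}

func main() {
	// Illustrative utilization figures (MB/s) keyed by broker ID.
	util := map[int]float64{101: 80, 102: 40, 103: 20, 104: 50}
	rate := legacyRate(util, 250, 0.9)
	fmt.Printf("every broker, in and out: %.2fMB/s\n", rate)
}
```

The weakness is visible in the sketch: brokers 102 and 103 have far more headroom than 101, yet they are held to the rate that 101 can tolerate.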

Internally, Kafka doesn't require a broker's inbound and outbound rates to be the same, nor does it require all brokers to share one value. This PR builds a graph of the reassignment and observes each broker's utilization according to its role: a "follower" in a replication path is receiving data and a "leader" is sending data. A per-path rate is calculated from the configured instance type capacity limit, the currently observed network utilization (accounting for any previously set throttles), and a ratio of free network capacity that's eligible for replication use (separately configurable for inbound and outbound flow). For each path, the rate is updated only if it differs from the previously set rate by more than a configurable percentage threshold.
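The per-path calculation can be sketched as follows. This is a sketch under stated assumptions, not the PR's implementation: it assumes leaders are judged on outbound (TX) utilization and followers on inbound (RX), and that "accounting for previously set throttles" means subtracting the prior throttle from observed utilization so replication traffic isn't counted against itself. `Role`, `BrokerObs`, `pathRate`, `srcRatio`, and `dstRatio` are hypothetical names.

```go
package main

import "fmt"

// Role is a broker's part in a replication path.
type Role int

const (
	Leader   Role = iota // sends replication data (outbound)
	Follower             // receives replication data (inbound)
)

// BrokerObs is a hypothetical per-broker observation for one interval.
type BrokerObs struct {
	TXMBps, RXMBps   float64 // observed outbound/inbound utilization
	PrevThrottleMBps float64 // throttle set on the previous interval, 0 if none
}

// pathRate estimates non-replication traffic by subtracting the previous
// throttle from observed utilization (an assumption), treats the rest of
// capacity as free, and grants a role-specific fraction of it.
func pathRate(role Role, o BrokerObs, capacityMBps, srcRatio, dstRatio float64) float64 {
	util, ratio := o.TXMBps, srcRatio
	if role == Follower {
		util, ratio = o.RXMBps, dstRatio
	}
	nonReplication := util - o.PrevThrottleMBps
	if nonReplication < 0 {
		nonReplication = 0
	}
	free := capacityMBps - nonReplication
	if free < 0 {
		free = 0
	}
	return free * ratio
}

func main() {
	capacity := 250.0 // assumed instance capacity in MB/s
	leader := BrokerObs{TXMBps: 100}
	follower := BrokerObs{RXMBps: 300, PrevThrottleMBps: 225}
	fmt.Printf("leader:   %.2fMB/s\n", pathRate(Leader, leader, capacity, 0.9, 0.8))
	fmt.Printf("follower: %.2fMB/s\n", pathRate(Follower, follower, capacity, 0.9, 0.8))
}
```

Because each broker is evaluated only against the direction it actually uses, a busy follower no longer drags down the rate of an idle leader, which is the point of computing rates per path.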

tl;dr this ensures that every replication path individually runs as fast as possible while adaptively maintaining capacity for consumers. Closes #81.

In Action

2020/02/27 22:28:12 Autothrottle Running
2020/02/27 22:28:13 Admin API: localhost:8080
2020/02/27 22:28:13 Topics with ongoing reassignments: [test0]
2020/02/27 22:28:13 Source brokers participating in replication: [1037 1039]
2020/02/27 22:28:13 Destination brokers participating in replication: [1033 1041]
2020/02/27 22:28:14 Replication throttle rate for broker 1037 [leader] (based on a 90% max free capacity utilization): 139.83MB/s
2020/02/27 22:28:14 Updated throttle on broker 1037 [leader]
2020/02/27 22:28:15 Replication throttle rate for broker 1039 [leader] (based on a 90% max free capacity utilization): 147.24MB/s
2020/02/27 22:28:15 Updated throttle on broker 1039 [leader]
2020/02/27 22:28:15 Replication throttle rate for broker 1041 [follower] (based on a 90% max free capacity utilization): 179.75MB/s
2020/02/27 22:28:15 Updated throttle on broker 1041 [follower]
2020/02/27 22:28:15 Replication throttle rate for broker 1033 [follower] (based on a 90% max free capacity utilization): 181.88MB/s
2020/02/27 22:28:15 Updated throttle on broker 1033 [follower]
2020/02/27 22:28:28 Topics with ongoing reassignments: [test0]
2020/02/27 22:28:28 Source brokers participating in replication: [1037 1039]
2020/02/27 22:28:28 Destination brokers participating in replication: [1033 1041]
2020/02/27 22:28:28 Replication throttle rate for broker 1039 [leader] (based on a 90% max free capacity utilization): 225.00MB/s
2020/02/27 22:28:28 Updated throttle on broker 1039 [leader]
2020/02/27 22:28:28 Replication throttle rate for broker 1041 [follower] (based on a 90% max free capacity utilization): 225.00MB/s
2020/02/27 22:28:28 Updated throttle on broker 1041 [follower]
2020/02/27 22:28:29 Replication throttle rate for broker 1033 [follower] (based on a 90% max free capacity utilization): 225.00MB/s
2020/02/27 22:28:29 Updated throttle on broker 1033 [follower]
2020/02/27 22:28:29 Replication throttle rate for broker 1037 [leader] (based on a 90% max free capacity utilization): 225.00MB/s
2020/02/27 22:28:29 Updated throttle on broker 1037 [leader]
2020/02/27 22:28:43 Topics with ongoing reassignments: [test0]
2020/02/27 22:28:43 Source brokers participating in replication: [1037]
2020/02/27 22:28:43 Destination brokers participating in replication: [1041]
2020/02/27 22:28:43 Replication throttle rate for broker 1037 [leader] (based on a 90% max free capacity utilization): 225.00MB/s
2020/02/27 22:28:43 Proposed throttle is within 0.00% of the previous throttle (below 10.00% threshold), skipping throttle update for broker 1037
2020/02/27 22:28:43 Replication throttle rate for broker 1041 [follower] (based on a 90% max free capacity utilization): 225.00MB/s
2020/02/27 22:28:43 Proposed throttle is within 0.00% of the previous throttle (below 10.00% threshold), skipping throttle update for broker 1041
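The "skipping throttle update" lines above reflect the update threshold. A minimal sketch of that check, assuming the difference is measured relative to the previous throttle and that a change exactly at the threshold triggers an update (both assumptions; `changePct` and `shouldUpdate` are hypothetical names):

```go
package main

import (
	"fmt"
	"math"
)

// changePct returns the percent difference of proposed relative to prev.
// With no previous throttle, any proposal counts as an infinite change.
func changePct(prev, proposed float64) float64 {
	if prev == 0 {
		return math.Inf(1)
	}
	return math.Abs(proposed-prev) / prev * 100
}

// shouldUpdate skips updates whose change is below thresholdPct,
// avoiding churn from small fluctuations between check intervals.
func shouldUpdate(prev, proposed, thresholdPct float64) bool {
	return changePct(prev, proposed) >= thresholdPct
}

func main() {
	for _, c := range [][2]float64{{225, 225}, {147.24, 225}, {0, 139.83}} {
		fmt.Printf("prev=%.2f proposed=%.2f update=%v\n",
			c[0], c[1], shouldUpdate(c[0], c[1], 10))
	}
}
```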

@jamiealquiza jamiealquiza changed the title [TESTING] [autothrottle] per-path throttle calculations [TESTING] [autothrottle] path optimal throttle calculations Feb 28, 2020
v := flag.Bool("version", false, "version")
flag.StringVar(&Config.APIKey, "api-key", "", "Datadog API key")
flag.StringVar(&Config.AppKey, "app-key", "", "Datadog app key")
flag.StringVar(&Config.NetworkTXQuery, "net-tx-query", "avg:system.net.bytes_sent{service:kafka} by {host}", "Datadog query for broker outbound bandwidth by host")
flag.StringVar(&Config.NetworkRXQuery, "net-rx-query", "avg:system.net.bytes_rcvd{service:kafka} by {host}", "Datadog query for broker outbound bandwidth by host")
Contributor

Suggested change
flag.StringVar(&Config.NetworkRXQuery, "net-rx-query", "avg:system.net.bytes_rcvd{service:kafka} by {host}", "Datadog query for broker outbound bandwidth by host")
flag.StringVar(&Config.NetworkRXQuery, "net-rx-query", "avg:system.net.bytes_rcvd{service:kafka} by {host}", "Datadog query for broker inbound bandwidth by host")

// BrokerIDTag is the host tag name
// for Kafka broker IDs.
// NetworkRXQuery is a query string that should return the inbound
// network metrics by house for the reference Kafka brokers.
Contributor

nit: s/house/host

case c.Maximum <= 0 || c.Maximum > 100:
return nil, errors.New("maximum must be > 0 and < 100")
case c.SourceMaximum <= 0 || c.SourceMaximum > 100:
return nil, errors.New("source maximum must be > 0 and < 100")
Contributor

nit: "source maximum must be > 0 and <= 100", or, if we want < 100, check c.SourceMaximum >= 100. Same for dest.
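One way to resolve the nit is a shared helper whose check and message agree, accepting (0, 100]. In this sketch `SourceMaximum` mirrors the field in the diff, while `ThrottleConfig`, `DestinationMaximum`, and `validatePct` are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// ThrottleConfig holds hypothetical percentage settings. SourceMaximum
// mirrors the diff; DestinationMaximum is an assumed name.
type ThrottleConfig struct {
	SourceMaximum      float64
	DestinationMaximum float64
}

// validatePct accepts values in (0, 100] so the condition and the
// error message describe the same range.
func validatePct(name string, v float64) error {
	if v <= 0 || v > 100 {
		return fmt.Errorf("%s must be > 0 and <= 100, got %v", name, v)
	}
	return nil
}

// Validate checks both percentages, reporting the first failure.
func (c ThrottleConfig) Validate() error {
	if err := validatePct("source maximum", c.SourceMaximum); err != nil {
		return err
	}
	return validatePct("destination maximum", c.DestinationMaximum)
}

func main() {
	fmt.Println(ThrottleConfig{SourceMaximum: 90, DestinationMaximum: 100}.Validate())
	fmt.Println(ThrottleConfig{SourceMaximum: 90, DestinationMaximum: 120}.Validate())
}
```

Routing both fields through one helper keeps source and destination bounds from drifting apart, which is what "Same for dest." is guarding against.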

if replicatingNow.isSubSetOf(replicatingPreviously) {
throttleMeta.DisableTopicUpdates()
} else {
throttleMeta.DisableTopicUpdates()
Contributor

Suggested change
throttleMeta.DisableTopicUpdates()
throttleMeta.EnableTopicUpdates()
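With the suggestion applied, the branch toggles topic updates based on whether the set of replicating topics has grown. The sketch below models that with a map-backed set; `topicSet` and `throttleState` are hypothetical stand-ins for the types behind `replicatingNow` and `throttleMeta`, and the reading that topic throttle configs only need rewriting when a new topic appears is an interpretation of the diff, not something the PR states.

```go
package main

import "fmt"

// topicSet is a hypothetical stand-in for the set type in the diff.
type topicSet map[string]struct{}

// isSubSetOf reports whether every member of s is also in other.
func (s topicSet) isSubSetOf(other topicSet) bool {
	for t := range s {
		if _, ok := other[t]; !ok {
			return false
		}
	}
	return true
}

// throttleState is a stub that only records whether topic updates are on.
type throttleState struct{ topicUpdates bool }

func (m *throttleState) EnableTopicUpdates()  { m.topicUpdates = true }
func (m *throttleState) DisableTopicUpdates() { m.topicUpdates = false }

// applyTopicUpdatePolicy mirrors the corrected branch: no new topics
// means no topic-level rewrites; any new topic re-enables them.
func applyTopicUpdatePolicy(now, prev topicSet, m *throttleState) {
	if now.isSubSetOf(prev) {
		m.DisableTopicUpdates()
	} else {
		m.EnableTopicUpdates()
	}
}

func main() {
	prev := topicSet{"test0": {}}
	now := topicSet{"test0": {}, "test1": {}}
	var meta throttleState
	applyTopicUpdatePolicy(now, prev, &meta)
	fmt.Println("topic updates enabled:", meta.topicUpdates)
}
```

In the original diff both branches called `DisableTopicUpdates()`, so a newly added topic would never have had its throttle configuration written.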

@jamiealquiza jamiealquiza changed the title [TESTING] [autothrottle] path optimal throttle calculations [autothrottle] path optimal throttle calculations Mar 4, 2020
@scanterog (Contributor) left a comment

@jamiealquiza jamiealquiza merged commit 896a115 into master Mar 4, 2020
@jamiealquiza jamiealquiza deleted the jamie/autothrottle branch March 4, 2020 18:46
Successfully merging this pull request may close these issues.

[autothrottle] variable throttle rates by path
2 participants