-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[autothrottle] path optimal throttle calculations #293
Conversation
2328e96
to
eac6935
Compare
da76cfd
to
ef0014b
Compare
cmd/autothrottle/main.go
Outdated
v := flag.Bool("version", false, "version") | ||
flag.StringVar(&Config.APIKey, "api-key", "", "Datadog API key") | ||
flag.StringVar(&Config.AppKey, "app-key", "", "Datadog app key") | ||
flag.StringVar(&Config.NetworkTXQuery, "net-tx-query", "avg:system.net.bytes_sent{service:kafka} by {host}", "Datadog query for broker outbound bandwidth by host") | ||
flag.StringVar(&Config.NetworkRXQuery, "net-rx-query", "avg:system.net.bytes_rcvd{service:kafka} by {host}", "Datadog query for broker outbound bandwidth by host") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flag.StringVar(&Config.NetworkRXQuery, "net-rx-query", "avg:system.net.bytes_rcvd{service:kafka} by {host}", "Datadog query for broker outbound bandwidth by host") | |
flag.StringVar(&Config.NetworkRXQuery, "net-rx-query", "avg:system.net.bytes_rcvd{service:kafka} by {host}", "Datadog query for broker inbound bandwidth by host") |
kafkametrics/datadog/datadog.go
Outdated
// BrokerIDTag is the host tag name | ||
// for Kafka broker IDs. | ||
// NetworkRXQuery is a query string that should return the inbound | ||
// network metrics by house for the reference Kafka brokers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/house/host
case c.Maximum <= 0 || c.Maximum > 100: | ||
return nil, errors.New("maximum must be > 0 and < 100") | ||
case c.SourceMaximum <= 0 || c.SourceMaximum > 100: | ||
return nil, errors.New("source maximum must be > 0 and < 100") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "source maximum must be > 0 and <= 100" or in case we want < 100, set c.SourceMaximum >= 100
. Same for dest.
cmd/autothrottle/main.go
Outdated
if replicatingNow.isSubSetOf(replicatingPreviously) { | ||
throttleMeta.DisableTopicUpdates() | ||
} else { | ||
throttleMeta.DisableTopicUpdates() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throttleMeta.DisableTopicUpdates() | |
throttleMeta.EnableTopicUpdates() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overview
Currently, autothrottle mimics the out of the box reassign-partitions throttle parameter in that a single value is assigned to both the inbound and outbound throttle rates for all brokers participating in a reassignment. The single rate value is determined by observing metrics data at each check interval and making a calculation based on the most loaded broker.
Internally, Kafka doesn't actually require that the inbound and outbound rates for a given broker are the same, nor does it require rates among brokers to be the same value. This PR creates a graph of the reassignment and observes broker utilization according to role; a "follower" in a replication is receiving data and a "leader" is sending data. A per-path rate is calculated according to the configured instance type capacity limit, an observation of current network utilization (accounting for any previously set throttles), and a ratio (separately configurable for inbound vs outbound flow) of free network capacity that's eligible for replication use. For each path, the rate is updated if it's more than n (configurable) distance from any previously set rates.
tl;dr this ensure that every replication path individually runs as fast as possible while adaptively maintaining capacity for consumers. Closes #81.
In Action