Retry with backoff when rate-limited #35
Conversation
Force-pushed efd52a4 to 176c52c
```go
var lastError error

retryableRoundTrip := func() error {
	lastResponse, lastError = http.DefaultTransport.RoundTrip(req)
```
probably need to check lastError and return it if not nil?
lastError is checked just a few lines down in the parent function, because this backoff.Retry function should only return an error when it should be retried (status 429 for this rate-limiting use case).
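For context, a rough sketch of that parent flow (the surrounding code isn't shown in this hunk, so the shape below is an assumption based on the names used in this PR):

```go
// Sketch only: backoff.Retry keeps calling retryableRoundTrip until it
// returns nil or the backoff policy gives up, then returns the last error.
retryErr := backoff.Retry(retryableRoundTrip, rateLimitRetryConfig)
if retryErr != nil {
	// Still rate-limited after exhausting the retries.
	return lastResponse, retryErr
}
// Plain transport errors are checked here, outside the retried closure,
// so they surface immediately instead of triggering retries.
if lastError != nil {
	return lastResponse, lastError
}
return lastResponse, nil
```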
You could dry it up by dropping lastError and doing something like this:

```go
retryableRoundTrip := func() (err error) {
	lastResponse, err = http.DefaultTransport.RoundTrip(req)
	// Detect Heroku API rate limiting
	// https://devcenter.heroku.com/articles/platform-api-reference#client-error-responses
	if lastResponse.StatusCode == 429 {
		err = fmt.Errorf("Heroku API rate limited: 429 Too Many Requests")
	}
	return err
}
```

That would also mean that you would only need to handle the error once below...
But we only want this to retry for HTTP status 429 responses.
If that function returns other errors, it will retry for them too.
We only want to retry when rate-limited, right?
Oh yeah 😊
I've verified that this retrier is doing the right thing in the Terraform Provider by switching this code to retry on HTTP status 200, and then watching the logs during tests.
To test, I revised retryableRoundTrip:

```go
retryableRoundTrip := func() error {
	lastResponse, lastError = http.DefaultTransport.RoundTrip(req)
	// Detect Heroku API rate limiting
	// https://devcenter.heroku.com/articles/platform-api-reference#client-error-responses
	if lastResponse.StatusCode == 429 {
		return fmt.Errorf("Heroku API rate limited: 429 Too Many Requests")
	}
	// Fake retry testing.
	if lastResponse.StatusCode == 200 {
		return fmt.Errorf("Fake retrier!!!")
	}
	return nil
}
```

…and I can see the randomized backoff working in the logs:
So, my confidence is building that this solution is good 😇
Force-pushed e3d6e78 to b2db372
Force-pushed b2db372 to 4a8c894
I think this is ready to go! The new retry functionality is not enabled by default. Please review and let me know what you think.
LGTM, though I'm not clear on why Reset() has to be called immediately after the struct is created.
```go
// net/http RoundTripper interface, a.k.a. Transport
// https://godoc.org/net/http#RoundTripper
type RoundTripWithRetryBackoff struct {
```
✅ good call -- was going to suggest in an earlier commit you allow some configuration to be passed through
```go
	MaxInterval:    time.Duration(int64WithDefault(r.MaxIntervalSeconds, int64(900))) * time.Second,
	MaxElapsedTime: time.Duration(int64WithDefault(r.MaxElapsedTimeSeconds, int64(0))) * time.Second,
}
rateLimitRetryConfig.Reset()
```
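The int64WithDefault helper isn't shown in this hunk; presumably it falls back to the given default when the configured value is zero, something like:

```go
// Assumed shape of the helper used above: treat a zero value as "unset"
// and substitute the provided default.
func int64WithDefault(value, defaultValue int64) int64 {
	if value == 0 {
		return defaultValue
	}
	return value
}
```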
Why did you need to do this? (https://godoc.org/github.com/cenkalti/backoff#ExponentialBackOff.Reset)
I followed the way the backoff module's built-in convenience constructor works:
https://github.com/cenkalti/backoff/blob/adb73d5bf0d9237fab19ff58aebf658449e326df/exponential.go#L92
Not sure it's required, but it seemed prudent to follow the creator's lead 😅
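For reference, the linked constructor is roughly this (paraphrased from the backoff package source at that commit):

```go
// NewExponentialBackOff fills in the exported defaults and then calls
// Reset() before handing the value back to the caller.
func NewExponentialBackOff() *ExponentialBackOff {
	b := &ExponentialBackOff{
		InitialInterval:     DefaultInitialInterval,
		RandomizationFactor: DefaultRandomizationFactor,
		Multiplier:          DefaultMultiplier,
		MaxInterval:         DefaultMaxInterval,
		MaxElapsedTime:      DefaultMaxElapsedTime,
		Clock:               SystemClock,
	}
	b.Reset()
	return b
}
```

In that version, Reset() sets the unexported current interval to InitialInterval and records the start time, so a hand-built struct literal that skips it would start with a zero current interval.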
That sounds logical to me 👍
Thanks @talbright 😄
Hello rate throttlers! I've been poking at this problem of adding rate throttling to the

I did some benchmarking, and it looks like an exponential backoff strategy doesn't help that much if one of the goals is to reduce the total number of requests to the origin server: in my benchmark it has an 80% retry rate. If the primary goal is to not have failures due to rate limiting, then it does a great job there.

Here are my benchmark results:

And here's some of my research and how I went about iterating on a solution with good properties that has a lower retry rate (~3% instead of 80%): https://github.com/schneems/rate_throttle_clients#how-to-write-a-rate-throttling-algorithm

Here's the

There's no immediate action needed by y'all, but if you're interested in various rate-throttling strategies, it's an active area of research/interest of mine.
So thoughtful @schneems 🙌 As you intuited, the reason rate-throttling was implemented here was to avoid 429 errors failing Terraform runs.
Context
Heroku Platform API:
rate limits to 4500 requests/hour/account
sends the current usage count in an HTTP response header: Ratelimit-Remaining
returns HTTP status 429 when the account is rate-limited (see the sketch below).
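For illustration, reading those two signals off a net/http response could look like this (the helper name is hypothetical; header lookup is case-insensitive, so the Ratelimit-Remaining spelling doesn't matter):

```go
// Hypothetical helper: report the remaining request quota and whether the
// response indicates the account is currently rate-limited.
func rateLimitStatus(resp *http.Response) (remaining string, limited bool) {
	remaining = resp.Header.Get("Ratelimit-Remaining")
	limited = resp.StatusCode == http.StatusTooManyRequests // 429
	return remaining, limited
}
```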
Proposal
Let's make this client real friendly with HTTP status 429 response errors 🤗 to improve durability of the Terraform Provider.
Implement retry with exponential backoff when rate-limited with a new RoundTripWithRetryBackoff transport wrapping the default net/http RoundTrip().
Proposed (& now implemented) usage; note the Transport field:
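A minimal sketch of that wiring, assuming the package is imported as heroku and exposes a NewService(*http.Client) constructor (the RoundTripWithRetryBackoff field names come from the diff earlier in this thread; the import path and constructor are assumptions):

```go
package main

import (
	"net/http"

	heroku "github.com/heroku/heroku-go/v3" // assumed import path
)

func main() {
	// Opt in to retries: the new transport is not enabled by default.
	client := &http.Client{
		Transport: &heroku.RoundTripWithRetryBackoff{
			// Field names per the diff above; zero values fall back to the defaults shown there.
			MaxIntervalSeconds:    900, // cap each backoff interval at 15 minutes
			MaxElapsedTimeSeconds: 0,   // 0 = keep retrying with no overall deadline
		},
	}
	h := heroku.NewService(client) // assumed constructor
	_ = h
}
```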
The client reacts to the terminal HTTP status 429 by retrying with backoff, instead of trying to predict and back off before being rate-limited. Using the Ratelimit-Remaining header intelligently would seem to require global state shared across Terraform's parallel execution of nodes, so this proposal avoids that seemingly unnecessary complexity.

Fixes #32