
Retry with backoff when rate-limited #35

Merged: 6 commits into heroku:master on Nov 7, 2018
Conversation

@mars mars commented Oct 23, 2018

Context

Heroku Platform API:

Proposal

Let's make this client real friendly with HTTP status 429 response errors 🤗 to improve durability of the Terraform Provider.

Implement retry with exponential backoff when rate-limited, via a new RoundTripWithRetryBackoff transport that wraps the default net/http RoundTrip().

Proposed (and now implemented) usage; note the Transport field:

heroku.NewService(&http.Client{
  Transport: &heroku.Transport{
    Username:          c.Email,
    Password:          c.APIKey,
    UserAgent:         heroku.DefaultUserAgent,
    AdditionalHeaders: c.Headers,
    Debug:             debugHTTP,
    Transport:         heroku.RoundTripWithRetryBackoff{
      // These are default values for ExponentialBackOff config
      InitialIntervalSeconds: 30,
      RandomizationFactor:    0.25,
      Multiplier:             2,
      MaxIntervalSeconds:     900,
      MaxElapsedTimeSeconds:  0,
    },
  },
})

The client reacts to the actual HTTP status 429 response by retrying with back-off, instead of trying to predict and back off before being rate-limited. Using the Ratelimit-Remaining header intelligently would seem to require global state shared across Terraform's parallel execution of nodes, so this proposal avoids that seemingly unnecessary complexity.
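
To make the shape of the proposal concrete, here is a condensed sketch of what such a RoundTrip method can look like, assembled from the snippets quoted later in this review and the github.com/cenkalti/backoff API. It is a sketch, not the verbatim merged code: the notify logger and the Clock field are assumptions, and the real implementation applies defaults via helpers like int64WithDefault.

// Assumed imports: fmt, log, net/http, time, github.com/cenkalti/backoff
func (r RoundTripWithRetryBackoff) RoundTrip(req *http.Request) (*http.Response, error) {
	var lastResponse *http.Response
	var lastError error

	retryableRoundTrip := func() error {
		lastResponse, lastError = http.DefaultTransport.RoundTrip(req)
		// Only a 429 response is reported to the retrier, so only
		// rate-limited requests are retried; other errors pass through.
		if lastResponse != nil && lastResponse.StatusCode == 429 {
			return fmt.Errorf("Heroku API rate limited: 429 Too Many Requests")
		}
		return nil
	}

	rateLimitRetryConfig := &backoff.ExponentialBackOff{
		InitialInterval:     time.Duration(r.InitialIntervalSeconds) * time.Second,
		RandomizationFactor: r.RandomizationFactor,
		Multiplier:          r.Multiplier,
		MaxInterval:         time.Duration(r.MaxIntervalSeconds) * time.Second,
		MaxElapsedTime:      time.Duration(r.MaxElapsedTimeSeconds) * time.Second,
		Clock:               backoff.SystemClock,
	}
	rateLimitRetryConfig.Reset()

	notify := func(err error, wait time.Duration) {
		log.Printf("Will retry Heroku API request in %s, because %s", wait, err)
	}

	if err := backoff.RetryNotify(retryableRoundTrip, rateLimitRetryConfig, notify); err != nil {
		return nil, err
	}
	return lastResponse, lastError
}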

Fixes #32

@mars mars force-pushed the retry-with-backoff branch from efd52a4 to 176c52c on October 23, 2018 23:56
@mars mars changed the title from "Retry with backoff" to "Retry with backoff when rate-limited" on Oct 24, 2018
var lastError error

retryableRoundTrip := func() error {
	lastResponse, lastError = http.DefaultTransport.RoundTrip(req)
@sclasen sclasen commented Oct 24, 2018
probably need to check lastError and return it if not nil?

@mars mars (Member Author) replied
lastError is checked just a few lines down in the parent function, because this backoff.Retry function should only return an error when it should be retried (status 429 for this rate-limiting use-case).
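
(For readers following along, the error handling being described sits just after the retry call in the parent RoundTrip method, roughly like this; a paraphrase of the flow, not the exact diff:)

err := backoff.Retry(retryableRoundTrip, rateLimitRetryConfig)
if err != nil {
	// Retries were exhausted while still rate-limited (HTTP 429).
	return nil, err
}
if lastError != nil {
	// Any non-429 transport error is propagated without retrying.
	return nil, lastError
}
return lastResponse, nil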

@talbright talbright (Contributor) commented Oct 24, 2018
You could dry it up by dropping lastError and doing something like this:

retryableRoundTrip := func() (err error) {
	lastResponse, err = http.DefaultTransport.RoundTrip(req)
	// Detect Heroku API rate limiting
	// https://devcenter.heroku.com/articles/platform-api-reference#client-error-responses
	if lastResponse.StatusCode == 429 {
		err = fmt.Errorf("Heroku API rate limited: 429 Too Many Requests")
	}
	return err
}

That would also mean that you would only need to handle the error once below...

@mars mars (Member Author) replied
But we only want this to retry for HTTP status 429 responses.

If that function returns other errors, it will retry for them too.

We only want to retry when rate-limited, right?

A contributor replied
Oh yeah 😊

@mars mars commented Oct 24, 2018

I've verified that this retrier is doing the right thing in the Terraform Provider by switching this code to retry on HTTP status 200, and then watching the logs during tests with TF_LOG=DEBUG to see it retrying.

@mars mars commented Oct 25, 2018

To test, I revised the function passed to backoff.RetryNotify:

retryableRoundTrip := func() error {
	lastResponse, lastError = http.DefaultTransport.RoundTrip(req)
	// Detect Heroku API rate limiting
	// https://devcenter.heroku.com/articles/platform-api-reference#client-error-responses
	if lastResponse.StatusCode == 429 {
		return fmt.Errorf("Heroku API rate limited: 429 Too Many Requests")
	}

	// Fake retry testing.
	if lastResponse.StatusCode == 200 {
		return fmt.Errorf("Fake retrier!!!")
	}

	return nil
}

…and can see the randomized backoff working in the logs:

2018/10/25 09:14:00 Will retry Heroku API request in 33.872687193s, because Fake retrier!!!
2018/10/25 09:14:35 Will retry Heroku API request in 46.371947138s, because Fake retrier!!!
2018/10/25 09:15:21 Will retry Heroku API request in 1m46.253268969s, because Fake retrier!!!
2018/10/25 09:17:08 Will retry Heroku API request in 4m0.01901876s, because Fake retrier!!!
2018/10/25 09:21:09 Will retry Heroku API request in 8m21.662152299s, because Fake retrier!!!
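
(Those intervals track the defaults in the configuration above: a roughly 30-second initial interval doubling on each attempt (30s, 60s, 2m, 4m, 8m), with the 0.25 randomization factor providing the jitter.)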

So, my confidence is building that this solution is good 😇

@mars mars force-pushed the retry-with-backoff branch from b2db372 to 4a8c894 on October 26, 2018 21:54
@mars mars commented Nov 5, 2018

I think this is ready to go!

The new retry functionality is not enabled by default: Transport: heroku.RoundTripWithRetryBackoff{} must be set explicitly. So there's very little chance of this affecting existing usage.

Please review and let me know what you think.

@talbright talbright (Contributor) left a comment
LGTM, though I'm not clear on why Reset() has to be called immediately after the struct is created.


// net/http RoundTripper interface, a.k.a. Transport
// https://godoc.org/net/http#RoundTripper
type RoundTripWithRetryBackoff struct {
A contributor commented
✅ good call -- was going to suggest in an earlier commit you allow some configuration to be passed through
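
For context, the configuration being referred to is the set of fields from the usage example at the top of this PR; the full struct presumably looks something like this (field types reconstructed from those names, not copied from the diff):

// net/http RoundTripper interface, a.k.a. Transport
// https://godoc.org/net/http#RoundTripper
type RoundTripWithRetryBackoff struct {
	// Backoff configuration; zero values fall back to the defaults shown
	// in the usage example above (30, 0.25, 2, 900, 0).
	InitialIntervalSeconds int64
	RandomizationFactor    float64
	Multiplier             float64
	MaxIntervalSeconds     int64
	MaxElapsedTimeSeconds  int64
}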

		MaxInterval:    time.Duration(int64WithDefault(r.MaxIntervalSeconds, int64(900))) * time.Second,
		MaxElapsedTime: time.Duration(int64WithDefault(r.MaxElapsedTimeSeconds, int64(0))) * time.Second,
	}
	rateLimitRetryConfig.Reset()
@mars mars (Member Author) replied Nov 7, 2018
I followed the way the backoff module's built-in convenience constructor works:
https://github.com/cenkalti/backoff/blob/adb73d5bf0d9237fab19ff58aebf658449e326df/exponential.go#L92

Not sure it's required, but it seemed prudent to follow the creator's lead 😅
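
For reference, that constructor builds the backoff with the package defaults and then calls Reset() before returning, roughly like this (paraphrased from the linked source, not quoted verbatim):

func NewExponentialBackOff() *ExponentialBackOff {
	b := &ExponentialBackOff{
		InitialInterval:     DefaultInitialInterval,
		RandomizationFactor: DefaultRandomizationFactor,
		Multiplier:          DefaultMultiplier,
		MaxInterval:         DefaultMaxInterval,
		MaxElapsedTime:      DefaultMaxElapsedTime,
		Clock:               SystemClock,
	}
	b.Reset()
	return b
}

Reset() essentially stamps the start time and sets the current interval back to InitialInterval, so calling it right after construction is cheap and leaves the struct in a known-good state.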

A contributor replied
That sounds logical to me 👍


@mars mars commented Nov 7, 2018

Thanks @talbright 😄

@mars mars merged commit 3b96879 into heroku:master Nov 7, 2018
@schneems commented
Hello rate throttlers! I've been poking at this problem of adding rate throttling to the platform-api gem for a bit.

I did some benchmarking, and it looks like if one of the goals is to reduce the total number of requests to the origin server, an exponential backoff strategy doesn't help that much. In my benchmark it has an 80% retry rate. If the primary goal is to not have failures due to rate limiting, then it does a great job there. Here are my benchmark results:

https://github.com/zombocom/rate_throttle_client#ratethrottleclientexponentialbackoff-results-duration-300-minutes-multiplier-12

And here's some of my research and how I went about iterating on a solution with good properties that has a lower retry rate (~3% instead of 80%): https://github.com/schneems/rate_throttle_clients#how-to-write-a-rate-throttling-algorithm

Here's the platform-api PR heroku/platform-api#103.

There's no immediate action needed by y'all but if you're interested in various rate-throttling strategies, it's an active area of research/interest of mine.

@mars mars commented Apr 23, 2020

So thoughtful @schneems 🙌 As you intuited, the reason rate-throttling was implemented here was to avoid terraform apply failures.

Merging this pull request closes: Built-in retry with exponential back-offs