x/net/http2: retry requests rejected with REFUSED_STREAM #20985
Comments
Looks like this is another HTTP/2 problem on GCS, reminiscent of #20689 perhaps.
This isn't a GFE bug implementing HTTP/2 wrong. This is GCS (via the GFE) telling us to slow down it seems. But I can't reproduce. Maybe their rate limiting likes me more. Oh, if I crank it up 10x I can get the error. With
It's not clear what the GFE wants us to do, though: if it didn't want us to open 100 concurrent streams, why did it tell us to?
So, okay, maybe it doesn't like something else, like the rate (which isn't expressible as a SETTING), so it sends us a REFUSED_STREAM instead. What are we supposed to do? Retry in a loop forever? With some exponential backoff I guess? Yeah, http://httpwg.org/specs/rfc7540.html#Reliability says:
So, yeah, do that. /cc @tombergan
Do I understand correctly that you are well below
@Capstan, that's what it seems like, but I could verify more. I didn't analyze the full log and count. Go's net/http client does respect the
We're facing the exact same error in production as well when calling out to GCS. Robust retry logic inside the Go stdlib would indeed be awesome, and would surely spare every client library and application from wonky code that does this manually. 🙂
I have two questions:
I'd do the first retry immediately, and then with some backoff, up to a max number of retries total. And once we get a REFUSED_STREAM on a conn, mark that conn as unsuitable for any future streams as well and put it into shutdown mode.
There's an existing bug for that.
That retry strategy SGTM. I'll use a fixed limit of 6 retries (unless you object), using Firefox as prior art.
This should not be necessary ... the server can send GOAWAY if they want us to go away entirely.
Ah yes, #13774.
I want to do it anyway. It also means we're forced to open a new TCP connection and get a different backend. Otherwise I worry that our loop will run and we'll select the same TCP connection that just told us REFUSED_STREAM. I want a better chance at being reassigned to a different backend. See also: #20977 (comment)
CL https://golang.org/cl/50471 mentions this issue.
From ietf-http-wg, retrying on a new connection is not necessary, and again, Go would be the only client I'm aware of that would do this. Another client that retries REFUSED_STREAM on the same connection (add to Firefox and Chrome):
We default to MAX_CONCURRENT_STREAMS=1000, then get a SETTINGS frame with MAX_CONCURRENT_STREAMS=100 from the GFE. Is it possible that we manage to send 101 requests in the initial TCP congestion window before we get the SETTINGS from the GFE? That would explain REFUSED_STREAM. (I checked the GFE code. It looks like the GFE sends REFUSED_STREAM in two cases: we exceeded MAX_CONCURRENT_STREAMS, or the GFE itself is overloaded. The latter case seems unlikely, so it's likely that we're exceeding MAX_CONCURRENT_STREAMS.) If that is the case, we can't entirely fix this bug without fixing #13774.
Deleted my prior comment because I was wrong; this is exactly what's happening: We default to MAX_CONCURRENT_STREAMS=1000. The GFE sends MAX_CONCURRENT_STREAMS=100. However, we create multiple HTTP/2 connections (I count 5). On one of those connections, we manage to send 200+ requests before actually reading the SETTINGS from the GFE. I think the right fix for this bug is to back off and retry, along with a fix for #13774.
Change https://golang.org/cl/53250 mentions this issue:
Also see internal bug number 64069455. I believe the two changes attached to this issue should fix the problem, although I have not yet tried the original repro with both changes patched at the same time. Edit: more correctly, those two changes, plus the change to bundle them into net/http.
RoundTrip will retry a request if it receives REFUSED_STREAM. To guard against servers that use REFUSED_STREAM to encourage rate limiting, or servers that return REFUSED_STREAM deterministically for some requests, we retry after an exponential backoff and we cap the number of retries. The exponential backoff starts on the second retry, with a backoff sequence of 1s, 2s, 4s, etc, with 10% random jitter.
The retry cap was set to 6, somewhat arbitrarily. Rationale: this is what Firefox does.
Updates golang/go#20985
Change-Id: I4dcac4392ac4a3220d6d839f28bf943fe6b3fea7
Reviewed-on: https://go-review.googlesource.com/50471
Run-TryBot: Tom Bergan <tombergan@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Currently if the http2.Transport hits SettingsMaxConcurrentStreams for a server, it just makes a new TCP connection and creates the stream on the new connection. This CL updates that behavior to instead block RoundTrip until a new stream is available.
I also fixed a second bug, which was necessary to make some tests pass: Previously, a stream was removed from cc.streams only if either
(a) we received END_STREAM from the server, or
(b) we received RST_STREAM from the server.
This CL removes a stream from cc.streams if the request was cancelled (via ctx.Close, req.Cancel, or resp.Body.Close) before receiving END_STREAM or RST_STREAM from the server.
Updates golang/go#13774
Updates golang/go#20985
Updates golang/go#21229
Change-Id: I660ffd724c4c513e0f1cc587b404bedb8aff80be
Reviewed-on: https://go-review.googlesource.com/53250
Run-TryBot: Tom Bergan <tombergan@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Change https://golang.org/cl/54052 mentions this issue:
This should be fixed in the main repo (for Go 1.10). Please reopen if you are still seeing problems.
What version of Go are you using (go version)?
go1.8.1.typealias
What operating system and processor architecture are you using (go env)?
Linux amd64
What did you do?
Where refused-stream.go is https://gist.github.com/jba/9e75c3aedeb4e8b98a323424283fae88.
(veener-jba-doc-test-bucket/datastore should be publicly readable, but I'm guessing any object would do.)
What did you expect to see?
No output.
What did you see instead?
Often (not always), it fails with