
roundTripper.timedRoundTrip timeout cancels legit but slow requests #195

Closed
domdom82 opened this issue Feb 1, 2021 · 4 comments

Comments

@domdom82
Contributor

domdom82 commented Feb 1, 2021

Is this a security vulnerability?

no

Issue

The implementation of RoundTrip as defined in the RoundTripper interface in gorouter cancels requests mid-flight.
The main problem I see is the use of context.WithTimeout at https://github.com/cloudfoundry/gorouter/blob/main/proxy/round_tripper/proxy_round_tripper.go#L261

I think the main intention of this timeout is to cancel requests to unresponsive backends and to close the connection when the timeout expires. However, the current implementation, which applies context.WithTimeout to the incoming request, does not distinguish between an unresponsive backend and a responding but slow backend. This means that, with the default settings, no request can take more than 15 minutes (900 s) to finish.

To me this feels odd, because serving larger payloads such as db backups that take longer than 15 minutes to download is simply not possible out of the box. Part of this issue is for me to understand whether that is the intention of this timeout, or whether it is rather meant as an actual "request timeout" (i.e. the time a backend may take before sending response headers), because such a timeout already exists. It is defined in http.Transport at https://github.com/golang/go/blob/master/src/net/http/transport.go#L217

Affected Versions

All versions, including 0.211.0.

Context

I simulated a slow app that sends a larger document at a very slow rate (1 MB at 1 KB/s). The response should take about 1000 seconds to download, but it is cut off at 900 s by the above timeout, with the following curl output:

```
Ne quos utrum ore tu hi rogo spectandum aut terrae ne aeger sat.Me sapientiae inter ita fortitudinem in
* TLSv1.2 (IN), TLS alert, close notify (256):
{ [2 bytes data]
* transfer closed with outstanding read data remaining
* Closing connection 0
* TLSv1.2 (OUT), TLS alert, close notify (256):
} [2 bytes data]
spectandum sequi subduntur me pati quae psalmi.
```

(I used lorem ipsum to generate fake data)

Steps to Reproduce

  1. Deploy two apps, "delayed" and "slow". The "delayed" app should respond only after 1000 s; the "slow" app should stream data for 1000 s.
  2. Curl each app.

Expected result

  1. The "delayed" app request should get cancelled after 900s.
  2. The "slow" app request should finish.

Current result

  1. Requests of both apps get cancelled after 900s.

Possible Fix

ProxyRoundTripper should instead use a ResponseHeaderTimeout, as http.Transport implements it here: https://github.com/golang/go/blob/master/src/net/http/transport.go#L2524

Additional Context

I can provide test apps if needed. I can also implement this as a PR if the community agrees this should be fixed.

@cf-gitbot

We have created an issue in Pivotal Tracker to manage this. Unfortunately, the Pivotal Tracker project is private so you may be unable to view the contents of the story.

The labels on this github issue will be updated when the story is started.

@selzoc
Member

selzoc commented Feb 16, 2021

Hello! Are you aware that you can configure the timeout value via this property?

```yaml
request_timeout_in_seconds:
  description: |
    This configures a "request timeout" and a "backend idle timeout".
    Requests from router to backend endpoints that are longer than this duration will be canceled and logged as
    `backend-request-timeout` errors. In addition, TCP connections between router and backend endpoints that
    are idle for longer than this duration will be closed. Related properties: `router.max_idle_connections`
    and `router.keep_alive_probe_interval`.
  default: 900
```

@domdom82
Contributor Author

domdom82 commented Feb 16, 2021

Hi @selzoc, yes, I know that. My argument is that the timeout is currently implemented in a way that, imho, does not fulfill the notion of a "request timeout": it limits the total time a request-response pair can take, not how long a backend may take to respond to a request. So it is a semantic issue rather than one about the value of the timeout. Limiting the total time a response can take makes little sense to me, because there are use cases (db backups, large file uploads, video streaming, etc.) that are essentially open-ended in how long they can take. To me, a timeout only makes sense as a "time until the server responds".

@selzoc
Member

selzoc commented Feb 25, 2021

Hi @domdom82, we discussed this as a team and don't think we would want to modify the current behavior. In the gorouter's job as a reverse proxy, we want to make sure that the number of connections is fairly distributed amongst backends, and this is a coarse mechanism for ensuring that the pool cannot be taken up entirely by long-running connections past a certain timeout.
