
roundTripper.timedRoundTrip timeout cancels legit but slow requests #195

Closed
domdom82 opened this issue Feb 1, 2021 · 4 comments

Comments

@domdom82
Contributor

domdom82 commented Feb 1, 2021

Is this a security vulnerability?

no

Issue

The implementation of RoundTrip as defined in the RoundTripper interface in gorouter cancels requests mid-flight.
The main problem I see is the use of context.WithTimeout at https://github.com/cloudfoundry/gorouter/blob/main/proxy/round_tripper/proxy_round_tripper.go#L261

I think the main intention of this timeout is to cancel requests to unresponsive backends and to close the connection when the timeout expires. However, the current implementation, which applies context.WithTimeout to the incoming request, does not distinguish between an unresponsive backend and a responding but slow backend. This means that, with the default settings, no request can take more than 15 minutes (900 s) to finish.

To me this feels odd, because serving larger payloads such as db backups that take longer than 15 minutes to download is simply not possible out of the box. Part of this issue is for me to understand whether that is the intention of this timeout, or whether it is rather meant as an actual "request timeout" (i.e. the time a backend may take before sending response headers), because such a timeout already exists. It is defined in http.Transport at https://github.com/golang/go/blob/master/src/net/http/transport.go#L217

Affected Versions

All versions, including 0.211.0.

Context

I simulated a slow app that sends a larger document at a very slow rate (1 MB at 1 KB/s). The response should take about 1000 seconds to download, but it is cut off at 900 s by the above timeout, with the following curl output:

```
Ne quos utrum ore tu hi rogo spectandum aut terrae ne aeger sat.Me sapientiae inter ita fortitudinem in
* TLSv1.2 (IN), TLS alert, close notify (256):
{ [2 bytes data]
* transfer closed with outstanding read data remaining
* Closing connection 0
* TLSv1.2 (OUT), TLS alert, close notify (256):
} [2 bytes data]
spectandum sequi subduntur me pati quae psalmi.
```

(I used lorem ipsum to generate fake data)

Steps to Reproduce

  1. Deploy two apps, "delayed" and "slow". The "delayed" app should respond only after 1000 s; the "slow" app should stream data for 1000 s.
  2. Curl each app.

Expected result

  1. The "delayed" app request should get cancelled after 900s.
  2. The "slow" app request should finish.

Current result

  1. Requests of both apps get cancelled after 900s.

Possible Fix

ProxyRoundTripper should instead use a ResponseHeaderTimeout, as http.Transport implements it here: https://github.com/golang/go/blob/master/src/net/http/transport.go#L2524

Additional Context

I can provide test apps if needed. I can also implement this as a PR if the community agrees this should be fixed.

@cf-gitbot

We have created an issue in Pivotal Tracker to manage this. Unfortunately, the Pivotal Tracker project is private so you may be unable to view the contents of the story.

The labels on this github issue will be updated when the story is started.

@selzoc
Member

selzoc commented Feb 16, 2021

Hello! Are you aware that you can configure the timeout value via this property?

```yaml
request_timeout_in_seconds:
  description: |
    This configures a "request timeout" and a "backend idle timeout".
    Requests from router to backend endpoints that are longer than this duration will be canceled and logged as
    `backend-request-timeout` errors. In addition, TCP connections between router and backend endpoints that
    are idle for longer than this duration will be closed. Related properties: `router.max_idle_connections`
    and `router.keep_alive_probe_interval`.
  default: 900
```

@domdom82
Contributor Author

domdom82 commented Feb 16, 2021

Hi @selzoc, yes, I know that. My argument is that the timeout is currently implemented in a way that, imho, does not fulfill the notion of a "request timeout": it limits the total time a request-response pair can take, not how long a backend may take to respond to a request. So it is a semantic issue rather than one about the value of the timeout. Limiting the total time a response can take makes little sense to me, because there are use cases (db backups, large file uploads, video streaming, etc.) that are essentially open-ended in how long they can take. To me, a timeout only makes sense as a "time until the server responds".

@selzoc
Member

selzoc commented Feb 25, 2021

Hi @domdom82, we discussed this as a team and don't think we would want to modify the current behavior. In the gorouter's job as a reverse proxy, we want to make sure that the number of connections is fairly distributed amongst backends, and this is a coarse mechanism for ensuring that the pool cannot be taken up entirely by long-running connections past a certain timeout.
