Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Http connections aborted after 5s / keepAliveTimeout #13391

Closed
pbininda opened this issue Jun 2, 2017 · 37 comments
Closed

Http connections aborted after 5s / keepAliveTimeout #13391

pbininda opened this issue Jun 2, 2017 · 37 comments
Assignees
Labels
http Issues or PRs related to the http subsystem.

Comments

@pbininda
Copy link
Contributor

pbininda commented Jun 2, 2017

  • Version: v8.0.0
  • Platform: Windows 10, 64bit
  • Subsystem: http

Short Description

Node 8 introduced a change in the handling of http keep-alive connections. IMHO, this is (at least) a breaking change. When an http server does long-running requests (>5s), and the client requests a Connection: keep-alive connection, the http server closes the connection after 5s. This potentially causes browsers to re-send the request even if it is a POST request.

To Reproduce

clone https://github.com/pbininda/node8keepAliveTimeout and npm install. Then

    npm test

Starts a little express server (server.js) and a client.

  • The server is a standard express server with a long running post request (/longpost takes 10s).
  • The client calls the POST /longpost with a preflight OPTIONS /longpost.

The test runs through fine on node 6 and node 7:

> node test.js

got request OPTIONS /longpost
got options response 200
sending post request
got request POST /longpost
got post response 200 { status: 'OK' }

but fails on node 8 with

> node test.js

got request OPTIONS /longpost
got options response 200
sending post request
got request POST /longpost
C:\Users\pbininda\projects\ATRON\node8keepAliveTimeout\client.js:39
            throw err;
            ^

Error: socket hang up
    at createHangUpError (_http_client.js:343:15)
    at Socket.socketOnEnd (_http_client.js:435:23)
    at emitNone (events.js:110:20)
    at Socket.emit (events.js:207:7)
    at endReadableNT (_stream_readable.js:1045:12)
    at _combinedTickCallback (internal/process/next_tick.js:102:11)
    at process._tickCallback (internal/process/next_tick.js:161:9)

Browser Retries

It seems, most of the major browsers (Chrome, Firefox, Edge) implement https://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8.2.4. Since the server closes the connection on which it received the POST request before sending an answer, the Browsers re-send the POST. Note that you don't see the re-send in chrome dev tools but using Wireshark shows the retransmission. To have a look at this, run

    npm start

which launches the server (server.js) and then load browsertest.html in chrome. This runs browsertest.js in the browser which does a simple $.ajax request against the server. On the server side you will see:

> node server.js

got request OPTIONS /longpost
got request POST /longpost
got request POST /longpost
format called 5003ms after previous

This shows, that the server received two POST requests the second one 5s after the first one, even though the browser client code only does one request.

Bug or Breaking Change?

I'm not sure if this is a bug or a breaking change. It probably got introduced through #2534. It only seems to happen when two connections are used (that's why the prefight OPTIONS is forced to happen in my code), so it may be that the wrong connection is being closed here.

Workaround

Setting the keepAliveTimeout (see https://nodejs.org/dist/latest-v8.x/docs/api/http.html#http_server_keepalivetimeout) of the http server to a value greater than the maximum duration of a request solves the problem. You can try this with

    npm start -- --keepAliveTimeout 20000

and then in another terminal

    node client.js
@ChALkeR
Copy link
Member

ChALkeR commented Jun 2, 2017

/cc @indutny @tshemsedinov @aqrln
Could be related to #2534.

@ChALkeR ChALkeR added the http Issues or PRs related to the http subsystem. label Jun 2, 2017
@aqrln
Copy link
Contributor

aqrln commented Jun 2, 2017

Thanks for the ping, I'll be able look into this in a few hours.

@aqrln aqrln self-assigned this Jun 2, 2017
@lvpro
Copy link

lvpro commented Jun 2, 2017

Thanks for bringing this up @pbininda. We hit this issue as well when running requests longer than a few seconds. We also see multiple POSTs effecting all browsers on 8 ... reverted back to 7.10 and all is well. Seems like a pretty pernicious bug. We're using Koa 1.x middleware on Linux.

@juanecabellob
Copy link

juanecabellob commented Jun 2, 2017

Same behaviour here; request bit longer than usual and the browser kept resending the request. A general remark is that Firefox and Chrome did resend the request, but Safari didn't. Using Express server 4.15.2. Finally, we reverted to 7.10 and is working normally.

@pbininda
Copy link
Contributor Author

pbininda commented Jun 3, 2017

Thanks for the note regarding Safari, I'll change the wording regarding "all major browsers" 😓

@aqrln
Copy link
Contributor

aqrln commented Jun 5, 2017

Ugh... sorry for the delay, I was more busy that I hoped for. Let's fix it today. Thanks a lot for a detailed report and reproduction!

@lvpro
Copy link

lvpro commented Jun 8, 2017

Confirmed issue still present in release 8.1 as well.

@aqrln
Copy link
Contributor

aqrln commented Jun 8, 2017

@pbininda @lvpro @juanecabellob I'm very sorry for not making it in time for the 8.1 release. I took a look at the reproduction back then, but didn't have an opportunity to debug it until now. #13549 should fix it.

@lvpro
Copy link

lvpro commented Jun 8, 2017

Don't be sorry @aqrln! Thank you very much for getting this resolved! :)

@pbininda
Copy link
Contributor Author

pbininda commented Jun 8, 2017

@aqrln No problem, I can keep the server.keepAliveTimeout workaround in place until the fix is released. Thanks for your effort.

aqrln added a commit to aqrln/node that referenced this issue Jun 12, 2017
Fix the logic of resetting the socket timeout of keep-alive HTTP
connections and add two tests:

* `test-http-server-keep-alive-timeout-slow-server` is a regression test
  for nodejsGH-13391.  It ensures that the server-side keep-alive timeout will
  not fire during processing of a request.

* `test-http-server-keep-alive-timeout-slow-headers` ensures that the
  regular socket timeout is restored as soon as a client starts sending
  a new request, not as soon as the whole message is received, so that
  the keep-alive timeout will not fire while, e.g., the client is
  sending large cookies.

Refs: nodejs#2534
Fixes: nodejs#13391
@aqrln aqrln closed this as completed in d71718d Jun 13, 2017
MylesBorins pushed a commit that referenced this issue Jun 13, 2017
Fix the logic of resetting the socket timeout of keep-alive HTTP
connections and add two tests:

* `test-http-server-keep-alive-timeout-slow-server` is a regression test
  for GH-13391.  It ensures that the server-side keep-alive timeout will
  not fire during processing of a request.

* `test-http-server-keep-alive-timeout-slow-client-headers` ensures that
  the regular socket timeout is restored as soon as a client starts
  sending a new request, not as soon as the whole message is received,
  so that the keep-alive timeout will not fire while, e.g., the client
  is sending large cookies.

Refs: #2534
Fixes: #13391
PR-URL: #13549
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Brian White <mscdex@mscdex.net>
MylesBorins pushed a commit that referenced this issue Jun 13, 2017
Fix the logic of resetting the socket timeout of keep-alive HTTP
connections and add two tests:

* `test-http-server-keep-alive-timeout-slow-server` is a regression test
  for GH-13391.  It ensures that the server-side keep-alive timeout will
  not fire during processing of a request.

* `test-http-server-keep-alive-timeout-slow-client-headers` ensures that
  the regular socket timeout is restored as soon as a client starts
  sending a new request, not as soon as the whole message is received,
  so that the keep-alive timeout will not fire while, e.g., the client
  is sending large cookies.

Refs: #2534
Fixes: #13391
PR-URL: #13549
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Brian White <mscdex@mscdex.net>
@awb99
Copy link

awb99 commented Oct 15, 2017

I have this issue too with webpack-dev-server. It somehow does not work.

How would I pass the paratmeter keepAlive Timeout so that the default timeout is 20 secs?

Above this is the example

npm start -- --keepAliveTimeout 20000

I tried this:

 npm run-script start -- --"keepAliveTimeout 20000"

It gets expanded to:

webpack-dev-server --inline --hot --history-api-fallback  --public --host 104.222.96.51 --port 9090 "--keepAliveTimeout 20000"

Unfortunately it does not have any effect.

@apapirovski
Copy link
Member

@MylesBorins You'll need the other commit: aaf2a1c — they probably should've landed as one or I should've indicated that the 2nd one depends on the first working properly. My bad.

@MylesBorins
Copy link
Contributor

@apapirovski done

landed in eeef06a

jasnell pushed a commit that referenced this issue Dec 28, 2017
Documenting the best way to imitate the old behavior saves time for people
migrating from older versions. (E.g. for unexpected ECONNRESET)

It isn't immediately obvious if earlier nodejs versions behaved the same
way as nodejs 8 does with keepAliveTimeout = 0.
From 0aa7ef5, it seems like they behave
the same way.

Related to issues such as #13391 that show up when migrating to node 8

PR-URL: #17660
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Anatoli Papirovski <apapirovski@mac.com>
MylesBorins pushed a commit that referenced this issue Jan 8, 2018
Documenting the best way to imitate the old behavior saves time for people
migrating from older versions. (E.g. for unexpected ECONNRESET)

It isn't immediately obvious if earlier nodejs versions behaved the same
way as nodejs 8 does with keepAliveTimeout = 0.
From 0aa7ef5, it seems like they behave
the same way.

Related to issues such as #13391 that show up when migrating to node 8

PR-URL: #17660
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Anatoli Papirovski <apapirovski@mac.com>
MylesBorins pushed a commit that referenced this issue Jan 9, 2018
Documenting the best way to imitate the old behavior saves time for people
migrating from older versions. (E.g. for unexpected ECONNRESET)

It isn't immediately obvious if earlier nodejs versions behaved the same
way as nodejs 8 does with keepAliveTimeout = 0.
From 0aa7ef5, it seems like they behave
the same way.

Related to issues such as #13391 that show up when migrating to node 8

PR-URL: #17660
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Anatoli Papirovski <apapirovski@mac.com>
MylesBorins pushed a commit that referenced this issue Jan 9, 2018
Documenting the best way to imitate the old behavior saves time for people
migrating from older versions. (E.g. for unexpected ECONNRESET)

It isn't immediately obvious if earlier nodejs versions behaved the same
way as nodejs 8 does with keepAliveTimeout = 0.
From 0aa7ef5, it seems like they behave
the same way.

Related to issues such as #13391 that show up when migrating to node 8

PR-URL: #17660
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Anatoli Papirovski <apapirovski@mac.com>
ribizli referenced this issue in apollographql/apollo-server Jan 10, 2018
gibfahn pushed a commit that referenced this issue Jan 24, 2018
Documenting the best way to imitate the old behavior saves time for people
migrating from older versions. (E.g. for unexpected ECONNRESET)

It isn't immediately obvious if earlier nodejs versions behaved the same
way as nodejs 8 does with keepAliveTimeout = 0.
From 0aa7ef5, it seems like they behave
the same way.

Related to issues such as #13391 that show up when migrating to node 8

PR-URL: #17660
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Anatoli Papirovski <apapirovski@mac.com>
rdallman added a commit to fnproject/fdk-node that referenced this issue Apr 16, 2019
details of bug are here: nodejs/node#13391
docs here: https://nodejs.org/dist/latest-v8.x/docs/api/http.html#http_server_keepalivetimeout

tl;dr is node set a 5s idle timeout. this does what go and java both seem to
do (have not checked python or ruby, though from what I know about python,
we are not doing this either) and doesn't have an idle timeout. since in this
case we do kinda trust that the client is using an idle timeout (it's fn),
this seems like the right policy anyway (if fn dies is something to consider,
but the least of our worries is fdk conns in that case, and we are killing fn
spawned containers on startup too). it should also be noted the client (fn) is
only using 1 conn per container.
rdallman added a commit to fnproject/fdk-node that referenced this issue Apr 16, 2019
details of bug are here: nodejs/node#13391
docs here: https://nodejs.org/dist/latest-v8.x/docs/api/http.html#http_server_keepalivetimeout

tl;dr is node set a 5s idle timeout. this does what go and java both seem to
do (have not checked python or ruby, though from what I know about python,
we are not doing this either) and doesn't have an idle timeout. since in this
case we do kinda trust that the client is using an idle timeout (it's fn),
this seems like the right policy anyway (if fn dies is something to consider,
but the least of our worries is fdk conns in that case, and we are killing fn
spawned containers on startup too). it should also be noted the client (fn) is
only using 1 conn per container.
@tony-gutierrez
Copy link

tony-gutierrez commented May 10, 2019

Would this have been present in node 10.6.0? Test code above seems to pass.

@janswist
Copy link

3 years and bug still exists? How's that possible?

@jasnell
Copy link
Member

jasnell commented Mar 25, 2020

How's that possible?

As with all things in Node.js, it requires someone free to work on it. Pull requests are always welcome. One thing that may be helpful is a reproduction of the issue in the form of a known_issue test that can be used to guide someone in making a fix.

@janswist
Copy link

janswist commented Mar 25, 2020

How's that possible?

As with all things in Node.js, it requires someone free to work on it. Pull requests are always welcome. One thing that may be helpful is a reproduction of the issue in the form of a known_issue test that can be used to guide someone in making a fix.

Not intended to sound rude. I'm trying to solve that riddle for hours now - maybe someone else has it as well:

  • trying to send in response an 8KB array of 50 items that takes 6.7 seconds to perform [ERROR]
  • when I slice(10) it suddenly takes 2 seconds and everything works like a charm.

My question would be: how is that possible? Seems like if send data is too big (?) it just freezes and then times out.

I'm using Node 10.19.0. Thanks for your help.

@uri-chandler
Copy link

uri-chandler commented Nov 25, 2020

In case this helps - I think there's an additional use case which isn't related to the transmission size, but rather it's a race condition between the start / end of requests on the same connection.

  1. First request, new connection opened, keep-alive timer starts.
  2. First request ends
  3. Keep-alive timer kicks in, "timeout" event is about to be emitted on the socket
    3.1 First bytes of second request come in on the soon-to-be-destroyed socket
  4. "timeout" event is emitted on the socket, resulting in the destruction of the socket
    4.1 Second request gets dropped
  5. Second requests parsing (parseOnIncoming..) starts, trying to reset the timeout on the already-destroyed socket

Important:
Note that in this test (see "Reproduce" below) we're not transmitting large amounts of data on the connection - which is (I think) why f6a725e doesn't fix this issue. Put differently, I think it's a timing issue from the time one requests ends (and keepAlive timer is about to end) - and just at the (almost) same time - a new request comes in.

*I'm not 100% sure this is the correct flow - it's just my best educated-guess, based on some debugging

Reproduce:

Tested Versions:

  • v8.16.1
  • v12.18.0
  • v12.19.1
  • v12.20.0
  • v14.7.0

Naive Approach to a Fix
Assuming my debug analysis is correct, I would assume that there are 2 main ways to go about a fix for this:

  • either prevent the timeout event from firing on the socket if the socket is still in use
  • or, when ever the timeout event has fired, we then check if the socket is still in use, in which case we do nothing

Here's a naive code listing of the second approach:

// file: _http_server.js
function socketOnTimeout() {
  // "this" is the socket
  if (this.isInUse) {
    return;
  }  
  ...
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
http Issues or PRs related to the http subsystem.
Projects
None yet
Development

Successfully merging a pull request may close this issue.