Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate flaky test-tls-socket-close on macOS #13184

Closed
Trott opened this issue May 24, 2017 · 6 comments
Closed

Investigate flaky test-tls-socket-close on macOS #13184

Trott opened this issue May 24, 2017 · 6 comments
Labels
macos Issues and PRs related to the macOS platform / OSX. test Issues and PRs related to the tests. tls Issues and PRs related to the tls subsystem.

Comments

@Trott
Copy link
Member

Trott commented May 24, 2017

  • Version: v8.0.0-pre
  • Platform: macOS
  • Subsystem: test

From https://ci.nodejs.org/job/node-test-commit-osx/9995/nodes=osx1010/console:

not ok 1205 parallel/test-tls-socket-close
  ---
  duration_ms: 0.243
  severity: fail
  stack: |-
    events.js:182
          throw er; // Unhandled 'error' event
          ^
    
    Error: read ECONNRESET
        at exports._errnoException (util.js:1026:11)
        at TLSWrap.onread (net.js:607:25)

I was able to replicate this with:

$ tools/test.py --repeat 64 -j 16 test/parallel/test-tls-socket-close.js 
=== release test-tls-socket-close ===                    
Path: parallel/test-tls-socket-close
events.js:182
      throw er; // Unhandled 'error' event
      ^

Error: read ECONNRESET
    at exports._errnoException (util.js:1026:11)
    at TLSWrap.onread (net.js:607:25)
Command: out/Release/node /Users/trott/io.js/test/parallel/test-tls-socket-close.js
[00:03|% 100|+  63|-   1]: Done 
$

So it's possible the solution is to move it to sequential. But it's also possible that this is a race condition somewhere in the code.

@Trott Trott added macos Issues and PRs related to the macOS platform / OSX. test Issues and PRs related to the tests. tls Issues and PRs related to the tls subsystem. labels May 24, 2017
@sebastianplesciuc
Copy link

Just a very uninformed guess. I was able to replicate the issue using the command you mention. If I move this code https://github.com/nodejs/node/blob/master/test/parallel/test-tls-socket-close.js#L27-L58 into the tls.createServer() callback I am no longer able to replicate the issue.

@Trott
Copy link
Member Author

Trott commented May 24, 2017

@sebastianplesciuc Thanks for trying to figure this one out! Unfortunately, that change invalidates the test as Node.js 7.7.3 no longer segfaults on that test if that change is made.

@sebastianplesciuc
Copy link

Running with export NODE_DEBUG=net yields the following:

NET 24869: setupListenHandle null 0 4 0 undefined
NET 24869: setupListenHandle: create a handle
NET 24869: bind to ::
NET 24869: pipe false undefined
NET 24869: connect: find host localhost
NET 24869: connect: dns options { family: undefined, hints: 1024 }
NET 24869: _read
NET 24869: _read wait for connection
NET 24869: afterConnect
NET 24869: _read
NET 24869: Socket._read readStart
NET 24869: onconnection
NET 24869: _read
NET 24869: Socket._read readStart
NET 24869: _read
NET 24869: afterWrite 0
NET 24869: afterWrite call cb
NET 24869: _onTimeout
NET 24869: destroy
NET 24869: close
NET 24869: close handle
NET 24869: has server
NET 24869: SERVER _emitCloseIfDrained
NET 24869: SERVER handle? true   connections? 0
NET 24869: onread -54
NET 24869: destroy
NET 24869: close
NET 24869: close handle
events.js:182
      throw er; // Unhandled 'error' event
      ^

Error: read ECONNRESET
    at exports._errnoException (util.js:1026:11)
    at TLSWrap.onread (net.js:607:25)

I don't understand it yet, but I thought it might help someone who can.

@Trott
Copy link
Member Author

Trott commented Jun 7, 2017

Still a thing:

https://ci.nodejs.org/job/node-test-commit-osx/10336/nodes=osx1010/console

not ok 1276 parallel/test-tls-socket-close
  ---
  duration_ms: 0.164
  severity: fail
  stack: |-
    events.js:182
          throw er; // Unhandled 'error' event
          ^
    
    Error: read ECONNRESET
        at exports._errnoException (util.js:1012:11)
        at TLSWrap.onread (net.js:607:25)
  ...

@Trott
Copy link
Member Author

Trott commented Jun 7, 2017

Adding an error listener that ignores ECONNRESET makes the test reliable while still seg-faulting as expected on Node.js v7.7.3. Race condition, I suppose. Quite possibly unavoidable (if we want to keep the segfault on relevant versions of Node.js, which we do because it's the whole point of the test). PR coming momentarily.

Trott added a commit to Trott/io.js that referenced this issue Jun 7, 2017
Add error listener to ignore `ECONNRESET`. Makes test reliable while it
still segfaults (as expected) on Node.js 7.7.3. It might not be possible
to eliminate the probable race causing `ECONNRESET` without also
eliminating the required segfault-inducing part of the test. (Or maybe
it's totally possible. If you figure it out, hey cool, submit a pull
request.)

Fixes: nodejs#13184
@Trott
Copy link
Member Author

Trott commented Jun 7, 2017

PR to fix: #13529

@Trott Trott closed this as completed in dde4f0f Jun 10, 2017
addaleax pushed a commit that referenced this issue Jun 10, 2017
Add error listener to ignore `ECONNRESET`. Makes test reliable while it
still segfaults (as expected) on Node.js 7.7.3. It might not be possible
to eliminate the probable race causing `ECONNRESET` without also
eliminating the required segfault-inducing part of the test. (Or maybe
it's totally possible. If you figure it out, hey cool, submit a pull
request.)

PR-URL: #13529
Fixes: #13184
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
MylesBorins pushed a commit that referenced this issue Jul 17, 2017
Add error listener to ignore `ECONNRESET`. Makes test reliable while it
still segfaults (as expected) on Node.js 7.7.3. It might not be possible
to eliminate the probable race causing `ECONNRESET` without also
eliminating the required segfault-inducing part of the test. (Or maybe
it's totally possible. If you figure it out, hey cool, submit a pull
request.)

PR-URL: #13529
Fixes: #13184
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
sam-github added a commit to sam-github/node that referenced this issue Jan 25, 2019
This test has a dependency on the order in which the TCP connection is
made, and TLS server handshake completes. It assumes those server side
events occur before the client side write callback, which is not
guaranteed by the TLS API. It usually passes with TLS1.3, but TLS1.3
didn't exist at the time the bug existed.

Pin the test to TLS1.2, since the test shouldn't be changed in a way that
doesn't trigger a segfault in 7.7.3:
- nodejs#13184 (comment)
sam-github added a commit to sam-github/node that referenced this issue Jan 31, 2019
This test has a dependency on the order in which the TCP connection is
made, and TLS server handshake completes. It assumes those server side
events occur before the client side write callback, which is not
guaranteed by the TLS API. It usually passes with TLS1.3, but TLS1.3
didn't exist at the time the bug existed.

Pin the test to TLS1.2, since the test shouldn't be changed in a way that
doesn't trigger a segfault in 7.7.3:
- nodejs#13184 (comment)
sam-github added a commit to sam-github/node that referenced this issue Feb 5, 2019
This test has a dependency on the order in which the TCP connection is
made, and TLS server handshake completes. It assumes those server side
events occur before the client side write callback, which is not
guaranteed by the TLS API. It usually passes with TLS1.3, but TLS1.3
didn't exist at the time the bug existed.

Pin the test to TLS1.2, since the test shouldn't be changed in a way that
doesn't trigger a segfault in 7.7.3:
- nodejs#13184 (comment)
sam-github added a commit that referenced this issue Feb 6, 2019
This test has a dependency on the order in which the TCP connection is
made, and TLS server handshake completes. It assumes those server side
events occur before the client side write callback, which is not
guaranteed by the TLS API. It usually passes with TLS1.3, but TLS1.3
didn't exist at the time the bug existed.

Pin the test to TLS1.2, since the test shouldn't be changed in a way that
doesn't trigger a segfault in 7.7.3:
- #13184 (comment)

PR-URL: #25508
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
addaleax pushed a commit that referenced this issue Feb 8, 2019
This test has a dependency on the order in which the TCP connection is
made, and TLS server handshake completes. It assumes those server side
events occur before the client side write callback, which is not
guaranteed by the TLS API. It usually passes with TLS1.3, but TLS1.3
didn't exist at the time the bug existed.

Pin the test to TLS1.2, since the test shouldn't be changed in a way that
doesn't trigger a segfault in 7.7.3:
- #13184 (comment)

PR-URL: #25508
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
sam-github added a commit to sam-github/node that referenced this issue Feb 28, 2019
This test has a dependency on the order in which the TCP connection is
made, and TLS server handshake completes. It assumes those server side
events occur before the client side write callback, which is not
guaranteed by the TLS API. It usually passes with TLS1.3, but TLS1.3
didn't exist at the time the bug existed.

Pin the test to TLS1.2, since the test shouldn't be changed in a way that
doesn't trigger a segfault in 7.7.3:
- nodejs#13184 (comment)
sam-github added a commit to sam-github/node that referenced this issue Feb 28, 2019
This test has a dependency on the order in which the TCP connection is
made, and TLS server handshake completes. It assumes those server side
events occur before the client side write callback, which is not
guaranteed by the TLS API. It usually passes with TLS1.3, but TLS1.3
didn't exist at the time the bug existed.

Pin the test to TLS1.2, since the test shouldn't be changed in a way that
doesn't trigger a segfault in 7.7.3:
- nodejs#13184 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
macos Issues and PRs related to the macOS platform / OSX. test Issues and PRs related to the tests. tls Issues and PRs related to the tls subsystem.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants