Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: fix flaky tls-inception #5450

Closed
wants to merge 1 commit into from

Conversation

gibfahn
Copy link
Member

@gibfahn gibfahn commented Feb 26, 2016

PR to address #5386 .

Affected core subsystem(s)

test

Description of change

The test now catches ECONNRESET errors thrown in parallel/test-tls-inception on windows. It also replaces removes socket.destroy() with socket.end(), as socket.destroy() also seemed to be causing occasional failures on windows.

@silverwind silverwind added the test Issues and PRs related to the tests. label Feb 26, 2016
});

socket.on('error', function(err) {
console.error(err);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe check that the error is ECONNRESET

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that seems reasonable to me

@mscdex mscdex added the tls Issues and PRs related to the tls subsystem. label Feb 26, 2016
@gibfahn
Copy link
Member Author

gibfahn commented Feb 29, 2016

@santigimeno removing the destination listener doesn't seem to cause any problems on Ubuntu 14.04, but without the error handler I'm still getting the ECONNRESET errors on Win7.

@santigimeno
Copy link
Member

@gibm Yes, I was only talking about removing the dest.on('end') but keeping the error listener. Does it work that way?.

@gibfahn
Copy link
Member Author

gibfahn commented Feb 29, 2016

@santigimeno Yep, just running the test to make sure it's all working, then I'll update this.

@gibfahn
Copy link
Member Author

gibfahn commented Feb 29, 2016

PR updated as per @santigimeno 's suggestions. Let me know if there's anything else that would be worth adding.

socket.on('error', function(err) {
console.error(err);
if (err.code !== 'ECONNRESET')
throw err;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe restricting the error handling to windows as in https://github.com/nodejs/node/blob/master/test/parallel/test-child-process-fork-regr-gh-2847.js#L18-L23
Also, probably we can remove the console.error as the error info will be available if the exception is thrown

@santigimeno
Copy link
Member

A couple of comments, but LGTM

@gibfahn
Copy link
Member Author

gibfahn commented Feb 29, 2016

@santigimeno updated, tests still looking good.

@Trott
Copy link
Member

Trott commented Mar 1, 2016

Seems like it would be good to have an expert look over even small changes to tls tests. So... /cc @nodejs/crypto

@gibfahn
Copy link
Member Author

gibfahn commented Mar 15, 2016

Bump!

If anyone else has any problems or suggestions let me know...

@jasnell
Copy link
Member

jasnell commented Mar 15, 2016

@shigeki @indutny ... could either of you please give this a quick review?

@shigeki
Copy link
Contributor

shigeki commented Mar 15, 2016

@jasnell I'm afraid that I don't have Windows to check this PR now. It was crashed several months ago and needs several days to get restored.

@gibfahn
Copy link
Member Author

gibfahn commented Mar 15, 2016

Is it possible to run the node-test-commit-windows job for this change?

@jasnell
Copy link
Member

jasnell commented Mar 15, 2016

@gibfahn gibfahn force-pushed the tls-inception-fix branch from 0a299b5 to 60fccc6 Compare March 16, 2016 11:17
@gibfahn
Copy link
Member Author

gibfahn commented Mar 16, 2016

So looking at the test results, it looks like this change causes the test to fail on OSX and SmartOS with ECONNRESET. I can replicate this (at least for OSX) myself.

It looks like the test failure is due to the removal of the dest.on('end') listener. Adding this back in should fix the OSX and SmartOS failures. The Windows test is now passing which is great.

@gibfahn gibfahn force-pushed the tls-inception-fix branch from 60fccc6 to cf17a69 Compare March 16, 2016 11:34
The test now catches ECONNRESET errors thrown in
parallel/test-tls-inception on windows.
@gibfahn gibfahn force-pushed the tls-inception-fix branch from cf17a69 to 9f20d29 Compare March 16, 2016 11:55
@gibfahn
Copy link
Member Author

gibfahn commented Mar 16, 2016

Okay, this is interesting.

This works on OSX and fails on Windows with EAADRINUSE:

dest.on('end', function() {
  socket.destroy();
});

This works on Windows and fails on OSX (and possibly SmartOS) with ECONNRESET:

dest.on('end', function() {
  socket.end();
});

Removing this listener entirely also works on Windows (but not OSX).

@santigimeno would you have any ideas about why?

@santigimeno
Copy link
Member

@gibm IIRC, the problem I tried to fix in 3b94991 was that sometimes, in OS X, the client socket from the a server emitted an ECONNRESET because of a race condition caused by both ends trying to close the connection at the same time. I tried to improve the situation by calling socket.destroy() asap, but probably the race condition still exists. Maybe there's no way around it and the proper solution is just to have a listener for ECONNRESET error in every platform.

@gibfahn
Copy link
Member Author

gibfahn commented Mar 16, 2016

@santigimeno okay, that makes sense. So we can:

  1. keep the original listener (the one with socket.destroy()), and handle the EADDRINUSE in Windows
  2. (as you suggest) remove the whole thing and handle ECONNRESET errors for all platforms (or Mac + Windows + SmartOS).

I don't really know enough about this test to have an opinion on which is better. I'm happy to look at this in more depth, but if anyone else has any more information that'd be helpful.

@indutny I noticed you originally added this handler, do you have any opinions on this?

@jasnell
Copy link
Member

jasnell commented Mar 21, 2016

What's the status on this one?

@gibfahn
Copy link
Member Author

gibfahn commented Mar 23, 2016

@jasnell I think it's on hold until someone with more knowledge of this test can take a look, or I get time to go through and decide what makes most sense.

I think it will be one of the options I mentioned above.

@jasnell
Copy link
Member

jasnell commented Mar 23, 2016

/cc @Trott

@Trott
Copy link
Member

Trott commented Mar 23, 2016

/cc @nodejs/platform-windows

@Trott
Copy link
Member

Trott commented Mar 23, 2016

LGTM but I'd really like to see at least one Windows or crypto person give it the seal of approval

@gibfahn
Copy link
Member Author

gibfahn commented Mar 24, 2016

@Trott which change looks good to you? Handling the EADDRINUSE in Windows
or removing the listener and handling ECONNRESET errors for all platforms (or specifically Mac + Windows + SmartOS).

The PR as-is now fails on OSX and SmartOS.

@Trott
Copy link
Member

Trott commented Mar 24, 2016

@gibm Never mind, I misunderstood the state of affairs. Sorry for the noise.

@jasnell
Copy link
Member

jasnell commented Apr 21, 2016

Ping @nodejs/platform-windows @nodejs/testing

@Trott
Copy link
Member

Trott commented Apr 22, 2016

Is the difference in behavior on the different platforms just The Way It Is On Those Platforms? Or is it an issue in Node.js (or one of its dependencies like libuv)? Or is that not known at this point?

We certainly have plenty of instances of tests where we check if it's Windows and do one thing, otherwise do something else. So we could go that route here, if people are pretty sure the issue is necessarily different behavior on the different platforms and isn't a bug in Node itself.

/cc @nodejs/crypto

@estliberitas estliberitas force-pushed the master branch 2 times, most recently from 7da4fd4 to c7066fb Compare April 26, 2016 05:23
@MylesBorins
Copy link
Contributor

one more ping to @nodejs/crypto and @nodejs/platform-windows

@joaocgreis
Copy link
Member

I haven't been able to continue to investigate this one yet, but it's still in my queue.

The best course of action here would be to fix this test with whatever hack it need to test what it's meant to test: the TLS inception, the real goal of the test seems to be the assert at https://github.com/nodejs/node/blob/cdcfeb7/test/parallel/test-tls-inception.js#L64 . However, if I'm not missing anything, none of the destroy or end signals should be handled, because pipe should take care of that. So, the test in its current form minus the end handler should be copied to know issues.

@Trott
Copy link
Member

Trott commented Jul 8, 2016

Disregarding the test-on-Windows part of this, can someone in @nodejs/crypto look at this change to this test and comment on whether the change is sensible in that the test still checks that Node.js is behaving correctly? Or does the change here merely cover up buggy behavior and the test should be failing? (And if it's covering up buggy behavior, any opinion as to if it is likely to be a bug in Node.js, libuv, Windows, something else?)

@Trott
Copy link
Member

Trott commented Sep 7, 2016

@joaocgreis Unfortunately, I don't think we can copy this over to known_issues because tests in that directory are expected to fail 100% of the time and this is a flaky test that only fails intermittently. :-/

@rvagg rvagg force-pushed the master branch 2 times, most recently from c133999 to 83c7a88 Compare October 18, 2016 17:01
@jasnell jasnell added the stalled Issues and PRs that are stalled. label Jan 6, 2017
@jasnell
Copy link
Member

jasnell commented Jan 6, 2017

Ping @gibfahn ... status on this one?

@gibfahn
Copy link
Member Author

gibfahn commented Jan 7, 2017

To be honest I was shooting in the dark with this one, if someone from @nodejs/platform-windows (maybe @digitalinfinity) could confirm that this makes sense then we can go ahead with this (or not...)

@joaocgreis
Copy link
Member

I've been hesitant on this one because I think there's a deeper issue here. At the very least we have a platform discrepancy. The test is testing what it's supposed to test, so I'm +1 with working around the error here (perhaps we should open another issue about the test failing as is now, even without the end/destroy line it should pass on all platforms).

@gibfahn this PR is changing destroy to end, does this work without this change? If the test works on all platforms just by ignoring the error on Windows, I'm +1 for landing this.

@gibfahn
Copy link
Member Author

gibfahn commented Jan 7, 2017

@joaocgreis Looking at the original issue (#5386) it looks like the ECONNRESET only happens when you change destroy->end, otherwise you get the assertionError instead. So I don't think ignoring the error on windows will help by itself.

I don't think this should land as is, it's really just covering up an existing issue. If we think the test is exposing a real node issue on windows, we shouldn't just try to ignore errors and pretend that everything's fine.

@fhinkel
Copy link
Member

fhinkel commented Mar 26, 2017

@gibfahn is it OK to close this if you say it shouldn't land as is?

@gibfahn
Copy link
Member Author

gibfahn commented Mar 26, 2017

@fhinkel yep, I'll close.

@gibfahn gibfahn closed this Mar 26, 2017
@gibfahn gibfahn deleted the tls-inception-fix branch March 26, 2017 12:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stalled Issues and PRs that are stalled. test Issues and PRs related to the tests. tls Issues and PRs related to the tls subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants