-
Notifications
You must be signed in to change notification settings - Fork 30.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: make test-cluster-disconnect-leak reliable #4736
Conversation
Previously, test-cluster-disconnect-leak had two issues: * Magic numbers: How many times to spawn a worker was determined through empirical experimentation. This means that as new platforms and new CPU/RAM configurations are tested, the magic numbers require more and more refinement. This brings us to... * Non-determinism: The test *seems* to fail all the time when the bug it tests for is present, but it's really a judgment based on sampling. "Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try 16..." This revised version of the test takes a different approach. The fix for the bug that the test was written for means that the `disconnect` event will fire reliably for a single worker. So we check for that and the test still fails when the fix is not in the code base and succeeds when it is. Advantages of this approach include: * The test runs much faster. * The test now works on Windows. The previous version skipped Windows. * The test should be reliable on any new platform regardless of CPU and RAM. Ref: nodejs#4674
a245cbf
to
0fe44bc
Compare
When I comment out the three-line fix that the test was originally written for, everything fails as expected: https://ci.nodejs.org/job/node-test-commit/1799/ When I leave the fix in but use this test, everything succeeds as expected: https://ci.nodejs.org/job/node-test-commit/1801/ (Windows failure is buildbot and unrelated, you can see this test passing on the three Windows variants as test number 237 in matrix host 1, buildbot failed on matrix 4 where this test wouldn't have run anyway) Similarly, on my machine, when I run the test with Node 5.4.0, it fails (because that version has the bug) and when I run the test with Node 5.4.1, it passes. |
If this lands, a next improvement might be to move this to |
That last CI had a buildbot failure, so a re-run just to be sure: https://ci.nodejs.org/job/node-test-commit/1804/ And, it's all green! \o/ |
LGTM |
return; | ||
} | ||
|
||
var server = net.createServer(); | ||
const server = net.createServer(); | ||
|
||
server.listen(common.PORT, function() { | ||
process.send('listening'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just server.listen(common.PORT);
should work?
LGTM with 2 suggestions. Thanks for improving both tests! |
LGTM. Could you move it to parallel as well? |
LGTM. This is far simpler. |
LGTM |
Made changes per the suggestions from @santigimeno CI: https://ci.nodejs.org/job/node-test-commit/1829/ All green! \o/ |
Previously, test-cluster-disconnect-leak had two issues: * Magic numbers: How many times to spawn a worker was determined through empirical experimentation. This means that as new platforms and new CPU/RAM configurations are tested, the magic numbers require more and more refinement. This brings us to... * Non-determinism: The test *seems* to fail all the time when the bug it tests for is present, but it's really a judgment based on sampling. "Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try 16..." This revised version of the test takes a different approach. The fix for the bug that the test was written for means that the `disconnect` event will fire reliably for a single worker. So we check for that and the test still fails when the fix is not in the code base and succeeds when it is. Advantages of this approach include: * The test runs much faster. * The test now works on Windows. The previous version skipped Windows. * The test should be reliable on any new platform regardless of CPU and RAM. Ref: #4674 PR-URL: #4736 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
Landed in d5c525d |
Previously, test-cluster-disconnect-leak had two issues: * Magic numbers: How many times to spawn a worker was determined through empirical experimentation. This means that as new platforms and new CPU/RAM configurations are tested, the magic numbers require more and more refinement. This brings us to... * Non-determinism: The test *seems* to fail all the time when the bug it tests for is present, but it's really a judgment based on sampling. "Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try 16..." This revised version of the test takes a different approach. The fix for the bug that the test was written for means that the `disconnect` event will fire reliably for a single worker. So we check for that and the test still fails when the fix is not in the code base and succeeds when it is. Advantages of this approach include: * The test runs much faster. * The test now works on Windows. The previous version skipped Windows. * The test should be reliable on any new platform regardless of CPU and RAM. Ref: #4674 PR-URL: #4736 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
Two cluster tests have recently changed so that they are no longer resource intensive. Move them back to parallel. Ref: nodejs#4736 Ref: nodejs#4739
Previously, test-cluster-disconnect-leak had two issues: * Magic numbers: How many times to spawn a worker was determined through empirical experimentation. This means that as new platforms and new CPU/RAM configurations are tested, the magic numbers require more and more refinement. This brings us to... * Non-determinism: The test *seems* to fail all the time when the bug it tests for is present, but it's really a judgment based on sampling. "Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try 16..." This revised version of the test takes a different approach. The fix for the bug that the test was written for means that the `disconnect` event will fire reliably for a single worker. So we check for that and the test still fails when the fix is not in the code base and succeeds when it is. Advantages of this approach include: * The test runs much faster. * The test now works on Windows. The previous version skipped Windows. * The test should be reliable on any new platform regardless of CPU and RAM. Ref: #4674 PR-URL: #4736 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
Two cluster tests have recently changed so that they are no longer resource intensive. Move them back to parallel. Ref: nodejs#4736 Ref: nodejs#4739 PR-URL: nodejs#4774 Reviewed-By: Johan Bergström <bugs@bergstroem.nu> Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Previously, test-cluster-disconnect-leak had two issues: * Magic numbers: How many times to spawn a worker was determined through empirical experimentation. This means that as new platforms and new CPU/RAM configurations are tested, the magic numbers require more and more refinement. This brings us to... * Non-determinism: The test *seems* to fail all the time when the bug it tests for is present, but it's really a judgment based on sampling. "Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try 16..." This revised version of the test takes a different approach. The fix for the bug that the test was written for means that the `disconnect` event will fire reliably for a single worker. So we check for that and the test still fails when the fix is not in the code base and succeeds when it is. Advantages of this approach include: * The test runs much faster. * The test now works on Windows. The previous version skipped Windows. * The test should be reliable on any new platform regardless of CPU and RAM. Ref: #4674 PR-URL: #4736 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
Previously, test-cluster-disconnect-leak had two issues: * Magic numbers: How many times to spawn a worker was determined through empirical experimentation. This means that as new platforms and new CPU/RAM configurations are tested, the magic numbers require more and more refinement. This brings us to... * Non-determinism: The test *seems* to fail all the time when the bug it tests for is present, but it's really a judgment based on sampling. "Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try 16..." This revised version of the test takes a different approach. The fix for the bug that the test was written for means that the `disconnect` event will fire reliably for a single worker. So we check for that and the test still fails when the fix is not in the code base and succeeds when it is. Advantages of this approach include: * The test runs much faster. * The test now works on Windows. The previous version skipped Windows. * The test should be reliable on any new platform regardless of CPU and RAM. Ref: #4674 PR-URL: #4736 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
Previously, test-cluster-disconnect-leak had two issues: * Magic numbers: How many times to spawn a worker was determined through empirical experimentation. This means that as new platforms and new CPU/RAM configurations are tested, the magic numbers require more and more refinement. This brings us to... * Non-determinism: The test *seems* to fail all the time when the bug it tests for is present, but it's really a judgment based on sampling. "Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try 16..." This revised version of the test takes a different approach. The fix for the bug that the test was written for means that the `disconnect` event will fire reliably for a single worker. So we check for that and the test still fails when the fix is not in the code base and succeeds when it is. Advantages of this approach include: * The test runs much faster. * The test now works on Windows. The previous version skipped Windows. * The test should be reliable on any new platform regardless of CPU and RAM. Ref: nodejs#4674 PR-URL: nodejs#4736 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
Previously, test-cluster-disconnect-leak had two issues: * Magic numbers: How many times to spawn a worker was determined through empirical experimentation. This means that as new platforms and new CPU/RAM configurations are tested, the magic numbers require more and more refinement. This brings us to... * Non-determinism: The test *seems* to fail all the time when the bug it tests for is present, but it's really a judgment based on sampling. "Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try 16..." This revised version of the test takes a different approach. The fix for the bug that the test was written for means that the `disconnect` event will fire reliably for a single worker. So we check for that and the test still fails when the fix is not in the code base and succeeds when it is. Advantages of this approach include: * The test runs much faster. * The test now works on Windows. The previous version skipped Windows. * The test should be reliable on any new platform regardless of CPU and RAM. Ref: nodejs#4674 PR-URL: nodejs#4736 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
Previously, test-cluster-disconnect-leak had two issues: * Magic numbers: How many times to spawn a worker was determined through empirical experimentation. This means that as new platforms and new CPU/RAM configurations are tested, the magic numbers require more and more refinement. This brings us to... * Non-determinism: The test *seems* to fail all the time when the bug it tests for is present, but it's really a judgment based on sampling. "Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try 16..." This revised version of the test takes a different approach. The fix for the bug that the test was written for means that the `disconnect` event will fire reliably for a single worker. So we check for that and the test still fails when the fix is not in the code base and succeeds when it is. Advantages of this approach include: * The test runs much faster. * The test now works on Windows. The previous version skipped Windows. * The test should be reliable on any new platform regardless of CPU and RAM. Ref: nodejs#4674 PR-URL: nodejs#4736 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
Previously, test-cluster-disconnect-leak had two issues: * Magic numbers: How many times to spawn a worker was determined through empirical experimentation. This means that as new platforms and new CPU/RAM configurations are tested, the magic numbers require more and more refinement. This brings us to... * Non-determinism: The test *seems* to fail all the time when the bug it tests for is present, but it's really a judgment based on sampling. "Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try 16..." This revised version of the test takes a different approach. The fix for the bug that the test was written for means that the `disconnect` event will fire reliably for a single worker. So we check for that and the test still fails when the fix is not in the code base and succeeds when it is. Advantages of this approach include: * The test runs much faster. * The test now works on Windows. The previous version skipped Windows. * The test should be reliable on any new platform regardless of CPU and RAM. Ref: nodejs#4674 PR-URL: nodejs#4736 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
Two cluster tests have recently changed so that they are no longer resource intensive. Move them back to parallel. Ref: nodejs#4736 Ref: nodejs#4739 PR-URL: nodejs#4774 Reviewed-By: Johan Bergström <bugs@bergstroem.nu> Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Previously, test-cluster-disconnect-leak had two issues:
empirical experimentation. This means that as new platforms and new
CPU/RAM configurations are tested, the magic numbers require more
and more refinement. This brings us to...
it tests for is present, but it's really a judgment based on sampling.
"Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try
16..."
This revised version of the test takes a different approach. The fix
for the bug that the test was written for means that the
disconnect
event will fire reliably for a single worker. So we check for that and
the test still fails when the fix is not in the code base and succeeds
when it is.
Advantages of this approach include:
RAM.
Ref: #4674
cc @santigimeno @iwuzhere