-
Notifications
You must be signed in to change notification settings - Fork 30.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Investigate test-http2-createwritereq failure on Linux and AIX #17840
Comments
I guess I'll start by pinging the test author: @apapirovski |
Signal 11 = segmentation fault. I'm guessing AIX on CI is not configured to preserve core files, but let's ask. Ping @nodejs/build. |
And ping @nodejs/platform-aix for additional troubleshooting and/or suggestions on how to proceed. |
In the past, I've created branches at specific SHAs, pushed them to my fork (possibly to my |
I guess it might make sense to run a stress test against master to see if this is reproducible. Running it serially: https://ci.nodejs.org/job/node-stress-single-test/1574/nodes=aix61-ppc64/ Running it in parallel (96 processes): https://ci.nodejs.org/job/node-stress-single-test/1575/nodes=aix61-ppc64/ |
Ran a 2000 times locally with no luck on recreate. Either wait for @apapirovski to seek hints on possible causes, or gain login access to CI. /cc @mhdawson @gibfahn The test does not seem to use too much of memory (few bytes of data transported through http2 and validate it passes through properly with differing encodings) which rules out native memory constraints as a source of trouble (which was a known difference between CI and local AIX boxes that led to some failures.) |
Reproduced it in CI stress tests, so at least it's reproducible in that environment. Running serially, it got 31 failures in 9999 runs: 9999 OK: 9968 NOT OK: 31 TOTAL: 9999
+ '[' 31 '!=' 0 ']'
+ echo The test is flaky.
The test is flaky.
+ exit 1
Build step 'Conditional steps (multiple)' marked build as failure
Notifying upstream projects of job completion
Finished: FAILURE ...so running under load is not a precondition for failure. |
I don't think that test does anything particularly weird. If I had to guess, other http2 tests that write stuff might be flaky on that platform too. It's reasonably likely that the recent changes to http2 are responsible. I'll try to have a look when I have time but as @addaleax mentioned, a good place to start is running a couple of stress tests against master rewound to the two PRs above. |
thanks @apapirovski . @Trott - is it (artificially rewinding the two PRs and running stress on it) something which can be done through the existing build jobs or custom scripts? In terms of isolation: both the PRs contain considerable amount of code changes that makes it difficult for me to manually locate the crash reason. On the other hand, if we isolate the issue into one of these PRs, still the failure reason needs to come bottom up (starting with the crashing context). So I prefer (in my way of problem determination) performing core dump analysis directly in the CI machine. |
❌ = failure, ✅ = success ❌ Stress test rewound to e0e6b68 (right after #11718 landed): https://ci.nodejs.org/job/node-stress-single-test/1576/nodes=aix61-ppc64/ ✅ Stress test rewound to e3b44b6 (one commit before the above so that it's just the first of the two commits that landed in #11718): https://ci.nodejs.org/job/node-stress-single-test/1577/nodes=aix61-ppc64/ ✅ Stress test rewound to e554bc8 (right before #11718): https://ci.nodejs.org/job/node-stress-single-test/1579/nodes=aix61-ppc64/ |
@gireeshpunathil Stress tests aren't done running yet, but it sure looks like the issue is probably with e0e6b68. (Stress test at that commit failed, and the stress test at each of the two commits before it are both clean so far.... |
Stress tests are done and the evidence does point to e0e6b68.... |
thanks @Trott , for the info |
Btw, I could reproduce this locally on Linux, so it isn’t AIX-specific … it’s my patch that caused this so I’ll try to figure out what’s going on here :) |
thanks @addaleax , that is good news as in we don't need to access CI. How frequent is the crash? similar to that of AIX? |
It’s less than 1 in 1000 – I think that’s about the same rate. From some initial poking around, it seems like one of the |
Keep a local handle as a reference to the JS `Http2Session` object so that it will not be garbage collected when inside an `Http2Scope`, because the presence of the latter usually indicates that further actions on the session object are expected. Strictly speaking, storing the `session_handle_` as a property on the scope object is not necessary, but this is not very costly and makes the code more obviously correct. Fixes: nodejs#17840
Reproducible test case + fix @ #17863 removing the test label here because it’s not an issue with the test itself |
Keep a local handle as a reference to the JS `Http2Session` object so that it will not be garbage collected when inside an `Http2Scope`, because the presence of the latter usually indicates that further actions on the session object are expected. Strictly speaking, storing the `session_handle_` as a property on the scope object is not necessary, but this is not very costly and makes the code more obviously correct. Fixes: #17840 PR-URL: #17863 Fixes: #17840 Reviewed-By: Anatoli Papirovski <apapirovski@mac.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
PR-URL: nodejs#17863 Fixes: nodejs#17840 Reviewed-By: Anatoli Papirovski <apapirovski@mac.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Keep a local handle as a reference to the JS `Http2Session` object so that it will not be garbage collected when inside an `Http2Scope`, because the presence of the latter usually indicates that further actions on the session object are expected. Strictly speaking, storing the `session_handle_` as a property on the scope object is not necessary, but this is not very costly and makes the code more obviously correct. Fixes: nodejs#17840 PR-URL: nodejs#17863 Fixes: nodejs#17840 Reviewed-By: Anatoli Papirovski <apapirovski@mac.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Keep a local handle as a reference to the JS `Http2Session` object so that it will not be garbage collected when inside an `Http2Scope`, because the presence of the latter usually indicates that further actions on the session object are expected. Strictly speaking, storing the `session_handle_` as a property on the scope object is not necessary, but this is not very costly and makes the code more obviously correct. Fixes: #17840 Backport-PR-URL: #18050 PR-URL: #17863 Fixes: #17840 Reviewed-By: Anatoli Papirovski <apapirovski@mac.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Keep a local handle as a reference to the JS `Http2Session` object so that it will not be garbage collected when inside an `Http2Scope`, because the presence of the latter usually indicates that further actions on the session object are expected. Strictly speaking, storing the `session_handle_` as a property on the scope object is not necessary, but this is not very costly and makes the code more obviously correct. Fixes: #17840 Backport-PR-URL: #18050 PR-URL: #17863 Fixes: #17840 Reviewed-By: Anatoli Papirovski <apapirovski@mac.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Backport-PR-URL: nodejs#18050 PR-URL: nodejs#17863 Fixes: nodejs#17840 Reviewed-By: Anatoli Papirovski <apapirovski@mac.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Keep a local handle as a reference to the JS `Http2Session` object so that it will not be garbage collected when inside an `Http2Scope`, because the presence of the latter usually indicates that further actions on the session object are expected. Strictly speaking, storing the `session_handle_` as a property on the scope object is not necessary, but this is not very costly and makes the code more obviously correct. Fixes: nodejs#17840 Backport-PR-URL: nodejs#18050 PR-URL: nodejs#17863 Fixes: nodejs#17840 Reviewed-By: Anatoli Papirovski <apapirovski@mac.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Keep a local handle as a reference to the JS `Http2Session` object so that it will not be garbage collected when inside an `Http2Scope`, because the presence of the latter usually indicates that further actions on the session object are expected. Strictly speaking, storing the `session_handle_` as a property on the scope object is not necessary, but this is not very costly and makes the code more obviously correct. Fixes: #17840 Backport-PR-URL: #18050 Backport-PR-URL: #20456 PR-URL: #17863 Fixes: #17840 Reviewed-By: Anatoli Papirovski <apapirovski@mac.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
https://ci.nodejs.org/job/node-test-commit-aix/11376/nodes=aix61-ppc64/console
The text was updated successfully, but these errors were encountered: