Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf_hooks: fix scheduling regression #18051

Closed

Conversation

apapirovski
Copy link
Member

@apapirovski apapirovski commented Jan 9, 2018

Scheduling a PerformanceGCCallback should not keep the event loop alive but due to the recent switch to using the native SetImmediate method, it does. Go back to using uv_async_t and add a regression test.

The reason this was uncovered by the http2 binding is that it takes just long enough to do the GC on all the string constants that it creates that this race condition was triggered. (I finally uncovered this when the problem went away simply by commenting out HTTP_KNOWN_HEADERS(STRING_CONSTANT) in node::http2::Initialize.)

Refs: #18020
Fixes: #18047

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)

perf_hooks, src

@nodejs-github-bot nodejs-github-bot added the c++ Issues and PRs that require attention from people who are familiar with C++. label Jan 9, 2018
@apapirovski
Copy link
Member Author

apapirovski commented Jan 9, 2018

@jasnell
Copy link
Member

jasnell commented Jan 9, 2018

cc @addaleax

Scheduling a PerformanceGCCallback should not keep the
loop alive but due to the recent switch to using the
native SetImmediate method, it does. Go back to using
uv_async_t and add a regression test.
@apapirovski apapirovski force-pushed the fix-perf-hooks-gc-regression branch from 469ae0f to c5cc04f Compare January 9, 2018 03:52
@jasnell jasnell added the fast-track PRs that do not need to wait for 48 hours to land. label Jan 9, 2018
Copy link
Member

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I kind of feel like this would better handled eventually by allowing SetImmediate()s to be unrefed, but I guess that would require somebody to make that happen ;)

let didCall = false;
process.on('beforeExit', () => {
assert(!didCall);
didCall = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could just use common.mustCall() on the beforeExit handler instead of tracking via didCall, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, that would create an infinite loop of always doing more GC and restarting beforeExit.

@jasnell
Copy link
Member

jasnell commented Jan 9, 2018

@addaleax ... Yes, I agree that having the option of unref'ing the env->SetImmediate() would be ideal long term.

@jasnell
Copy link
Member

jasnell commented Jan 9, 2018

@MylesBorins
Copy link
Contributor

Failures on plinux

make[2]: Entering directory `/home/iojs/build/workspace/node-test-commit-plinux/nodes/ppcle-ubuntu1404/test/addons-napi/test_warning/build'
  CC(target) Release/obj.target/test_warning/test_warning.o
  SOLINK_MODULE(target) Release/obj.target/test_warning.node
  COPY Release/test_warning.node
  CC(target) Release/obj.target/test_warning2/test_warning2.o
  SOLINK_MODULE(target) Release/obj.target/test_warning2.node
  COPY Release/test_warning2.node
make[2]: Leaving directory `/home/iojs/build/workspace/node-test-commit-plinux/nodes/ppcle-ubuntu1404/test/addons-napi/test_warning/build'
gyp info ok 
touch test/addons-napi/.buildstamp
make[1]: Leaving directory `/home/iojs/build/workspace/node-test-commit-plinux/nodes/ppcle-ubuntu1404'
make: *** [run-ci] Error 2
Build step 'Execute shell' marked build as failure
Run condition [Always] enabling perform for step [[]]
TAP Reports Processing: START
Looking for TAP results report in workspace using pattern: *.tap
Did not find any matching files. Setting build result to FAILURE.
Checking ^not ok
Jenkins Text Finder: File set '*.tap' is empty
Sending e-mails to: michael_dawson@ca.ibm.com gib@uk.ibm.com
Notifying upstream projects of job completion
Finished: FAILURE

also linuxone

make[2]: Entering directory `/data/iojs/build/workspace/node-test-commit-linuxone/nodes/rhel72-s390x/test/addons-napi/test_warning/build'
  CC(target) Release/obj.target/test_warning/test_warning.o
  CC(target) Release/obj.target/test_warning2/test_warning2.o
  SOLINK_MODULE(target) Release/obj.target/test_warning.node
  SOLINK_MODULE(target) Release/obj.target/test_warning2.node
  COPY Release/test_warning.node
  COPY Release/test_warning2.node
make[2]: Leaving directory `/data/iojs/build/workspace/node-test-commit-linuxone/nodes/rhel72-s390x/test/addons-napi/test_warning/build'
gyp info ok 
touch test/addons-napi/.buildstamp
make[1]: Leaving directory `/data/iojs/build/workspace/node-test-commit-linuxone/nodes/rhel72-s390x'
make: *** [run-ci] Error 2
Build step 'Execute shell' marked build as failure
Run condition [Always] enabling perform for step [[]]
TAP Reports Processing: START
Looking for TAP results report in workspace using pattern: *.tap
Did not find any matching files. Setting build result to FAILURE.
Checking ^not ok
Jenkins Text Finder: File set 'test*' is empty
Sending e-mails to: michael_dawson@ca.ibm.com gib@uk.ibm.com
Notifying upstream projects of job completion
Finished: FAILURE

@jasnell
Copy link
Member

jasnell commented Jan 9, 2018

@apapirovski
Copy link
Member Author

I do not believe those are related to this. Also, there were two CIs earlier that were both fully successful. This didn't need a third run.

@jasnell
Copy link
Member

jasnell commented Jan 9, 2018

@jasnell
Copy link
Member

jasnell commented Jan 9, 2018

CI is looking good, I'll be getting this landed.

jasnell pushed a commit to jasnell/node that referenced this pull request Jan 9, 2018
Scheduling a PerformanceGCCallback should not keep the
loop alive but due to the recent switch to using the
native SetImmediate method, it does. Go back to using
uv_async_t and add a regression test.

PR-URL: nodejs#18051
Fixes: nodejs#18047
Refs: nodejs#18020
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
@jasnell
Copy link
Member

jasnell commented Jan 9, 2018

Landed in ececdd3

@jasnell jasnell closed this Jan 9, 2018
@apapirovski apapirovski deleted the fix-perf-hooks-gc-regression branch January 9, 2018 18:06
jasnell pushed a commit to jasnell/node that referenced this pull request Jan 9, 2018
Scheduling a PerformanceGCCallback should not keep the
loop alive but due to the recent switch to using the
native SetImmediate method, it does. Go back to using
uv_async_t and add a regression test.

PR-URL: nodejs#18051
Fixes: nodejs#18047
Refs: nodejs#18020
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
MylesBorins pushed a commit that referenced this pull request Jan 9, 2018
Scheduling a PerformanceGCCallback should not keep the
loop alive but due to the recent switch to using the
native SetImmediate method, it does. Go back to using
uv_async_t and add a regression test.

Backport-PR-URL: #18050
PR-URL: #18051
Fixes: #18047
Refs: #18020
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
@MylesBorins MylesBorins mentioned this pull request Jan 10, 2018
@apapirovski
Copy link
Member Author

@addaleax @jasnell FWIW, I've got an initial implementation of unrefed Immediates but I'm currently exploring a few other avenues & some refactoring of how immediates work. Should be able to open a PR in the next couple of days depending on how things go.

MylesBorins pushed a commit that referenced this pull request Jan 10, 2018
Scheduling a PerformanceGCCallback should not keep the
loop alive but due to the recent switch to using the
native SetImmediate method, it does. Go back to using
uv_async_t and add a regression test.

Backport-PR-URL: #18050
PR-URL: #18051
Fixes: #18047
Refs: #18020
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
MylesBorins pushed a commit that referenced this pull request May 22, 2018
Scheduling a PerformanceGCCallback should not keep the
loop alive but due to the recent switch to using the
native SetImmediate method, it does. Go back to using
uv_async_t and add a regression test.

PR-URL: #18051
Fixes: #18047
Refs: #18020
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
MylesBorins pushed a commit that referenced this pull request Jun 14, 2018
Scheduling a PerformanceGCCallback should not keep the
loop alive but due to the recent switch to using the
native SetImmediate method, it does. Go back to using
uv_async_t and add a regression test.

PR-URL: #18051
Fixes: #18047
Refs: #18020
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
@MylesBorins MylesBorins mentioned this pull request Jul 9, 2018
rvagg pushed a commit that referenced this pull request Aug 16, 2018
Scheduling a PerformanceGCCallback should not keep the
loop alive but due to the recent switch to using the
native SetImmediate method, it does. Go back to using
uv_async_t and add a regression test.

PR-URL: #18051
Fixes: #18047
Refs: #18020
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. fast-track PRs that do not need to wait for 48 hours to land.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test: investigate failing test-timers-unrefed-in-beforeexit
5 participants