Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate flaky sequential/test-performance #23291

Closed
Trott opened this issue Oct 6, 2018 · 25 comments
Closed

Investigate flaky sequential/test-performance #23291

Trott opened this issue Oct 6, 2018 · 25 comments
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI. performance Issues and PRs related to the performance of Node.js. test Issues and PRs related to the tests.

Comments

@Trott
Copy link
Member

Trott commented Oct 6, 2018

https://ci.nodejs.org/job/node-test-binary-arm/3710/RUN_SUBSET=4,label=pi1-docker/console

00:40:41 not ok 330 sequential/test-performance
00:40:41   ---
00:40:41   duration_ms: 5.141
00:40:41   severity: fail
00:40:41   exitcode: 1
00:40:41   stack: |-
00:40:41     { name: 'node',
00:40:41       entryType: 'node',
00:40:41       startTime: 0,
00:40:41       duration: { around: 4609.206676006317 },
00:40:41       nodeStart: { around: 0 },
00:40:41       v8Start: { around: 0 },
00:40:41       bootstrapComplete: { around: 4517.247398018837 },
00:40:41       environment: { around: 0 },
00:40:41       loopStart: -1,
00:40:41       loopExit: -1 }
00:40:41     assert.js:349
00:40:41         throw err;
00:40:41         ^
00:40:41     
00:40:41     AssertionError [ERR_ASSERTION]: bootstrapComplete: 1233.9948890209198 >= 1111
00:40:41         at checkNodeTiming (/home/iojs/build/workspace/node-test-binary-arm/test/sequential/test-performance.js:92:7)
00:40:41         at Object.<anonymous> (/home/iojs/build/workspace/node-test-binary-arm/test/sequential/test-performance.js:104:1)
00:40:41         at Module._compile (internal/modules/cjs/loader.js:703:30)
00:40:41         at Object.Module._extensions..js (internal/modules/cjs/loader.js:714:10)
00:40:41         at Module.load (internal/modules/cjs/loader.js:613:32)
00:40:41         at tryModuleLoad (internal/modules/cjs/loader.js:552:12)
00:40:41         at Function.Module._load (internal/modules/cjs/loader.js:544:3)
00:40:41         at Function.Module.runMain (internal/modules/cjs/loader.js:756:12)
00:40:41         at startup (internal/bootstrap/node.js:303:19)
00:40:41         at bootstrapNodeJSCore (internal/bootstrap/node.js:865:3)
00:40:41   ...

Looks like a case of testing elapsed time (as a proxy for performance?). Is this a case of the test being unreliable or is it a case of the test flagging something we care about? Test doesn't have a comment explaining itself but it was added in 67269fd / #14680 and there's context there.

/ping @jasnell

@Trott Trott added test Issues and PRs related to the tests. performance Issues and PRs related to the performance of Node.js. labels Oct 6, 2018
@Trott
Copy link
Member Author

Trott commented Oct 6, 2018

@nodejs/testing @nodejs/performance

@joyeecheung
Copy link
Member

(The performance team is for changes related to /benchmark, not the perf hooks)

@Trott
Copy link
Member Author

Trott commented Nov 4, 2018

https://ci.nodejs.org/job/node-test-commit-smartos/21373/nodes=smartos17-64/console

00:52:03 not ok 2418 sequential/test-performance
00:52:03   ---
00:52:03   duration_ms: 2.224
00:52:03   severity: fail
00:52:03   exitcode: 1
00:52:03   stack: |-
00:52:03     { name: 'node',
00:52:03       entryType: 'node',
00:52:03       startTime: 0,
00:52:03       duration: { around: 1893.748501777649 },
00:52:03       nodeStart: { around: 0 },
00:52:03       v8Start: { around: 0 },
00:52:03       bootstrapComplete: { around: 1887.9314036369324 },
00:52:03       environment: { around: 0 },
00:52:03       loopStart: -1,
00:52:03       loopExit: -1 }
00:52:03     assert.js:351
00:52:03         throw err;
00:52:03         ^
00:52:03     
00:52:03     AssertionError [ERR_ASSERTION]: bootstrapComplete: 1796.7190556526184 >= 1111
00:52:03         at checkNodeTiming (/home/iojs/build/workspace/node-test-commit-smartos/nodes/smartos17-64/test/sequential/test-performance.js:92:7)
00:52:03         at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-smartos/nodes/smartos17-64/test/sequential/test-performance.js:104:1)
00:52:03         at Module._compile (internal/modules/cjs/loader.js:722:30)
00:52:03         at Object.Module._extensions..js (internal/modules/cjs/loader.js:733:10)
00:52:03         at Module.load (internal/modules/cjs/loader.js:620:32)
00:52:03         at tryModuleLoad (internal/modules/cjs/loader.js:560:12)
00:52:03         at Function.Module._load (internal/modules/cjs/loader.js:552:3)
00:52:03         at Function.Module.runMain (internal/modules/cjs/loader.js:775:12)
00:52:03         at startup (internal/bootstrap/node.js:308:19)
00:52:03         at bootstrapNodeJSCore (internal/bootstrap/node.js:878:3)
00:52:03   ...

@Trott
Copy link
Member Author

Trott commented Nov 16, 2018

I want to run a stress test on smartos 17 but it doesn't look like it's in a choice in our stress test job on CI. Can someone add it? @nodejs/build

@Fishrock123
Copy link
Contributor

Happened again in #25656
https://ci.nodejs.org/job/node-test-binary-arm/5663/RUN_SUBSET=2,label=pi1-docker/testReport/(root)/test/sequential_test_performance/

Failed on a Raspberry PI 1 I think:
{ name: 'node',
  entryType: 'node',
  startTime: 0,
  duration: { around: 2265.783347129822 },
  nodeStart: { around: 0 },
  v8Start: { around: 0 },
  bootstrapComplete: { around: 2176.2094011306763 },
  environment: { around: 0 },
  loopStart: -1,
  loopExit: -1 }
{ name: 'node',
  entryType: 'node',
  startTime: 0,
  duration: { around: 3541.9893350601196 },
  nodeStart: { around: 0 },
  v8Start: { around: 0 },
  bootstrapComplete: { around: 2176.2094011306763 },
  environment: { around: 0 },
  loopStart: { around: 2176.2094011306763 },
  loopExit: -1 }
{ name: 'node',
  entryType: 'node',
  startTime: 0,
  duration: { around: 3574.7279500961304 },
  nodeStart: { around: 0 },
  v8Start: { around: 0 },
  bootstrapComplete: { around: 2176.2094011306763 },
  environment: { around: 0 },
  loopStart: { around: 2176.2094011306763 },
  loopExit: { around: 3575.0439462661743 } }
assert.js:351
    throw err;
    ^

AssertionError [ERR_ASSERTION]: bootstrapComplete: 1085.6422290802002 >= 1068
    at checkNodeTiming (/home/iojs/build/workspace/node-test-binary-arm/test/sequential/test-performance.js:92:7)
    at Timeout.setTimeout [as _onTimeout] (/home/iojs/build/workspace/node-test-binary-arm/test/sequential/test-performance.js:119:5)
    at listOnTimeout (timers.js:324:15)
    at processTimers (timers.js:268:5)

@richardlau
Copy link
Member

richardlau commented Jan 23, 2019

I want to run a stress test on smartos 17 but it doesn't look like it's in a choice in our stress test job on CI. Can someone add it? @nodejs/build

@Trott Not sure if still required, since this was from a few months back, but I've just added it (seemed like a reasonable thing to have there): https://ci.nodejs.org/view/Stress/job/node-stress-single-test/jobConfigHistory/showDiffFiles?timestamp1=2018-12-07_11-55-34&timestamp2=2019-01-23_14-21-08

@Fishrock123
Copy link
Contributor

For some reason this has affected all three runs of #25656 - does this just always happen on master right now?

@Fishrock123
Copy link
Contributor

Failing on my other rebased PR too.

@Trott
Copy link
Member Author

Trott commented Jan 24, 2019

Looks like it's barely-but-relatively-consistently failing due to arbitrary-ish timing constraints. The test lacks a comment explaining its purpose which is unfortunate, given it's bland name open to multiple interpretations. I guess it's testing perf_hooks generally?

What is this delta and delay stuff? Should this use common.platformTimeout() to adjust some values on slower machines? I mean, that's a band-aid and not a fix, but I'll certainly take it. Another workaround-but-not-a-solution-but-whatever-close-enough might be "change 1000 to 2000" or "change 1000 to 500". ¯\(ツ)

pinging folks who have authored lines in the test: @BridgeAR @jasnell @TimothyGu @apapirovski @addaleax

@Trott
Copy link
Member Author

Trott commented Jan 24, 2019

Happened for the first time on node-daily-master today so something seems to have happened in the last 24 hours to cause this to be perma-red (or perma-ish-red) on Pi 1, it would seem.

@nodejs/testing

https://ci.nodejs.org/job/node-test-binary-arm/5673/RUN_SUBSET=3,label=pi1-docker/console

test-requireio_continuationlabs-debian9-armv6l_pi1p-1

00:44:32 not ok 342 sequential/test-performance
00:44:32   ---
00:44:32   duration_ms: 3.785
00:44:32   severity: fail
00:44:32   exitcode: 1
00:44:32   stack: |-
00:44:32     { name: 'node',
00:44:32       entryType: 'node',
00:44:32       startTime: 0,
00:44:32       duration: { around: 2250.0830249786377 },
00:44:32       nodeStart: { around: 0 },
00:44:32       v8Start: { around: 0 },
00:44:32       bootstrapComplete: { around: 2159.4409494400024 },
00:44:32       environment: { around: 0 },
00:44:32       loopStart: -1,
00:44:32       loopExit: -1 }
00:44:32     { name: 'node',
00:44:32       entryType: 'node',
00:44:32       startTime: 0,
00:44:32       duration: { around: 3522.481032371521 },
00:44:32       nodeStart: { around: 0 },
00:44:32       v8Start: { around: 0 },
00:44:32       bootstrapComplete: { around: 2159.4409494400024 },
00:44:32       environment: { around: 0 },
00:44:32       loopStart: { around: 2159.4409494400024 },
00:44:32       loopExit: -1 }
00:44:32     { name: 'node',
00:44:32       entryType: 'node',
00:44:32       startTime: 0,
00:44:32       duration: { around: 3554.94070148468 },
00:44:32       nodeStart: { around: 0 },
00:44:32       v8Start: { around: 0 },
00:44:32       bootstrapComplete: { around: 2159.4409494400024 },
00:44:32       environment: { around: 0 },
00:44:32       loopStart: { around: 2159.4409494400024 },
00:44:32       loopExit: { around: 3555.3136978149414 } }
00:44:32     assert.js:351
00:44:32         throw err;
00:44:32         ^
00:44:32     
00:44:32     AssertionError [ERR_ASSERTION]: bootstrapComplete: 1075.962013244629 >= 1073
00:44:32         at checkNodeTiming (/home/iojs/build/workspace/node-test-binary-arm/test/sequential/test-performance.js:92:7)
00:44:32         at Timeout.setTimeout [as _onTimeout] (/home/iojs/build/workspace/node-test-binary-arm/test/sequential/test-performance.js:119:5)
00:44:32         at listOnTimeout (timers.js:324:15)
00:44:32         at processTimers (timers.js:268:5)
00:44:32   ...

Failed by just 2ms or so?

@Trott Trott pinned this issue Jan 24, 2019
@Trott
Copy link
Member Author

Trott commented Jan 24, 2019

This gets weird. https://ci.nodejs.org/job/node-test-binary-arm/5610/ was green. Hooray? But looking at the fanned jobs for Pi1, nowhere was sequential/test-performance run. And the PR (#25612) doesn't remove it or anything. The test is still in master and that PR is against master. ?????!!!!!! @nodejs/build

@joyeecheung
Copy link
Member

joyeecheung commented Jan 24, 2019

I am not entirely sure about what the test is actually doing but I guess it was failing because:

const inited = performance.now();

Is too far away from the bootstrap complete milestone? Then there could be two remedies:

  1. Get this value even earlier, do not get it only after common is required as common is not exactly small
  2. Make the milestone somewhat more accurate: process: split execution entry points into main scripts #25667 handles part of that, but it's hard to say at what point we should consider the bootstrap as complete, unless we take care of every branches involved during the whole bootstrap process and place the milestones at the precise points for every one of the possible conditions.

@Trott
Copy link
Member Author

Trott commented Jan 24, 2019

FWIW, the test seems to be passing again, at least for now, on Pi 1. See https://ci.nodejs.org/job/node-test-binary-arm/5677/RUN_SUBSET=3,label=pi1-docker/console for example. Not sure for how long, but I guess it's encouraging that it's not a big block fo red for the moment...

@gireeshpunathil
Copy link
Member

my take after reviewing the test case is:

  • though the value in this line
    let timeoutDelay = 111; // An extra of 111 ms for the first call.

    seems arbitrary to me, as this test has passed over time make it an empirically proven value that represents an acceptable limit to the bootstrap latency.
  • if that is agreed, then it is worthwhile to see recent commits that increased the boot time.
  • if that commit (or commits) holds merit on its own, then accept that, and move the baseline to accommodate that by computing a new value for timeoutDelay

@gireeshpunathil
Copy link
Member

gireeshpunathil commented Jan 25, 2019

this is what I did:

  • ran the test in a linux box (test does not fail here).
  • measured the actual time for it to reach the assertion point.
  • so for the bootstrapcomplete milestone I got 45 millis as the latency.
  • reduced the timeoutDelay until it failed.
  • deemed it as bad, and did biect.
  • and embarked on d3806f9, beyond which the latency is only 35 millis.

these are small numbers in my fast linux box, and able to accommodate in the very conservative number of 1111 millis in the test. However, in the failing arm I guess the baseline as well as the increase in bootup time due to the said commit could be significantly large I guess.

So:

  • does the bootup time increase due to d3806f9 justifiable? if not fix it,
  • if yes, how much it increases in arm, and is that acceptable? if not fix it,
  • if yes, accommodate this in the test

I guess the test is doing its intented work, and we should only be changing the heuristics, not the test logic itself.

/cc @joyeecheung

@joyeecheung
Copy link
Member

joyeecheung commented Jan 25, 2019

@gireeshpunathil In theory d3806f9 should’ve decreased the boot time because it delays the console extension until those are actually needed when the inspector console is accessed (and no longer done unconditionally during bootstrap), maybe a benchmark run. of misc/startup would help identifying that.

@gireeshpunathil
Copy link
Member

@joyeecheung - yes,reading the commit message I also got the same impression - this commit as a change to boost the boot time.

unfortunately I don't know how to run a benchmark test!

@Trott
Copy link
Member Author

Trott commented Jan 25, 2019

@gireeshpunathil In theory d3806f9 should’ve decreased the boot time because it delays the console extension until those are actually needed when the inspector console is accessed (and no longer done unconditionally during bootstrap), maybe a benchmark run. of misc/startup would help identifying that.

I suppose it's possible that the increase might be a peculiarity/limitation of perf_hooks rather than a real increase? @jasnell @TimothyGu

It's not clear to me that this test is actually trying to flag bootstrap performance issues as much as it is trying to confirm that perf_hooks module basically works. In that case, the fix might be #25695.

Speaking of perf_hooks, I guess the module should theoretically pass the tests at https://github.com/web-platform-tests/wpt/tree/master/performance-timeline. Those are the tests for the API spec that perf_hooks implements. It may be possible that another solution is to remove this test and replace with those. (I don't actually know.)

@Trott
Copy link
Member Author

Trott commented Jan 25, 2019

I guess I also don't feel like using Pi 1 as our performance canary is a great approach. I'm inclined to treat this as a "bare functionality of perf_hooks" test rather than a "watch performance in Node.js" test.

This is supported IMO by the fact that the test was added with the initial implementation of perf_hooks in 67269fd7f33 and not added after implementation as a performance test. We should probably rename it test-perf-hooks or something like that to avoid this confusion. @jasnell

@mhdawson
Copy link
Member

@Trott I don't know about it passing as its failed twice in attempts for me today.

@sam-github
Copy link
Contributor

Fwiw, I also got 2 failures in a row for #25688 as well, and going through the build history, @ofrobots and @danbev have also gotten failures on it today.

Trott pushed a commit to Trott/io.js that referenced this issue Jan 26, 2019
PR-URL: nodejs#25695
Fixes: nodejs#23291
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
@Trott
Copy link
Member Author

Trott commented Jan 26, 2019

Fixed by d1d357d

@Trott Trott closed this as completed Jan 26, 2019
@Trott Trott unpinned this issue Jan 26, 2019
@gireeshpunathil
Copy link
Member

ok, now that #25695 is landed and this one closed it makes little difference to raise my opinion; but raising it anyways.

these points:

  • failing Pi are slow machines with little ram and a slow chip
  • the test is not supposed to validate performance

would have been considerations at the time when the test was introduced IMO. Now that it showed us something which is indicative of a change in timing in terms of latency in the boot sequences. So the question is:

  • should we ignore what we saw and amend the test to do what it is designed to test,
    or
  • should we first make an effort to investigate what this failure means and implies?

we have many instances where unrelated tests threw light on serious issues in the code; issues associated in #25007 are latest examples - none of those were meant to catch race conditions; yet those tests were used to reproduce issues, troubleshoot, fix and finally validate the correctness of the fix.

my summary is that the test threw something to us - by design or by chance; amending the code block that suppresses the issue without looking at the root cause of the failure may be like silently ignoring an error.

@Trott
Copy link
Member Author

Trott commented Jan 26, 2019

should we ignore what we saw and amend the test to do what it is designed to test,
or
should we first make an effort to investigate what this failure means and implies?

I guess ideally we'd do both. I don't want to leave it broken, causing people to either not be able to land valid changes or else get into the habit of ignoring failing test suites.

But as you are suggesting, I also want to understand why the test started failing so a determination can be made if it is a sign of a larger problem.

I think the issue this brought up is probably more appropriate for a benchmark than a test. (And maybe there already is a benchmark in benchmark/misc/startup.js?) Perhaps we can, at some point, introduce nightly performance benchmarks that will fail if things fall below a certain threshold or if there is a drop in performance larger than a certain percentage. (I'd like to see something like that for coverage stats too.)

@Trott
Copy link
Member Author

Trott commented Jan 26, 2019

But as you are suggesting, I also want to understand why the test started failing so a determination can be made if it is a sign of a larger problem.

Perhaps the thing to do is to open a new issue about the slower startup. Because I guess we are talking about two different things anyway. One issue is that a test is failing when it should not be failing. That's this issue. It's about fixing the test. But the second issue, which this one uncovered, is why is startup slower than it used to be? These questions are related, but they can (and probably should?) be considered separately? Like, it's possible (as we've done here) to fix the first issue without successfully addressing the second.

addaleax pushed a commit that referenced this issue Jan 28, 2019
PR-URL: #25695
Fixes: #23291
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
@refack refack added the flaky-test Issues and PRs related to the tests with unstable failures on the CI. label Jan 30, 2019
BethGriggs pushed a commit that referenced this issue Apr 29, 2019
PR-URL: #25695
Fixes: #23291
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
BethGriggs pushed a commit that referenced this issue May 10, 2019
PR-URL: #25695
Fixes: #23291
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
MylesBorins pushed a commit that referenced this issue May 16, 2019
PR-URL: #25695
Fixes: #23291
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI. performance Issues and PRs related to the performance of Node.js. test Issues and PRs related to the tests.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants