Long-running application memory usage testing #2423
Tracking process |
👍 much needed validation, thanks for tackling this! |
I did not look at the details, but 2.5.0 seems more stable than 3.1/3.1.1, especially in the |
@ChALkeR yeah, heavy Buffer use puts us at the mercy of how V8 treats |
At first look this could be related to #1671, but I'm not sure yet. |
Here's the flame graph for creating Buffers in v2.5.0: https://cldup.com/VDEJ25y_oL.svg Here it is for v3.1.0: https://cldup.com/90ZGBV_cfP.svg The two things that stand out to me are |
@trevnorris Memory and CPU are like apples and oranges. You can often get lower memory use in these synthetic IO benchmarks by using a smaller default Buffer size, but usually at the expense of CPU in increased garbage collection bookkeeping. Which variable are you trying to minimize? |
@kzc Choosing 64KB buffers was not random. The probable most common case of I/O in node is done over TCP, which lends itself to the 64KB packet limit test, which is why I chose to create the flame graph with that value. Though at this point I have much less concern over memory usage than over the fact that Buffer instantiation time is over 2x slower than it was in v0.12, and not only in a small range of sizes, but at all tested sizes (which stopped at 256KB). |
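For reference, here is a minimal sketch of the kind of tight-loop allocation micro-benchmark being discussed (an illustrative reconstruction, not the exact script behind the flame graphs; the iteration count and hrtime-based timing are assumptions):

```js
// Allocate many 64KB SlowBuffers back to back and report the average cost
// per allocation. Because the buffers are never used, much of the measured
// time is allocation plus whatever GC work those allocations trigger.
var SlowBuffer = require('buffer').SlowBuffer;

var ITER = 1e7;   // lower this if the run takes too long
var SIZE = 65536; // 64KB, matching the TCP-oriented case above

var t = process.hrtime();
for (var i = 0; i < ITER; i++) {
  new SlowBuffer(SIZE);
}
t = process.hrtime(t);

console.log(((t[0] * 1e9 + t[1]) / ITER).toFixed(1) + ' ns/op');
```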
A tight for loop around 1e7 unused new SlowBuffer()s doesn't measure Buffer instantiation time so much as it measures GC time, as seen in the stats from the flame graphs above:
You have to take the results of synthetic benchmarks with a grain of salt, as they don't approximate a real-life scenario, which allocates buffer memory on an as-needed basis and puts much less pressure on the GC. One would also have to measure I/O throughput in addition to CPU to get a fair comparison of relative node version performance. Ultimately I/O throughput is the most important metric, or perhaps throughput per unit of energy, depending on the situation. |
I'm not quite sure what you mean here.
Do you have a benchmark to share? |
Doing testing between v2.5.0 and v3.2.0 I saw a jump in minor page faults from 27,300 to 5,955,571. So at a quick glance it would seem that something bad for performance is occurring. |
From the
These two show the issue at its worst because throughput is most dependent on being able to allocate new buffers, write them and clean them up once the write is complete. We have companies like F5 attempting to do millions of transactions, and when a single buffer allocation takes 0.5us more (whether from instantiation or necessary GC) it gravely affects their application. |
v8::internal::Heap::CollectAllGarbage is indirectly called by node::Buffer::Create. The act of creating a SlowBuffer invokes GC.
The ones provided in this ticket could be used. They just have to close the server and exit after a fixed number of transactions. |
@kzc, I still fail to see why it needs a |
Because 1e7 new SlowBuffer()s are created in a tight loop in JS, it has no choice but to reclaim memory via GC within the v8 allocator. It's not a matter of good or bad - just necessary. |
@kzc Scavenges are enough in all the supplied testcases, I believe. My question is why is it running full mark-sweeps. |
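One way to check which collector is actually servicing the loop is V8's `--trace-gc` flag, which node passes through; a minimal sketch, assuming the allocation pattern discussed above (illustrative only, not something either participant says they ran):

```js
// Run as: node --trace-gc trace-alloc.js
// Each GC cycle is logged as it happens, so you can see whether the
// allocation loop is being serviced by scavenges or by full mark-sweeps.
var SlowBuffer = require('buffer').SlowBuffer;

for (var i = 0; i < 1e5; i++) {
  new SlowBuffer(65536);
}
```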
Wait, what? I've got to be misunderstanding what you're saying. node servers are doing high throughput for days/weeks/months at a time. So, what did you mean? Also, to make sure it's clear, Buffers are created w/ the |
Why? |
We're not using |
@trevnorris - I was referring to constructing a benchmark that exits after a certain number of HTTP requests. If you time the thing you get an idea of how long node takes to process N http requests/responses as fast as a client can handle. Nothing fancy. |
Ah okay. Thanks for clarifying. :) |
Why does creating a SlowBuffer invoke GC? It's clearly visible in both your flame graphs. It has to free up memory somehow in the tight loop. There are no pauses in the event loop to collect garbage otherwise. |
@kzc Why does it invoke a mark-sweep? |
malloc, free, v8 memory, GCing - it's all quite recursively intertwined. It's hard to measure one in isolation because each affects all of the others. Might it be possible to reduce GC by recycling the I/O SlowBuffers with their already malloc()ed memory? |
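To make the recycling idea concrete, here is a minimal sketch of a SlowBuffer free list (purely illustrative; node did not adopt this, and the pool cap and function names are invented for the example):

```js
// Hypothetical SlowBuffer recycling pool: instead of allocating a fresh
// 64KB SlowBuffer for every operation, keep released ones on a free list
// so the underlying malloc()ed memory is reused instead of collected.
var SlowBuffer = require('buffer').SlowBuffer;

var SIZE = 65536;
var MAX_POOLED = 64; // cap the pool so idle memory doesn't grow unbounded
var pool = [];

function acquire() {
  return pool.length > 0 ? pool.pop() : new SlowBuffer(SIZE);
}

function release(buf) {
  if (pool.length < MAX_POOLED) pool.push(buf);
}

// Usage: callers must treat the contents as stale garbage on acquire.
var buf = acquire();
buf.fill(0);
release(buf);
```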
Not exactly sure why, but would v8 have any other choice when allocating 64K * 1e7 in a tight JS loop? |
Here's the revised test script:

```js
var SlowBuffer = require('buffer').SlowBuffer;

var ITER = 1e5, SIZE = 65536, cntr = 0;
var t = process.hrtime();

(function runner() {
  if (++cntr > ITER) return print();
  new SlowBuffer(SIZE);
  setImmediate(runner);
}());

function print() {
  t = process.hrtime(t);
  console.log(((t[0] * 1e9 + t[1]) / ITER).toFixed(1) + ' ns/op');
}
```

This allows a full turn of the event loop between each Buffer creation. Even so, allocating 64KB buffers runs in 3060 ns/op and 5300 ns/op respectively. |
@trevnorris What I'm talking about when I mention scavenges vs mark-sweeps is that if you insert an explicit scavenge into that loop (roughly every 177th iteration), the full collections should go away. If it doesn't, you might have to decrease 177 to something lower; the point is to run a scavenge before v8 decides to run a mark-sweep. |
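That insertion presumably looks something like this in the runner above (an assumption pieced together from the 177 figure here and the later snippet; it requires starting node with --expose-gc, and gc(true) requesting a scavenge is, as noted further down, not clearly documented):

```js
// Same script as above with one line added: every 177th iteration, ask V8
// for a scavenge so dead SlowBuffers are reclaimed before a full mark-sweep
// is ever needed. Run with: node --expose-gc scavenge-bench.js
var SlowBuffer = require('buffer').SlowBuffer;

var ITER = 1e5, SIZE = 65536, cntr = 0;
var t = process.hrtime();

(function runner() {
  if (++cntr > ITER) return print();
  if (cntr % 177 === 0) gc(true); // the added scavenge
  new SlowBuffer(SIZE);
  setImmediate(runner);
}());

function print() {
  t = process.hrtime(t);
  console.log(((t[0] * 1e9 + t[1]) / ITER).toFixed(1) + ' ns/op');
}
```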
@kzc Try inserting |
20x «concurrency» and a warmup:

```js
'use strict';
// Run with: node --expose-gc this-script.js
var limit = 1000000;
var concurrent = 20;
var scavenges = true;
var warmup = limit / 10;
var start, count = 0;
limit += warmup;

function call() {
  var buffer = new Buffer(64 * 1024);
  var bufferx;
  if (scavenges && (count % 170 === 0)) {
    // gc(true) is a scavenge. Not sure if this is documented.
    gc(true);
  }
  count++;
  if (count > limit) {
    console.log(process.hrtime(start));
    process.exit(0);
  } else if (count === warmup) {
    start = process.hrtime();
  }
  setImmediate(call);
}

for (var i = 0; i < concurrent; i++) {
  call();
}
```

This has up to 2x speedup with gc(true).

@trevnorris Yes, with |
@ChALkeR The reason |
@trevnorris - "Even so, allocating 64KB buffers runs in 3060 ns/op and 5300 ns/op respectively." I see. Fair enough. Might it be possible to reduce GC by recycling the old I/O SlowBuffers in their entirety, with their already malloc()ed memory? @ChALkeR - 2x speedup with gc(true) - very cool. |
Btw, the memory usage also drops about 2.5x with gc(true).
Measured at the end, but it should be fine, as RSS doesn't generally go down. |
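For completeness, a minimal sketch of taking that end-of-run measurement with the standard process.memoryUsage() API (the labels and formatting here are invented for illustration):

```js
// Print resident set size and V8 heap usage once the benchmark finishes.
// All values from process.memoryUsage() are in bytes.
function reportMemory(label) {
  var mem = process.memoryUsage();
  console.log(label,
    'rss=' + (mem.rss / 1024 / 1024).toFixed(1) + 'MB',
    'heapUsed=' + (mem.heapUsed / 1024 / 1024).toFixed(1) + 'MB');
}

reportMemory('final:');
```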
No. The |
My point is that the default GC strategy is highly inefficient at least in these testcases, and it is taking at least half of the total testcase time to do things that aren't really needed here. And it can't even keep the memory down as low as it could if it was using scavenges. I am not yet sure how that affects real-world apps, it needs more testing. The main issue for that is #1671. |
@ChALkeR - your modification of @trevnorris' benchmark is really fascinating. You're really on to something here. 1,100,001
The gc(true) effect on lowering elapsed time, memory and CPU I can understand, but the |
Let me take that statement back - new Buffer(65336) is not used, after all. An HTTP benchmark with 20 concurrent connections would better resemble a real world scenario. |
@rvagg what is the status here? Do you still actively use this? Still potentially interested in bringing it into the org? |
@cjihrig probably not, I don't want to lump the org with activity that there is little interest in maintaining. I'll keep this as a private project for now unless others want to join the fun. |
OK, sounds good. For the record, if you change your mind, I'd be +1 to having it in the org. Seems like useful info to have. |
I don't want to be presumptuous and drop a new repo into this org so I'm starting it here: https://github.com/rvagg/node-memtest but will move it if it's considered helpful by others (perhaps there's a good critique that will demolish my approach).
I've been using it to test long-running memory usage across versions to generate graphs like this:
It can either put data into InfluxDB for plotting by Grafana or it can spit out memory usage to stdout so you can watch it stream by.
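The stdout mode amounts to periodically sampling memory usage and streaming it out; a rough sketch of that idea (this is not node-memtest's actual code, and the interval and CSV format are invented for illustration):

```js
// Sample this process's memory usage once a second and stream it to stdout
// as CSV, so a long-running test can be watched live or piped elsewhere.
var INTERVAL_MS = 1000;

console.log('timestamp,rss,heapTotal,heapUsed');
setInterval(function () {
  var mem = process.memoryUsage();
  console.log([Date.now(), mem.rss, mem.heapTotal, mem.heapUsed].join(','));
}, INTERVAL_MS);
```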
There are only two "tests" in there so far, just starters; I imagine the list of tests expanding over time as we come up with use-cases from the wild that demonstrate interesting behaviour we should monitor.
The list of Node & io.js versions is also dynamic: I've been including all of the "stable" ones, and in the above screenshot I also have a nightly of master in the mix to test if the 3.0.0 leak is still there. It could be evolved to automatically pick up new nightlies and continually be testing and producing data to give us confidence in our releases.
I've set up anonymous access to an instance I have running here if you want to have a poke at it: http://128.199.216.28:3000/dashboard/db/memory-tests
Is anyone interested in helping out with this?