Performance - Data batching #3336

Merged: 19 commits into vercel:canary on Dec 28, 2018
Conversation

@juancampa (Contributor) commented on Dec 16, 2018

Problem

When the pty produces a lot of output in a very short period of time (e.g. cat very-long-file), many IPC session data messages are passed to the renderer. This causes a lot of thrashing (see profiler screenshot) because ipcListener emits a ton of chunks.

Each chunk is typically 4k, but the vt can parse/render a lot more per frame (especially with the coming WebGL renderer). Each message is also prefixed with the window's uid, which is then removed on the renderer process before parsing, adding GC pressure (probably) and CPU cost on both processes (we're CPU-bound on the renderer process because VT parsing is very costly).

Solution

Batch data in a buffer before sending it to the renderer.
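Roughly, the idea looks like the sketch below. Names, the flush interval, and the send callback are illustrative assumptions, not necessarily the exact code in this PR:

// Illustrative sketch of the batching idea — not necessarily the code in this PR.
const BATCH_MAX_SIZE = 200 * 1024; // flush once the buffer reaches ~200 KB
const BATCH_DURATION_MS = 16;      // otherwise flush at most roughly once per frame (assumed value)

class DataBatcher {
  constructor(send) {
    this.send = send;   // e.g. data => win.rpc.emit('session data', data)
    this.data = '';
    this.timeout = null;
  }

  write(chunk) {
    if (this.data.length + chunk.length >= BATCH_MAX_SIZE) {
      // flush early so the buffered batch stays under BATCH_MAX_SIZE
      this.flush();
    }
    this.data += chunk;
    if (!this.timeout) {
      this.timeout = setTimeout(() => this.flush(), BATCH_DURATION_MS);
    }
  }

  flush() {
    if (this.timeout) {
      clearTimeout(this.timeout);
      this.timeout = null;
    }
    if (this.data.length) {
      this.send(this.data);
      this.data = '';
    }
  }
}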

Benchmarks

(left: no batching, i.e. canary; right: batching, i.e. this branch)
[screenshot: side-by-side profiler comparison]

find ~

            canary          canary + webgl   this branch     this branch + webgl ⭐
Throughput  ~35s            ~36s 1️⃣          ~35s            ~15s 2️⃣
FPS         3* then 17 3️⃣   40+              12-16 4️⃣        40+

This change makes a huge difference with the WebGL renderer enabled (compare 1️⃣ vs. 2️⃣) because the renderer focuses on processing one big chunk instead of a ton of tiny ones plus canvas rendering. Without WebGL, it makes the framerate less stuttery (compare 3️⃣ vs. 4️⃣).

[*]: while the benchmark process is still generating output

Process: throughput is measured as the time from running the command until the next prompt is rendered. FPS is taken from the DevTools renderer FPS widget.


// Max size of a session data batch. Note that this value can be exceeded by ~4k
// (chunk sizes seem to be 4k at the most)
const BATCH_MAX_SIZE = 200 * 1024;
@juancampa (Author) — Dec 16, 2018

I tried different values but this one (200k) had the best throughput. Using e.g. 100k can lower the throughput by ~25% 🔥, while increasing it further (say, 300k) lowers the framerate to <30 FPS.

@juancampa (Author)

A future improvement could determine this value dynamically.

Member:

@juancampa what strategies would you use for dynamic computation of this value? Maybe we can create an RFC issue?

@jerch
Few notes from my side:

Keep in mind that the 16 ms stacks on top of the input roundtrip latency, and people might start complaining about sluggish terminal response while typing. In the xterm.js demo app I found 5 ms helped a lot while not yet being noticeable; 15 ms already introduced a tiny lag when typing really fast.

4k is the default pty buffer size on many Linux distros, but you cannot rely on that (it can be changed per pty and system-wide). BSDs (and macOS) use an adaptive buffer size, typically ranging from 16k to 64k depending on the current data pressure.

@jerch
Btw, 200k seems quite big to me; you might want to test this with yes, which triggers a very expensive 'lineFeed' event in the terminal for every second byte. Additionally, the event can be hooked by an external handler, which might introduce even more workload (not sure if you allow plugins to consume it).

@juancampa (Author)

> Keep in mind that the 16 ms stacks on top of the input roundtrip latency, and people might start complaining about sluggish terminal response while typing. In the xterm.js demo app I found 5 ms helped a lot while not yet being noticeable; 15 ms already introduced a tiny lag when typing really fast.

I had the same concern (main reason I use kitty) but I haven't felt any lag so far. We'll have to keep an eye on feedback.

> 4k is the default pty buffer size on many Linux distros, but you cannot rely on that (it can be changed per pty and system-wide). BSDs (and macOS) use an adaptive buffer size, typically ranging from 16k to 64k depending on the current data pressure.

Good point. I actually have to remove the "Note..." part; it was happening in my original implementation, but it didn't feel right to go over a "max", so I fixed it to never exceed BATCH_MAX_SIZE.

@juancampa (Author) — Dec 17, 2018

> what strategies would you use for dynamic computation of this value? Maybe we can create an RFC issue?

I was thinking the parser could generate "back pressure": basically, "Hey, the chunks you are sending are too big for me to parse/render in one frame, please send smaller ones" (rough sketch below). This could matter less if we run the parser in a WebWorker, which I think should be the next step (the limiting factor is on the renderer process).

UPDATE:

Might be better to implement this in xterm.js since it overlaps a lot with xtermjs/xterm.js#1818 (comment)
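For illustration, here is a rough sketch of such a feedback signal. The IPC channel names, the frame budget, and the assumption that the renderer has an xterm.js Terminal instance named term whose write it can time synchronously are all hypothetical; this is not Hyper's or xterm.js's actual API:

// Hypothetical back-pressure sketch — channel names and budget are illustrative.
// Renderer process: time how long each batch takes to parse/render and report it.
const {ipcRenderer} = require('electron');

ipcRenderer.on('session data batch', (event, batch) => {
  const start = performance.now();
  term.write(batch); // assumes the write is processed synchronously for simplicity
  ipcRenderer.send('session data feedback', {elapsed: performance.now() - start});
});

// Main process: grow or shrink the batch size toward a per-frame parse budget.
const {ipcMain} = require('electron');
const FRAME_BUDGET_MS = 12;
let batchMaxSize = 200 * 1024;

ipcMain.on('session data feedback', (event, {elapsed}) => {
  batchMaxSize = elapsed > FRAME_BUDGET_MS
    ? Math.max(32 * 1024, batchMaxSize / 2)     // too slow: send smaller batches
    : Math.min(400 * 1024, batchMaxSize * 1.1); // headroom: allow bigger batches
});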

@juancampa (Author)

> Btw, 200k seems quite big to me; you might want to test this with yes, which triggers a very expensive 'lineFeed' event in the terminal for every second byte

I just tested yes and the buffering definitely doesn't help there (it drops FPS from ~50 to ~30). Basically, 200k is a good value for find ~, which is arguably a more common kind of output than yes. This is all more evidence for implementing @jerch's proposal here: xtermjs/xterm.js#1818 (comment)

@jerch
Yeah, I think it's better suited to Terminal.write. We already do xon/xoff flow control there; imho it's better to keep it in one place. Otherwise every buffering step involved would have to implement flow control on its own to get a working back-pressure chain, which gets quite cumbersome with node's async nature.
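For reference, the xon/xoff idea in miniature: pause the producer with DC3 when too many bytes are pending and resume it with DC1 once the terminal catches up. The watermarks, the pty handle, and the parse-progress hook below are all illustrative assumptions, not xterm.js's actual implementation:

// Illustrative only — not xterm.js's actual flow-control code.
const XOFF = '\x13'; // DC3: ask the producer to pause
const XON = '\x11';  // DC1: ask the producer to resume

const HIGH_WATERMARK = 128 * 1024; // pause above this many pending bytes
const LOW_WATERMARK = 16 * 1024;   // resume once the backlog drains below this

let pendingBytes = 0;
let paused = false;

function onData(chunk) {           // data arriving from the pty (node-pty instance assumed)
  pendingBytes += chunk.length;
  if (!paused && pendingBytes > HIGH_WATERMARK) {
    pty.write(XOFF);               // back-pressure the shell
    paused = true;
  }
  term.write(chunk);
}

function onChunkParsed(chunk) {    // hypothetical hook called as the terminal consumes data
  pendingBytes -= chunk.length;
  if (paused && pendingBytes < LOW_WATERMARK) {
    pty.write(XON);
    paused = false;
  }
}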

@juancampa changed the title from "Perf data batching" to "Performance - Data batching" on Dec 16, 2018
@juancampa requested review from chabou and rauchg on December 16, 2018 22:10
@chabou (Contributor) commented on Dec 16, 2018

Be careful: this PR contains the commits from #3329 (which should be merged first).

@chabou added the 📊 Type: Performance label on Dec 16, 2018
@juancampa (Author) commented on Dec 16, 2018

> this PR contains the commits from #3329 (which should be merged first)

Yeah, I didn't want to test this on canary since it might change the benchmark results. I just made the warning more prominent :)

constructor(uid) {
super();
this.uid = uid;
this.decoder = new StringDecoder('utf8');
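For context, a small standalone example of what the StringDecoder buys here: a multi-byte UTF-8 character split across two pty chunks is held back until it completes, so chunk boundaries never produce mangled output. The buffers below are made up for illustration and are not from Hyper's code path:

// Illustrative only: StringDecoder buffers a partial multi-byte UTF-8 sequence
// until the rest of it arrives.
const {StringDecoder} = require('string_decoder');
const decoder = new StringDecoder('utf8');

// '€' is 0xE2 0x82 0xAC in UTF-8; split it across two chunks.
const part1 = Buffer.from([0xe2, 0x82]);
const part2 = Buffer.from([0xac]);

console.log(decoder.write(part1)); // '' — incomplete sequence is held back
console.log(decoder.write(part2)); // '€' — completed on the next chunk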
@jerch — Dec 17, 2018
UTF8? Is hyper already using binary transport for the pty? We might add this to xterm.js eventually; I had some good results with it, lifting some burden from the input parsing (currently there are some nonsense back-and-forth conversions going on).

@juancampa (Author) — Dec 17, 2018

Not sure I follow this one. IIUC, electron can't send Buffer objects to the renderer process, so we turn the binary input into a String. If we move parsing to a worker, we might be able to use a SharedArrayBuffer to minimize conversions.

@jerch — Dec 17, 2018

But then why are you dealing with UTF8 at all? node-pty is supposed to return JS strings already if the encoding is omitted (it defaults to UTF8, decoded by the underlying Socket).

So how do you send data over to the window process? Is it not able to handle binary data? Not a websocket? (I'm not used to electron internals.)

@jerch
Bummer: electron's IPC mechanism supports ArrayBuffers since v3, but they get BASE64-encoded and wrapped in a JSON envelope (electron/electron#13055).

From my own tests electron performs worse than our websocket-based demo app (about half as fast); maybe the ipc mechanism is the culprit? Even strings get the JSON envelope, which is no good from a perf perspective. Maybe you want to try out our demo app and compare yourself; I'm not sure whether directly invoking node-pty in the renderer or falling back to websockets would solve this.

Ok, enough thread hijacking 😸

@juancampa (Author)

> but they get BASE64-encoded and wrapped in a JSON envelope

Exactly :(

> falling back to websockets

IPC doesn't seem to cost that much though; V8 strings are pretty fast. In this profile of one frame, IPC only takes a tiny sliver (<0.5ms), so there isn't a lot to gain there. It might become more prominent once we move to WebWorkers, with the extra hop, but then again we'll probably use a SharedArrayBuffer for that.

[screenshot: DevTools profile of a single frame]

@juancampa (Author)

I just realized that the real gains are hidden in the string-parsing part (i.e. your fix, xtermjs/xterm.js#1796), so ignore that last part 😅

@jerch — Dec 17, 2018

Yeah, that's the basic idea behind my optimization plan (xtermjs/xterm.js#1811): by getting rid of any string handling in the input chain and working directly on an ArrayBuffer, we can lower the runtime needed for input handling by >50%.
The gains get even bigger if we change to direct UTF-8 binary input, since it would save the nonsense conversions. If you are interested, I have a heavily optimized playground branch where you can test it in the demo app (https://github.com/jerch/xterm.js/tree/tttt; it also needs https://github.com/jerch/node-textdecode).

@jerch
If you cannot get it to run, here are my results: xtermjs/xterm.js#791 (comment) 😄

@juancampa (Author)
@chabou @rauchg now that electron 4 is merged, this is ready to go if you guys approve.

@rauchg merged commit c07700a into vercel:canary on Dec 28, 2018