Improve performance of stderr and stdout stream buffer #888

Merged: 1 commit merged into ipython:main on Mar 28, 2022

Conversation

@MrBago commented Mar 24, 2022

During testing I discovered that writing to stdout and stderr in ipykernel is bottlenecked on scheduling write events on the IOPubThread, specifically on this line, which sends a message to notify the thread. Using a lock so that we can avoid scheduling every write on the IOPubThread improves performance by more than 20x.

Here is the notebook I used to test performance. I start the notebook server with these arguments to prevent it from dropping output: --NotebookApp.iopub_data_rate_limit=1000000000.0 --NotebookApp.iopub_msg_rate_limit=1000000
[Screenshot of the benchmark notebook and its timings]

Also in ipynb format: https://gist.github.com/MrBago/06a8e79be3dd2ccbdd1e954a27dd5f86
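
For readers without the gist handy, a micro-benchmark in the same spirit might look like the following (a hypothetical sketch, not the contents of the linked notebook; the loop size and clear_output cadence are illustrative):

```python
# Hypothetical micro-benchmark: stress stdout through ipykernel's stream
# machinery by printing many short lines from a notebook cell.
import time
from IPython.display import clear_output

start = time.perf_counter()
for i in range(100_000):
    print(f"line {i}")
    if i % 10_000 == 0:
        clear_output(wait=True)  # keep the frontend from accumulating output
elapsed = time.perf_counter() - start
print(f"elapsed: {elapsed:.2f} s")
```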

@blink1073 (Contributor) commented

Thanks! It looks like the test_subprocess_print failures are relevant, but since they happen on only some of the builds it looks like a race condition.

@MrBago (Author) commented Mar 25, 2022

Thanks @blink1073. I looked into it, and on some platforms the subprocesses send their output to the parent process over a zmq pipe that my PR removes :p. I was developing and testing this on OS X, but I've verified that I broke it on Linux.

I think another approach to get a similar speedup is to keep the zmq IOLoop and pipes, but protect the buffer with a simple lock so that we only need to call schedule for the flush and not on every write.

@MrBago (Author) commented Mar 25, 2022

Now that I think about it, I like the lock approach much better. The other approach has the problem that the main thread can schedule tasks much faster than the pub thread can handle them. Without any back pressure, the flushes can become very infrequent and large. Doing the writes in the main thread means there will be a natural back pressure.
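
For illustration, the lock approach described above might look roughly like this (a minimal sketch, not the actual ipykernel OutStream code; the class name, parameter names, and the schedule/publish callbacks are placeholders):

```python
# Sketch of a write buffer protected by a lock, with flush-only scheduling.
import io
import threading


class LockedStream:
    def __init__(self, schedule_on_iopub_thread, publish):
        # schedule_on_iopub_thread: hands a callback to the IOPub thread's
        # event loop (e.g. something like tornado's IOLoop.add_callback).
        # publish: sends the buffered text out as a single stream message.
        self._buffer = io.StringIO()
        self._lock = threading.Lock()
        self._schedule = schedule_on_iopub_thread
        self._publish = publish
        self._flush_pending = False

    def write(self, text):
        # Runs in the calling (main) thread: append under the lock instead of
        # scheduling an event on the IOPub thread for every write.
        with self._lock:
            self._buffer.write(text)
            if not self._flush_pending:
                self._flush_pending = True
                self._schedule(self._flush)
        return len(text)

    def _flush(self):
        # Runs on the IOPub thread: swap the buffer contents out under the
        # lock, then publish outside the lock.
        with self._lock:
            data = self._buffer.getvalue()
            self._buffer = io.StringIO()
            self._flush_pending = False
        if data:
            self._publish(data)
```

The key property is that write() never has to round-trip through the IOPub thread: only one flush callback is outstanding at a time, and each flush drains the buffer into a single message.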

@blink1073 (Contributor) left a comment

Very nice! I'll leave this open for a bit in case anyone else has comments.

@jasongrout (Member) left a comment

This looks good to me, and I get similar timings for the test @MrBago posted:

Before:

CPU times: user 4.85 s, sys: 2.1 s, total: 6.95 s
Wall time: 14.6 s

After:

CPU times: user 390 ms, sys: 68.5 ms, total: 458 ms
Wall time: 500 ms

Seems like magic, befitting a PR with the number 888: the magic number, thrice!

@blink1073 - if/when this is merged, any chance we could release this soon?

Disclosure: @MrBago and I work together at Databricks

@blink1073 (Contributor) commented

> @blink1073 - if/when this is merged, any chance we could release this soon?

Yep, I'll release tomorrow.

@davidbrochart (Collaborator) commented

That's a great performance improvement, to say the least!
I'm wondering if you used the new kernel protocol over websocket to get these results, and how performance compares between the old protocol and the new one.
Note that there is a new --NotebookApp.limit_rate flag that you can set to False to get the equivalent of --NotebookApp.iopub_data_rate_limit=1000000000.0 --NotebookApp.iopub_msg_rate_limit=1000000 (you could also have used a value of -1 in those instead of an arbitrarily high value).

@jasongrout (Member) commented

I also ran the test above without changing the iopub throttle limits, and there is no throttling (I was afraid we might be too fast now, and trigger the rate throttling). In fact, if I don't clear the output periodically, I get timings that are about twice as good (250ms instead of 500ms), still without triggering the default rate throttling.

@davidbrochart - the timings for the test above with this PR applied do not seem to change much for me with:

  • jlab 2
  • latest jlab 3 and jupyter-server 1.13.0 (prior to your PR being merged)
  • latest jlab 3 and jupyter-server 1.15 (after your PR was merged)
  • latest jlab 4 prerelease and jupyter-server 1.15 (after your PR was merged)

@blink1073 merged commit 3059fd9 into ipython:main on Mar 28, 2022