Improve performance of stderr and stdout stream buffer #888
Conversation
Thanks! It looks like the […]
Thanks @blink1073. I looked into it, and on some platforms the subprocesses send their output to the parent process over a zmq pipe that my PR removes :p. I was developing and testing this on macOS, but I've verified that I broke it on Linux. I think another approach to get a similar speedup is to keep the zmq IOLoop and pipes, but protect the buffer with a simple lock so that we only need to call […]
Now that I think about it, I like the lock approach much better. The other approach has the problem that the main thread can schedule tasks much faster than the pub thread can handle them. Without any back pressure, the flushes can become very infrequent and large. Doing the writes in the main thread means there will be natural back pressure.
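The lock-based design discussed above can be sketched roughly as follows. This is a hypothetical illustration, not ipykernel's actual code: `LockedStreamBuffer` and `schedule_flush` are invented names. The key idea from the PR is that writes happen in the caller's thread under a lock, and only one flush event is scheduled on the pub thread per batch of writes, instead of one event per write.

```python
import io
import threading


class LockedStreamBuffer:
    """Sketch of a lock-protected stream buffer (invented for illustration)."""

    def __init__(self, schedule_flush):
        self._buf = io.StringIO()
        self._lock = threading.Lock()
        # Callback that runs a function on the pub thread's event loop.
        self._schedule_flush = schedule_flush

    def write(self, text):
        with self._lock:
            was_empty = self._buf.tell() == 0
            self._buf.write(text)
        if was_empty:
            # Schedule a flush only when the buffer transitions from
            # empty to non-empty: many writes share one scheduled event,
            # which is the source of the speedup described in this PR.
            self._schedule_flush(self.flush)

    def flush(self):
        with self._lock:
            data = self._buf.getvalue()
            self._buf = io.StringIO()
        return data
```

Because the writes themselves run in the main thread, a slow pub thread naturally slows the producer down, giving the back pressure mentioned above.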
Force-pushed from 1402d86 to 2de496a
Very nice! I'll leave this open for a bit in case anyone else has comments.
This looks good to me, and I get similar timings for the test @MrBago posted:
Before:
CPU times: user 4.85 s, sys: 2.1 s, total: 6.95 s
Wall time: 14.6 s
After:
CPU times: user 390 ms, sys: 68.5 ms, total: 458 ms
Wall time: 500 ms
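The exact benchmark notebook is linked in the PR description; a rough stand-in for that kind of test (the function name `stress_stdout` and the loop count are invented here) just times a large number of small writes to stdout, which is the path the scheduling change speeds up:

```python
import sys
import time


def stress_stdout(n=100_000):
    """Time n small writes to stdout (illustrative benchmark sketch)."""
    start = time.perf_counter()
    for i in range(n):
        # Each small write previously scheduled its own event on the
        # IOPubThread; with the lock-based buffer many writes share one.
        sys.stdout.write(f"line {i}\n")
    sys.stdout.flush()
    return time.perf_counter() - start
```

Run inside a notebook cell (e.g. under `%%time`), this kind of loop is where the before/after wall times above come from.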
Seems like magic, befitting a PR numbered 888: the magic number, thrice!
@blink1073 - if/when this is merged, any chance we could release this soon?
Disclosure: @MrBago and I work together at Databricks
Yep, I'll release tomorrow.
That's a great performance improvement, to say the least!
I also ran the test above without changing the iopub throttle limits, and there is no throttling (I was afraid we might be too fast now, and trigger the rate throttling). In fact, if I don't clear the output periodically, I get timings that are about twice as good (250ms instead of 500ms), still without triggering the default rate throttling. @davidbrochart - the timings for the test above with this PR applied do not seem to change much for me with:
[…]
During testing I discovered that writing to stdout and stderr in ipykernel is bottlenecked on scheduling write events on IOPubThread, specifically on this line, which sends a message to notify the thread on every write. Using a lock so we can avoid scheduling every write on the IOPubThread improves the performance more than 20x.

Here is the notebook I used for testing the performance. I start the notebook server with these arguments to prevent it from dropping output:

--NotebookApp.iopub_data_rate_limit=1000000000.0 --NotebookApp.iopub_msg_rate_limit=1000000

![Screen Shot 2022-03-25 at 10 26 40 AM](https://user-images.githubusercontent.com/223219/160171275-70c8bd8e-fb79-43e8-9b42-a6529f65490b.png)

Also in ipynb format: https://gist.github.com/MrBago/06a8e79be3dd2ccbdd1e954a27dd5f86
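Putting those flags together, the server launch for the benchmark would look something like this (assuming the classic notebook server; the flag values are the ones quoted above, raised so the rate limiter does not drop the benchmark's output):

```shell
# Raise the iopub rate limits so the stress test's output is not throttled.
jupyter notebook \
  --NotebookApp.iopub_data_rate_limit=1000000000.0 \
  --NotebookApp.iopub_msg_rate_limit=1000000
```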