Optimize Chunk traverse #1952
Conversation
A couple of comments:
- Interestingly, this code is not referentially transparent, strictly speaking, but you cannot observe it. That's pretty clever.
- Any idea what the perf impact of this change is?
Thanks for the comment @SystemFw. One difference here is that it won't blow the stack on very big chunks for a non-trampolining effect type. Anyone using Chunk on its own might really care that they can traverse chunks of sizes over 100k (really unbounded with this change, whereas before it would have blown up) with a non-trampolined effect. I didn't benchmark this, but my guess is that if you have an …
The old traverse is also stack safe (I just tried traversing a Chunk with 1 million elements with Option), but I'm okay with merging if we're confident there is going to be a perf benefit :)
I'll make a benchmark before we merge. Let's be rigorous.
Option may short-circuit on None, so unless you have no Nones I'm not sure that is a real test.
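The concern can be made concrete with a minimal hand-rolled traversal for Option (a hypothetical helper, not the Chunk code): once a None appears, no further elements are inspected, so only an all-Some input actually exercises the full length of the chunk.

```scala
// Hypothetical illustration of Option short-circuiting in a traversal:
// the first None aborts the loop, so the remaining elements never
// contribute to stack depth or work done.
def traverseOption[A, B](as: Vector[A])(f: A => Option[B]): Option[Vector[B]] = {
  val out = Vector.newBuilder[B]
  val it  = as.iterator
  while (it.hasNext) f(it.next()) match {
    case Some(b) => out += b
    case None    => return None // short-circuit: rest of the input is untouched
  }
  Some(out.result())
}
```

Counting invocations of `f` shows that an input containing a None only runs up to that element, which is why a stack-safety test with Option needs an all-Some input to be meaningful.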
This is what I've used:
On main:
With this change:
Not sure of the difference, but the …
I was wrong. This is about the same speed (maybe modestly faster); the win looks to be the stack safety alone.

Also, this design has N log N calls to F.map or F.map2 because it builds a tree up. The original implementation does use Eval, but it only calls F.map2 N times (in a linear chain). So this PR is a bit behind the 8-ball due to that. I think it is worth merging for the added stack safety, but the performance motivation is not there.

If we don't care about stack safety (which we don't seem to have in all cases currently), I bet you can go faster with a linear chain built without Eval and foldRight: just iterate through the list right to left, building up an inner …

Here are benchmark results:
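The two shapes being compared can be sketched with Option standing in for F (illustrative names, not the actual Chunk implementation): the linear chain performs one map2-like step per element with constant stack depth, while the binary tree combines halves recursively, trading extra combining work for O(log N) recursion depth.

```scala
// Linear chain: one map2-like step per element, N steps total,
// but each step nests no deeper than the last.
def combineLinear[A](xs: Vector[Option[A]]): Option[Vector[A]] =
  xs.foldLeft(Option(Vector.empty[A])) { (acc, oa) =>
    for { v <- acc; a <- oa } yield v :+ a
  }

// Binary tree: split in half and combine the sub-results, so the
// recursion depth is O(log N) rather than O(N).
def combineTree[A](xs: Vector[Option[Vector[A]]]): Option[Vector[A]] =
  if (xs.isEmpty) Some(Vector.empty[A])
  else if (xs.size == 1) xs.head
  else {
    val (l, r) = xs.splitAt(xs.size / 2)
    for { lv <- combineTree(l); rv <- combineTree(r) } yield lv ++ rv
  }
```

Both produce the same result; the difference is purely in how the combining calls are arranged, which is what the benchmark discussion above is about.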
It occurred to me in a comment here: the magic here is the tree vs. the linear structure, not the fact that it is a binary tree. It is a bit more complex to implement, but a fan-out of, say, 128 or even 256 children per node will preserve the logarithmic growth in stack size while reducing the overhead of building the tree, since we will build up linear chains of 128 or 256. This may allow us to actually see a performance boost in addition to the stack safety achieved.
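The wider fan-out can be sketched as follows (hypothetical code with Option as the effect, not the merged implementation): each tree node combines up to `width` already-combined children in a cheap linear run, and recursion only happens on the per-group results, so stack depth grows as log base `width` of N.

```scala
// Sketch of the fan-out idea: combine blocks of `width` children linearly,
// then recurse on the block results. With width = 128 or 256 the recursion
// depth stays logarithmic while most combining is a flat left fold.
def combineFanOut[A](xs: Vector[Option[Vector[A]]], width: Int): Option[Vector[A]] = {
  require(width >= 2, "a fan-out below 2 would not shrink the problem")
  if (xs.size <= width)
    // small enough: a single linear chain of map2-like steps
    xs.foldLeft(Option(Vector.empty[A])) { (acc, oa) =>
      for { v <- acc; a <- oa } yield v ++ a
    }
  else
    // group into blocks of `width`, combine each block, recurse on the
    // block results; the node count shrinks by a factor of `width` per level
    combineFanOut(xs.grouped(width).map(g => combineFanOut(g, width)).toVector, width)
}
```

With width 2 this degenerates to the binary tree; with a large width and small input it degenerates to the plain linear chain, which is exactly the trade-off described above.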
I noticed that Stream.evalMapChunk, which is a common operation for me, goes through Chunk.traverse. However, that is implemented in a pretty expensive way: via foldRight, which wraps everything in another layer of Eval. I instead implemented a tree-based approach (to keep stack safety) but without wrapping in another Eval layer.
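The foldRight-plus-trampoline shape referred to here can be sketched with the standard library's TailCalls as a stand-in for cats.Eval (illustrative code, not the fs2 source). Every element contributes one suspended trampoline step, which is the per-element overhead the tree-based approach avoids.

```scala
import scala.util.control.TailCalls._

// Sketch of a foldRight-style traversal that stays stack safe by wrapping
// each recursive step in a trampoline. The suspension per element is the
// "another layer of Eval" cost described above.
def traverseFoldRight[A, B](as: List[A])(f: A => Option[B]): Option[List[B]] = {
  def loop(rest: List[A]): TailRec[Option[List[B]]] = rest match {
    case Nil => done(Some(Nil))
    case a :: tail =>
      // suspend the recursion so the chain of maps is run on the heap
      tailcall(loop(tail)).map { accOpt =>
        for { acc <- accOpt; b <- f(a) } yield b :: acc
      }
  }
  loop(as).result
}
```

Note that because the fold runs from the right through a suspended chain, it does not short-circuit on None the way a direct left-to-right loop would; it is stack safe, but it pays the trampoline allocation for every element.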