This repository has been archived by the owner on Jul 18, 2024. It is now read-only.

Iterator Pipelining #1

Open
rnarubin opened this issue May 2, 2018 · 0 comments


Comments


rnarubin commented May 2, 2018

Porting Issue 20 from the enterprise repo

rnarubin commented on Aug 17, 2017

This proposal is more an aspiration or roadmap goal than a current issue, and it would fit a 2.0 release rather than the initial drop, but I want to at least bring it some attention.

The current async iterator design has completely independent intermediate methods, each of which creates a new, opaque iterator for downstream consumption. This has its benefits (a relatively simple implementation, well-defined isolation, and separation of concerns) but also some notable drawbacks, mostly in performance. I call it the "futures everywhere" problem: every layer of transformation may add several future operations to every element in the iterator, even when the transformation itself is a plain synchronous operation. The JVM has a harder time optimizing async code than synchronous code, both because closures are applied to values with many steps in between, which makes inlining difficult, and because atomic fields restrict reordering.
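
To make the "futures everywhere" cost concrete, here is a minimal sketch. `SimpleAsyncIterator`, `nextStage`, `range`, `toList`, and the `MAP_STAGES` counter are hypothetical names chosen for illustration, not the library's actual API. Each `map` layer wraps its upstream in a new iterator and adds one `thenApply` stage per poll, even though the mapping function is synchronous.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical minimal async iterator (not the library's real interface):
// nextStage() completes with the next element, or Optional.empty() at the end.
interface SimpleAsyncIterator<T> {
  CompletionStage<Optional<T>> nextStage();

  // Instrumentation for this sketch only: counts stages added by map().
  AtomicInteger MAP_STAGES = new AtomicInteger();

  // Each map() layer wraps its upstream in a new, opaque iterator. Even though
  // fn is plain synchronous code, every poll pays for one more thenApply stage.
  default <U> SimpleAsyncIterator<U> map(Function<? super T, ? extends U> fn) {
    SimpleAsyncIterator<T> upstream = this;
    return () -> {
      MAP_STAGES.incrementAndGet();
      return upstream.nextStage().thenApply(opt -> opt.map(fn));
    };
  }

  // Source over [from, to) whose stages are already complete.
  static SimpleAsyncIterator<Integer> range(int from, int to) {
    AtomicInteger next = new AtomicInteger(from);
    return () -> {
      int i = next.getAndIncrement();
      Optional<Integer> value = i < to ? Optional.of(i) : Optional.empty();
      return CompletableFuture.completedFuture(value);
    };
  }

  // Drains the iterator by blocking on each stage (acceptable for a demo only).
  static <E> List<E> toList(SimpleAsyncIterator<E> it) {
    List<E> out = new ArrayList<>();
    Optional<E> next = it.nextStage().toCompletableFuture().join();
    while (next.isPresent()) {
      out.add(next.get());
      next = it.nextStage().toCompletableFuture().join();
    }
    return out;
  }
}
```

With three `map` layers over `range(0, 3)`, draining the iterator yields `[1, 3, 5]` for `x -> x + 1`, `x -> x * 2`, `x -> x - 1`, and creates 12 extra stages: four polls (three elements plus the final end-of-iteration poll) times three layers.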

I propose changing the library iterators to use an integrated pipeline design, similar to that of j.u.Stream (`java.util.stream`). Every intermediate method would be stored as an operation in the pipeline, then composed and executed at certain uncollapsible points (terminal operations, "real"/unavoidable async boundaries, manual user iteration). Efficient terminal operations would then depend on an underlying implementation of forEach that can apply the composed operations and terminate early if needed.
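
A rough sketch of such a pipeline, loosely modeled on the sink chain that `java.util.stream` uses internally. `Pipeline`, `start`, `with`, and `compose` are hypothetical names, not part of the library. Intermediate methods only record a stage; `compose` fuses the recorded stages into one `Consumer` at a terminal point, so a chain of synchronous `filter`/`map` calls becomes nested direct calls per element with no futures in between.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical pipeline sketch (not the library's API). S is the source element
// type, T the element type after all recorded stages. Intermediate methods only
// record a stage; compose() fuses them into one Consumer at a terminal point.
final class Pipeline<S, T> {
  // Each stage wraps a downstream consumer into an upstream consumer.
  private final List<UnaryOperator<Consumer<Object>>> stages;

  private Pipeline(List<UnaryOperator<Consumer<Object>>> stages) {
    this.stages = stages;
  }

  static <E> Pipeline<E, E> start() {
    return new Pipeline<>(new ArrayList<>());
  }

  // Returns a new pipeline with one more recorded stage; nothing runs yet.
  private <U> Pipeline<S, U> with(UnaryOperator<Consumer<Object>> stage) {
    List<UnaryOperator<Consumer<Object>>> next = new ArrayList<>(stages);
    next.add(stage);
    return new Pipeline<>(next);
  }

  @SuppressWarnings("unchecked")
  <U> Pipeline<S, U> map(Function<? super T, ? extends U> fn) {
    return with(down -> x -> down.accept(fn.apply((T) x)));
  }

  @SuppressWarnings("unchecked")
  Pipeline<S, T> filter(Predicate<? super T> p) {
    return with(down -> x -> {
      if (p.test((T) x)) down.accept(x);
    });
  }

  // Composes stages from last to first, so the result accepts raw source
  // elements and runs every stage as nested synchronous calls.
  @SuppressWarnings("unchecked")
  Consumer<S> compose(Consumer<? super T> terminal) {
    Consumer<Object> sink = x -> terminal.accept((T) x);
    for (int i = stages.size() - 1; i >= 0; i--) {
      sink = stages.get(i).apply(sink);
    }
    Consumer<Object> fused = sink;
    return fused::accept;
  }
}
```

For example, `Pipeline.<Integer>start().filter(x -> x % 2 == 0).map(x -> "n" + x).compose(out::add)` yields a consumer that, fed the integers 0 through 4, appends `n0`, `n2`, `n4` to `out`. Recording stages as consumer wrappers (rather than as a list of functions applied one by one) is what lets early termination and filtering fit into the same fused call chain.
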
For example, some iterators traverse elements in batches, where only every Nth operation is actually async and the rest are immediately completed stages over elements of a collection buffered in memory. A terminal method over such an iterator (given an appropriate forEach implementation) could then apply most of its pipeline transformations in a plain loop over these collections (HotSpot is very good at optimizing loops over virtual calls), with only occasional async breaks and substantially less overhead.
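
The batched case might look roughly like the following. `BatchSource`, `fetchBatch`, and `BatchedForEach.forEach` are hypothetical names; `fn` stands in for the already composed synchronous part of the pipeline, and the `Predicate` sink returns `false` to terminate early. Elements within a batch go through a plain loop, and the only async hop is one `thenCompose` per batch.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical batched source: fetchBatch(i) asynchronously yields the i-th
// buffered batch, or an empty list once the source is exhausted.
interface BatchSource<T> {
  CompletionStage<List<T>> fetchBatch(int index);
}

final class BatchedForEach {
  // Terminal forEach over a batched source. fn stands in for the already
  // composed synchronous pipeline and runs in a plain loop over each batch;
  // the only async hop is one fetch per batch. sink returns false to stop early.
  static <T, U> CompletionStage<Void> forEach(
      BatchSource<T> source, Function<? super T, ? extends U> fn, Predicate<? super U> sink) {
    return loop(source, 0, fn, sink);
  }

  private static <T, U> CompletionStage<Void> loop(
      BatchSource<T> source, int index,
      Function<? super T, ? extends U> fn, Predicate<? super U> sink) {
    return source.fetchBatch(index).thenCompose(batch -> {
      if (batch.isEmpty()) {
        return CompletableFuture.<Void>completedFuture(null);
      }
      for (T element : batch) {
        // No future per element: a direct call through the fused pipeline.
        if (!sink.test(fn.apply(element))) {
          return CompletableFuture.<Void>completedFuture(null);
        }
      }
      return loop(source, index + 1, fn, sink);
    });
  }
}
```

One caveat of this sketch: when batches are already complete, `thenCompose` runs the continuation on the calling thread, so the stack depth grows with the number of batches. A production version would likely need a trampoline or an explicit async boundary to avoid that.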

This solution isn't a substantial improvement in all cases. Notably, iterators where every element is accessed asynchronously, or where many transformations are async, could not leverage a collapsed terminal method, though they would still see minor benefits from composing intermediate sync methods. It would be no worse in these cases, however, and would greatly improve cases where synchronous iteration is possible.

Importantly, these changes can all be made within the library and don't require changes to user code, so there is no compatibility issue. Although a user's iterator would benefit from implementing such a forEach, everything would still work under the hood by falling back to the current implementation when necessary. User-controlled iteration would also require a fallback, in which the pipeline must be composed on every poll.
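
The fallback could be expressed as a default method, sketched here with hypothetical names (`PollingIterator`, `InMemoryIterator`, and a `Predicate` sink that returns `false` to stop early). An iterator that only implements `nextStage()` still gets a working `forEach` that polls one element, and one future, at a time, while an iterator that already holds its elements in memory can override `forEach` with a plain loop; callers observe the same results either way.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.function.Predicate;

// Hypothetical iterator interface (names are illustrative). Existing iterators
// implement only nextStage(); forEach has a default that works for all of them.
interface PollingIterator<T> {
  CompletionStage<Optional<T>> nextStage();

  // Fallback: poll one element (and one future) at a time. The sink returns
  // false to stop early.
  default CompletionStage<Void> forEach(Predicate<? super T> sink) {
    return nextStage().thenCompose(next -> {
      if (next.isPresent() && sink.test(next.get())) {
        return forEach(sink);
      }
      return CompletableFuture.<Void>completedFuture(null);
    });
  }
}

// An iterator whose elements are already in memory can override forEach with
// a plain synchronous loop; callers see the same results either way.
final class InMemoryIterator<T> implements PollingIterator<T> {
  private final List<T> items;
  private int pos = 0;

  InMemoryIterator(List<T> items) {
    this.items = items;
  }

  @Override
  public CompletionStage<Optional<T>> nextStage() {
    Optional<T> next = pos < items.size() ? Optional.of(items.get(pos++)) : Optional.empty();
    return CompletableFuture.completedFuture(next);
  }

  @Override
  public CompletionStage<Void> forEach(Predicate<? super T> sink) {
    while (pos < items.size()) {
      if (!sink.test(items.get(pos++))) {
        break;
      }
    }
    return CompletableFuture.completedFuture(null);
  }
}
```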
