stream: added experimental support for for-await #17755
Currently |
ESLint only supports ES features when they reach stage 4. The async iteration proposal is still at stage 3, so we would need to install babel-eslint: https://github.com/babel/babel-eslint |
Very nice start - I see a bigger issue though, this approach should work with for...await loops but async iterators in general have no guarantee.
We need to deal with backpressure here :)
doc/api/stream.md
Outdated
@@ -1159,6 +1159,31 @@ readable stream will release any internal resources.
Implementors should not override this method, but instead implement
[`readable._destroy`][readable-_destroy].

##### readable[Symbol.asyncIterator]
Offtopic: I almost want to weep at how nice this is, I've seen it before (with the package and when we tested this a year ago) but finally having I/O that looks like Python's async/io and performs well is awesome - no more callbacks everywhere, clean asynchronous code :)
doc/api/stream.md
Outdated
print(fs.createReadStream('file')).catch(console.log);
```

If you break or throw from within the for-await loop, the stream will be
We might want to elaborate on exactly which methods breaking and throwing call on the underlying ReadableStream.
I mean:
  for await (const k of readable) {
    break;
  }
this would call return() on the AsyncIterator, which will call .destroy().
This is debatable, as after .destroy() the readable cannot be used anymore.
However:
  for await (const k of readable) {
    throw new Error('kaboom');
  }
would call return() on the AsyncIterator as well, which will call .destroy().
This is the correct behavior; otherwise we would have to write this every time:
  try {
    for await (const k of readable) {
      throw new Error('kaboom');
    }
  } finally {
    readable.destroy();
  }
which is prone to leaking file descriptors.
I'm not aware of a way to distinguish these two flows.
Should I document it in this way? Do you agree with the behavior?
I agree with the behavior 100%: once you've started consuming an async iterator in a for await loop, it should probably close if I break.
Pinging @zenparsing - I remember a lot of discussion about this - what should be the behavior in your opinion?
As for documentation - that is exactly what I meant - we should document that destroy will be called (rather than that the stream is destroyed), since it is a better guarantee for people subclassing ReadableStream.
Calling destroy is consistent with how async generator functions behave after calling return on their iterators, so 👍.
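To make the break/throw discussion above concrete, here is a minimal sketch that checks whether the stream ends up destroyed after leaving a for await loop early. The helper names (makeReadable, breakEarly, throwEarly) are made up for illustration, and the in-memory stream stands in for a real source such as a file; this is an illustration of the behavior being discussed, not the PR's test code.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: an object-mode stream fed from an in-memory array.
function makeReadable(values) {
  const readable = new Readable({ objectMode: true, read() {} });
  for (const value of values) readable.push(value);
  readable.push(null); // signal end of stream
  return readable;
}

// Leave the loop with `break` after the first chunk.
async function breakEarly() {
  const readable = makeReadable(['a', 'b', 'c']);
  for await (const chunk of readable) {
    break; // the loop calls return() on the iterator here
  }
  return readable.destroyed;
}

// Leave the loop by throwing from the loop body.
async function throwEarly() {
  const readable = makeReadable(['a', 'b', 'c']);
  try {
    for await (const chunk of readable) {
      throw new Error('kaboom'); // return() is called on the iterator as well
    }
  } catch (err) {
    // The error from the loop body still propagates; swallow it for the demo.
  }
  return readable.destroyed;
}
```

Both helpers are expected to report that the stream was destroyed, which is why no explicit try/finally with readable.destroy() is needed around the loop.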
const { promisify } = require('util');

class Item {
  constructor(value, done) {
Pinging @caitp and @bmeurer - it would be interesting to know if we can avoid this being so expensive (allocating an object for the value of the iteration). In regular iterators V8 optimizes away for..of to regular iteration and other nice optimizations.
I think we should get a sense of how hard/easy this is to optimize, and if it's hard consider recycling objects for the iterator here which is dangerous but might be the only way we get reasonable performance outside of "scripting".
Or at least, do away with the done slot for the vast majority of objects.
Why not just use a plain object with two fields? Why does everything have to be a class?
@YurySolovyov not everything has to be a class, but in this particular case there are several benefits in naming the objects:
- When looking at heap allocation profiles, it makes them easier to recognize (rather than just Object).
- When debugging and looking at stack traces, you get more useful information since the objects are named.
As a platform, this makes naming objects appealing. I'm not sure it's worth it but it's definitely a reasonable call. I would name it AsyncIteratorRecord to be more similar to the spec.
This shouldn't have a big perf impact either way. I looked at perf of object / class instance creation in my work on nextTick and it's negligible.
I don't like the idea of reusing it, as that means the returned object can no longer be stored.
I would also add that it makes sure that objects created in multiple places maintain the same shape, which helps V8 in the optimization process.
> do away with the done slot for the vast majority of objects.
We can't do that if we want to conform to the spec for what next should return.
The one suggestion I might have here is making done = false so we don't have to repeat that everywhere. It's the default state after all...
@apapirovski I meant - we can put done = false on the prototype but I suspect it'll be slow (and might not conform).
I want this to eventually be fast to be useful :)
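As a rough illustration of the named-class and same-shape points raised in this thread, the sketch below uses the AsyncIteratorRecord name suggested above. The two factory helpers (dataRecord, endRecord) are hypothetical and exist only so that every record is created through one constructor; the null value for the final record mirrors the `new Item(null, true)` call in the diff.

```javascript
// Named record type: heap profiles show "AsyncIteratorRecord" instead of "Object".
class AsyncIteratorRecord {
  constructor(value, done) {
    // Always assign both fields, in the same order, so every instance
    // created anywhere in the module has the same shape.
    this.value = value;
    this.done = done;
  }
}

// Hypothetical helper: record for a chunk of data.
function dataRecord(chunk) {
  return new AsyncIteratorRecord(chunk, false);
}

// Hypothetical helper: final record, keeping the `done` slot so the
// result still has the { value, done } form that next() must return.
function endRecord() {
  return new AsyncIteratorRecord(null, true);
}
```

Whether the extra allocation matters is exactly what the thread leaves to benchmarks; the sketch only shows the object layout being argued about.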

stream.on('end', () => {
  if (this.lastResolve !== null) {
    this.lastResolve(new Item(null, true));
I'm fine with this design decision (emitting true on the iterator as a separate value with null). Just making sure it's explicitly fine.
// destroy(err, cb) is a private API
// we can guarantee we have that here, because we control the
// Readable class this is attached to
const destroy = promisify(this.stream.destroy.bind(this.stream));
We should check if this is reasonably fast.
This is not called often, only with break or throw within the loop.
I wouldn't say that breaking inside a loop is an edge case; note this is also called when you return inside a loop, which is also pretty common.
// Readable class this is attached to
const destroy = promisify(this.stream.destroy.bind(this.stream));
await destroy(null);
return new Item(null, true);
Hmm...
So this is a very nice start, but I'm not sure we work for a lot of cases where the async iterator isn't consumed in a Can you add tests for consuming streams without |
This seems great to me. Scattered some actionable and non-actionable comments.
I'm not really clear on the back-pressure concerns brought up, it seems to work similar to consuming a stream the normal way. Maybe I'm misunderstanding something...
  process.nextTick(readAndResolve, this);
});

stream.on('end', () => {
I think, in general, bind would be better here for performance.
That said, this makes me wish we already had WeakRefs in JS because we could just make stream[kAsyncIterator] and get rid of closures altogether.
if (data) {
  resolve(new Item(data, false));
} else if (this.lastResolve !== null) {
  throw new Error('next can be called only once');
Hmmm... given that readable is handled on nextTick, it seems like it could be possible to have this.lastResolve !== null and at the same time have data. Maybe that condition should go first, before even calling read()? (And yes, I know, obscure edge case...)
this.lastResolve = null;
this.lastReject = null;
this.error = null;
this.ended = false;
Could we store all of these in a way that's not publicly accessible? Whenever anything is made unintentionally public, we usually regret it later. 😞
I'm actually considering ReadableAsyncIterator a private object, meaning it's consumed by for await and is not really user-facing. Should we treat it as user-facing? I will replace those with symbols then.
Right, but it can be returned by readable[Symbol.asyncIterator] so it's not truly private. I think the fact that it's really easy to get it and there might be legitimate use cases for it, makes me uneasy about exposing these props publicly.
Regarding the "instance creation" comment above (can't comment on it for some reason) - the scary part isn't creating the objects it's the GC afterwards :)
lib/_stream_readable.js
Outdated
var warningEmitted = false;
Readable.prototype[Symbol.asyncIterator] = function() {
  if (!warningEmitted) {
    process.emitWarning(
There's an emitExperimentalWarning somewhere in internal utils. We should start using it since it avoids needing to track warningEmitted. If it needs more features, we should extend it as needed.
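For readers outside core, the idea behind a shared warn-once helper can be sketched roughly as follows. This is not the internal emitExperimentalWarning implementation (which cannot be required from user code); warnExperimentalOnce, the warnedFeatures set, and the message text are all assumptions made for illustration. Only process.emitWarning is a real public API here.

```javascript
// Hypothetical registry of features that have already produced a warning,
// so call sites do not need their own `warningEmitted` flag.
const warnedFeatures = new Set();

// Hypothetical helper: emit an ExperimentalWarning at most once per feature.
// Returns true if a warning was emitted by this call, false otherwise.
function warnExperimentalOnce(feature) {
  if (warnedFeatures.has(feature)) return false;
  warnedFeatures.add(feature);
  process.emitWarning(
    `${feature} is an experimental feature and could change at any time`,
    'ExperimentalWarning'
  );
  return true;
}
```

A call site would then reduce to a single line such as warnExperimentalOnce('Readable[Symbol.asyncIterator]') at the top of the method.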
That would |
@benjamingr the current code works only with I'm not familiar with the tc39 proposal format, and I couldn't point where it is defined the behavior attached to the iterator object regarding things like backpressure and such. At this point, only the latest promise returned by I'm keen in not implementing a backpressure mechanism which will be needed to support multiple parallel Supporting only one Can you point me to cases where having parallel |
The spec calls for an internal queue that tracks outstanding That said, I don't really know a practical situation where this would be useful... |
da33625 to 37aaf36
I see, but it is not a valid implementation of Note that it's also fine to make the
With iterators unlike push-streams or observables backpressure is actually very easy since the consumer is the one asking for items. Instead of things being pushed to the consumer they are explicitly asked for with
Yes, although it's not supposed to be much more work:
This should remain fast while supporting the spec in its entirety. I'm excited about this feature and the PR and I feel very strongly about giving users an API that would behave like a normal AsyncIterator. |
Actually, come to think of it I think we might get away with rejecting the promise with an As long as it's not ignored I guess we can say this is expected behavior with readable streams and document it. It's not as good user experience but it's much better than what we do here and arguably simpler than supporting waiting for multiple values. It should also be tested. |
37aaf36 to 65eef32
@benjamingr @mcollina Here's a quick PoC of what I was talking about: c875223 It's not ready for usage or anything but it works as expected. Could likely be optimized quite a bit. Might have bugs. |
I will do the As I said, the next step is writing a benchmark, so we can make those tradeoffs with informed numbers. |
@apapirovski I'm not sure why we'd want a LinkedList implementation for this rather than just an array. It's not faster (we always push to the end), it's more allocations, less optimizable and less cache local. We actually want a deque most likely - but we can totally just use an array here - if we're concerned about |
@benjamingr |
@mcollina I just liked that name - sorry for being confusing. Such an error would have no meaning on a regular iterator - I was just suggesting an error name. |
The same reason we switched to using LinkedList in
Yeah, I'm aware. It's just a PoC, if we had benchmarks we could start optimizing that. |
@benjamingr Anyway, a linked list and an array are both overkill; I think we can just store the latest promise, since each new one just depends on the one before it.
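The "store the latest promise" idea can be sketched in isolation like this. It is a simplified illustration, not the code from the linked commits: SerializedIterator, its pull callback, and makeSource are hypothetical names, and the source is a timer-based stand-in rather than a stream.

```javascript
// Hypothetical wrapper: `pull` is any function returning a promise for the
// next { value, done } result. No queue is kept, only the latest promise.
class SerializedIterator {
  constructor(pull) {
    this.pull = pull;
    this.lastPromise = Promise.resolve();
  }

  next() {
    // Each call waits for the previous one, so parallel next() calls
    // are answered in the order they were made.
    const result = this.lastPromise.then(() => this.pull());
    // Keep the chain usable even if this particular call rejects.
    this.lastPromise = result.catch(() => {});
    return result;
  }
}

// Hypothetical asynchronous source that hands out array items one by one.
function makeSource(values) {
  let index = 0;
  return () => new Promise((resolve) => {
    setTimeout(() => {
      if (index < values.length) {
        resolve({ value: values[index++], done: false });
      } else {
        resolve({ value: undefined, done: true });
      }
    }, 2);
  });
}
```

Calling next() several times without awaiting, for example with Promise.all, then yields the items in call order followed by a done record, which is the behavior the spec's internal queue is meant to guarantee.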
This is getting a little offtopic - so feel free to open an issue about it. A huge performance gain in bluebird is by using a double ended queue rather than an array or a linked list see this file. |
@mcollina This version is a lot simpler and no extra Promise required: 02b6336 We could likely simplify the conditionals even further. In the
Technically singly linked list (head-tail linked list) is the most common implementation of an unknown-length queue, and we don't need a double end queue (since we only remove from head and add to tail). Bluebird can get away with doing some things that we can't since they specify capacity — the trade-off is that it has to have a resize operation. I'm not 100% certain that what's implemented there is the fastest solution possible. Anyway, this is purely academic at this point since we're not using either. :) |
65eef32 to 9acefcd
@apapirovski I like your suggestion and I've included it. I will need to test and check what happens if the stream is destroyed in the meantime, but if it's not working, it is an easy fix.
I'm +1 on the suggestion and its inclusion. @apapirovski I've moved our linked list discussion to mail to keep the thread clean. |
// destroy(err, cb) is a private API
// we can guarantee we have that here, because we control the
// Readable class this is attached to
return new Promise((resolve, reject) => {
I'm not sure that's faster than what you had before this change (with promisify) :D
I'm sorry if I was confusing in the comment.
I actually think that this is faster because we would have to call promisify on every destroy method, as some instances override it. We will have to benchmark.
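The two approaches being compared can be put side by side in a small sketch. FakeResource is a hypothetical stand-in for an object with a callback-style destroy(err, cb) method (the thread notes that this signature is a private API on the real Readable), so this only illustrates the shape of each option and says nothing about which is faster.

```javascript
const { promisify } = require('util');

// Hypothetical stand-in for an object exposing destroy(err, cb).
class FakeResource {
  constructor() {
    this.destroyed = false;
  }

  destroy(err, cb) {
    this.destroyed = true;
    process.nextTick(cb, err); // report completion asynchronously
  }
}

// Option 1: promisify the bound method each time, as in the earlier revision.
function destroyWithPromisify(resource) {
  const destroy = promisify(resource.destroy.bind(resource));
  return destroy(null);
}

// Option 2: wrap the call in a Promise directly, as in the later revision.
function destroyWithPromise(resource) {
  return new Promise((resolve, reject) => {
    resource.destroy(null, (err) => {
      if (err) reject(err);
      else resolve();
    });
  });
}
```

Option 2 avoids creating a promisified function per instance, which is the argument made above; a benchmark would still be needed to confirm any difference.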
Rebased, PTAL. |
lib/_stream_readable.js
Outdated
@@ -922,6 +924,12 @@ Readable.prototype.wrap = function(stream) {
  return this;
};

Readable.prototype[Symbol.asyncIterator] = function() {
  emitExperimentalWarning('Readable[Symbol.AsyncIterator]');
asyncIterator

let err;
try {
  /*eslint no-unused-vars: 0*/
I think this disables the rule for the rest of the file. Instead you can use // eslint-disable-next-line no-unused-vars
Adds support for Symbol.asyncIterator into the Readable class. The stream is destroyed when the loop terminates with break or throw. Fixes: nodejs#15709
acdfb3a to de549cd
CI: https://ci.nodejs.org/job/node-test-pull-request/12500/ (last before landing) |
Landed as 61b4d60. |
Adds support for Symbol.asyncIterator into the Readable class. The stream is destroyed when the loop terminates with break or throw. Fixes: #15709 PR-URL: #17755 Fixes: #15709 Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com> Reviewed-By: Anatoli Papirovski <apapirovski@mac.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com> Reviewed-By: Michaël Zasso <targos@protonmail.com>
This is required because we need to add the babel-eslint dependency and it has to be able to resolve "eslint". babel-eslint is required to support future ES features such as async iterators and import.meta. Refs: #17755 PR-URL: #17820 Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Will it remain experimental in v11? |
@vsemozhetbyt I hope we could get it out of experimental before v10 goes to LTS. WHATWG streams is implementing the same thing atm, and I would like the semantics and APIs to match so that code that uses this would work identically. |
@mcollina I've also been talking to @jakearchibald about it |
In short: I'm +1 on unflagging and using async iterators as an interop mechanism between whatwg and node streams. |
Hi all! I'm aware that this is an experimental feature but, Is there a way to enable it? I couldn't find how to do it neither in docs or Thanks! |
@betomoretti I’ve blogged about it – no flag needed: http://2ality.com/2018/04/async-iter-nodejs.html |
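For anyone landing here with the same question, a minimal usage sketch looks like the following. It assumes a Node.js version that includes this change; readAll and makeDemoStream are hypothetical names, and the in-memory stream can be swapped for something like fs.createReadStream.

```javascript
const { Readable } = require('stream');

// Concatenate every chunk of a readable stream into a single string.
async function readAll(readable) {
  readable.setEncoding('utf8');
  let data = '';
  for await (const chunk of readable) {
    data += chunk;
  }
  return data;
}

// Hypothetical in-memory stream used only for the demonstration.
function makeDemoStream() {
  const readable = new Readable({ read() {} });
  readable.push('hello ');
  readable.push('world');
  readable.push(null); // signal end of stream
  return readable;
}
```

Something like readAll(makeDemoStream()).then(console.log).catch(console.error) runs without any command-line flag; while the feature is experimental, only a warning is printed.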
Adds support for Symbol.asyncIterator into the Readable class.
Fixes: #15709
Checklist
- make -j4 test (UNIX), or vcbuild test (Windows) passes
Affected core subsystem(s)
stream