Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stream.finished behaviour change #29699

Closed
mafintosh opened this issue Sep 25, 2019 · 26 comments
Closed

stream.finished behaviour change #29699

mafintosh opened this issue Sep 25, 2019 · 26 comments
Labels
stream Issues and PRs related to the stream subsystem.

Comments

@mafintosh
Copy link
Member

mafintosh commented Sep 25, 2019

Some recent changes to stream.finished landed in Node master changes the behaviour of the function.

Before (in Node 12) if a stream was about to emit end/finish but hadn't and emitted close it would not be treated as an error. In master this triggers a premature close.

It's due to the changes here.
https://github.com/nodejs/node/blob/master/lib/internal/streams/end-of-stream.js#L86

This breaks modules that emit close immediately after calling stream.end() / push(null), for example the popular multistream module, https://github.com/feross/multistream.

A simple test case looks like this:

const Multistream = require('multistream')
const fs = require('fs')

const m = new Multistream([
  fs.createReadStream(__filename)
])

m.resume()
stream.finished(m, function (err) {
  console.log('should not error:', err)
})

We might want to revert those changes or at least document them.

@mafintosh
Copy link
Member Author

/cc @mcollina

@addaleax addaleax added the stream Issues and PRs related to the stream subsystem. label Sep 25, 2019
@addaleax
Copy link
Member

@ronag

@ronag
Copy link
Member

ronag commented Sep 25, 2019

I'm unsure what the appropriate course of action here is. I'd consider emitting 'close' before 'end' as a bug. That's the whole thing with ERR_STREAM_PREMATURE_CLOSE? Though, we've made the consequences of the bug more severe.

This breaks modules that emit close immediately after calling stream.end() / push(null)

That sounds wrong though in general and is probably already broken in other more subtle ways.

If our strategy is to try to fix the ecosystem, I'm more than happy to try and help e.g. feross/multistream sort out these kind of bugs/issues (e.g. fix PRs).

@ronag
Copy link
Member

ronag commented Sep 25, 2019

The problem is that they call destroy() directly after push(null). This will break inside Readable/Duplex in strange ways as well (not just in finished).

https://github.com/feross/multistream/blob/master/index.js#L98

Also they are using a custom destroy implementation which further complicates things: https://github.com/feross/multistream/blob/master/index.js#L61. This implementation, unlike the native one, will emit 'close' in the same tick which is the root cause.

@ronag
Copy link
Member

ronag commented Sep 25, 2019

feross/multistream#46

@ronag
Copy link
Member

ronag commented Sep 25, 2019

@mcollina This goes back a little bit with going around the streams framework. In this particular case:

  • overriding Readable/Writable methods (e.g. destroy in this case).
  • manually emitting 'error' and 'close' events.
  • manually destroying on end/finish (i.e. not using autoDestroy).

I would suggest we update the documentation to more explicitly discourage this as it very easily breaks stream invariants and have consequences for the ecosystem.

@mafintosh
Copy link
Member Author

Custom destroy methods are everywhere in the ecosystem though, including using this pattern.

If we are going down the path of only supporting the latest stream ways we should remove all the legacy from the function, i.e. the request stream stuff https://github.com/nodejs/node/blob/master/lib/internal/streams/end-of-stream.js#L95

@ronag
Copy link
Member

ronag commented Sep 25, 2019

Custom destroy methods are everywhere in the ecosystem though, including using this pattern.

I do think we should support (but discourage) them within reason. We can't support every possible scenario. Especially not when it comes to what we otherwise consider as invalid.

Emitting 'close' before 'end' will in my opinion break the ecosystem in both existing and future code if people assume the "correct" behaviour based on native node streams.

If we are going down the path of only supporting the latest stream ways we should remove all the legacy from the function

I think we need to have a more balanced approach than that though. Those legacy stuff does not in any way hinder otherwise "correct" behaviour.

@sam-github
Copy link
Contributor

The problem is that they call destroy() directly after push(null).

My first thought is that breaking the stream ecosystem is not good, and we shouldn't do it.

But the specific thing above... I don't know, I might agree with @ronag on it. I ran into similar issues in the TLS testcases when updating to TLS1.3, because it changed the timing of the underlying stream activity.

As I understand what is happening in https://github.com/feross/multistream/blob/d63508c55e87d55a2ed6445ef79ba1c68881caad/index.js#L98-L99, push is async, its not at all guaranteed that the push will complete if delete() is called. The essential quality of delete() is that it does not wait, it doesn't participate in a deterministic event ordering, its just "as soon as possible". write(); write(); end() guarantees the writes will happen in order, then the end. write(); write(); end(); destroy() makes no guarantees at all of the relative order of destroy. If the underlying stream is capable, it might literally do all that, in order, or it could in fact buffer the write/write/end, then destroy the stream so that none of the writes occur.

Despite that its order isn't defined, with specific stream implementations, its behaviour will be predictable enough some code is relying on it, which is problematic.

@mcollina
Copy link
Member

I think we should revert that change.

@ronag
Copy link
Member

ronag commented Sep 25, 2019

Reverting the change will make other bugs possible such as:

const stream = require('stream')
const assert = require('assert')

const src = new stream.Readable()
src.push('a')
src.push(null)

let data = ''
const dst = new stream.Writable({
  write (chunk, encoding, cb) {
    setTimeout(() => {
      data += chunk
      cb()
    }, 1000)
  }
})

stream.pipeline(
  src,
  dst,
  err => {
    if (!err) {
      assert.strictEqual(data, 'a') // fails
    }
  }
)

process.nextTick(() => {
  dst.destroy()
})

We might already have such bugs in the ecosystem... but nobody notices it

@sam-github
Copy link
Contributor

The behaviour of example code above does not surprise me, I regard whether that data was written or not as undefined in the presence of a .destroy() (at least, a destroy() that is not synchronized with the write completed callbacks).

The behaviour can be made defined if .destroy() is passed an Error argument.

@ronag
Copy link
Member

ronag commented Sep 25, 2019

Another example:

http.createServer((req, res) => {
  stream.pipeline(
    fs.createReadStream(getPath(req)),
    res, // res does not emit 'error' if prematurely closed, instead it emits 'aborted'.
    err => {
      if (!err) {
        // Everything ok? Probably... but maybe not... depending on timing
      }
    }
  )
}).listen(0)

Or the corresponding example for client side (which is probably worse).

@jasnell
Copy link
Member

jasnell commented Sep 25, 2019

I can definitely see the argument @ronag is making here but I'm also super sensitive to breakage in the streams code. That is to say, I prefer the new behavior but don't want to break stuff. It's a bit wonky, but perhaps reverting the change, then adding it back behind a configuration option on the stream object would be a good approach. New implementations can make use of the new behavior by switching the option on, existing implementations can continue as they are. Later, we can make the choice of deprecating the existing behavior then eventually switching the default. Takes longer, but avoids breakage.

@sam-github
Copy link
Contributor

@jasnell are you suggesting per-stream-instance config, or global to node.js process config?

@jasnell
Copy link
Member

jasnell commented Sep 25, 2019

Per stream instance

@mafintosh
Copy link
Member Author

Alternatively just point people to end-of-stream / pump for the compatibility mode.

Ie. @ronag's example above works with pump but it's also doing some funky things to keep backwards compat.

@ronag
Copy link
Member

ronag commented Sep 25, 2019

behind a configuration option on the stream object would be a good approach

I think an option e.g. strict: true (for strict mode) can be useful for this as well as a few other similar things where we've had to compromise. Though it makes some things more complicated and difficult to maintain...

Alternatively just point people to end-of-stream / pump for the compatibility mode.

I prefer this option. Also, as I said I'm more than happy to help (as time allows) ecosystem modules resolve any related issues, e.g. multistream feross/multistream#47.

@sam-github
Copy link
Contributor

We've consistently not pushed people off the node.js API and onto ecosystem modules with versioned APIs. Arguably, we've done the exact opposite.

For example, https://www.npmjs.com/package/readable-stream is not mentioned anywhere in the Node.js docs, even though many heavy duty streams user suggests that everyone should be using it instead of node's streams API, and we froze https://www.npmjs.com/package/pump into node core as https://nodejs.org/api/all.html#stream_stream_pipeline_streams_callback, so we can't change it anymore without breaking our users.

So, at this late point to break the node core API and tell people they should have been using stable APIs from npm packages like pump probably won't go over so well.

The "strict" option just means that now we have to support and test yet another streams mode in addition to 2 and 3, with all the burden that implies. strict would be streams 3a, or 4? The examples of odd corner cases are pretty compelling examples of issues with current behaviour, but we've backed ourselves into a corner here by not developing a Node.js maintained/npm published set of APIs that can be versioned independently from core.

The streams WG members will weigh in with more experience than me, but I don't think breaking the ecosystem and helping them adjust is much of a plan. There is a long tail of ecosystem modules, many not so actively maintained. We haven't had much success even with Buffer.from/alloc, which is a trivial change to detect the need of and to fix, this change is more subtle. Also, its not clear how packages can adapt to working across multiple node.js versions if they want to support all the LTS release lines, as I hope they would.

@ronag
Copy link
Member

ronag commented Sep 26, 2019

Good points. I really don't know what we can do here then...

Leaving the ecosystem subtly but permanently broken doesn't sound like a good path either...

Fix it in readable-streams but leave it half broken in core?

@ronag
Copy link
Member

ronag commented Sep 26, 2019

I feel there is a fundamental problem here in terms of general strategy, not just this specific issue.

We have bugs, incorrect, unfortunate behaviour that we would like to address but we cannot due to either incorrect, unfortunate, outdated assumptions and even bugs in the ecosystem. Often these issues cause very subtle and intermittent bugs that are difficult to spot and often find the cause for. Just because these bugs don't throw directly in the users face doesn't mean they don't exist.

I would personally very much like to find some form of long term sustainable solution. Otherwise were stuck in a situation where it will never be possible for users to write "correct" code, only code that in different ways tries to compromise between different broken behaviours.

If the only way forward is e.g. streams4 with proper semver that lives on npm. I would personally much prefer that over the status quo.

@ronag
Copy link
Member

ronag commented Sep 26, 2019

We've consistently not pushed people off the node.js API and onto ecosystem modules with versioned APIs. Arguably, we've done the exact opposite.

I'm not familiar with the discussion around this. Maybe something to reconsider?

@ronag
Copy link
Member

ronag commented Sep 26, 2019

Solution proposal to original issue #29724

@Fishrock123
Copy link
Contributor

The suggested fix is semver-minor, so reverting beforehand may still be desirable.

@ronag
Copy link
Member

ronag commented Oct 11, 2019

@Trott: I believe this can be closed for now? Since the referred commits have been reverted.

ronag added a commit to nxtedition/node that referenced this issue Jan 18, 2020
Some broken streams might emit 'close' before 'end' or 'finish'.
However, if they have actually been ended or finished, this
should not result in a premature close error.

Fixes: nodejs#29699
@ronag
Copy link
Member

ronag commented May 1, 2020

I believe the topics of this issue has been resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stream Issues and PRs related to the stream subsystem.
Projects
None yet
7 participants