Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define "tee"ing a stream #271

Closed
domenic opened this issue Jan 27, 2015 · 13 comments
Closed

Define "tee"ing a stream #271

domenic opened this issue Jan 27, 2015 · 13 comments

Comments

@domenic
Copy link
Member

domenic commented Jan 27, 2015

See yutakahirano/fetch-with-streams#14. Fetch has req.clone() and res.clone() which are currently ill-defined. They should be defined here, in a generic fashion, and probably there should be a user-exposed API for it (one way or another).

Our current experiments with this are in TeeStream.md, but those are probably very outdated.

@yutakahirano
Copy link
Member

From fetch API point of view it might be desirable that req.clone() does not change req.body although I don't know if it is feasible from Streams point of view.

@domenic
Copy link
Member Author

domenic commented Jan 29, 2015

Yeah, this might be tricky to get right... I don't immediately see how to do it, but I want to stare at it for a while and see if maybe there's a way to make it work.

@annevk
Copy link
Member

annevk commented Feb 18, 2015

Note that tee is also used in step 1 of https://fetch.spec.whatwg.org/#concept-http-network-or-cache-fetch (same reason). I should probably abstract the "copy a request" operation.

@wanderview
Copy link
Member

How should clone() or tee() work when the resulting streams are read at wildly different rates?

Consider:

  1. UA has an underlying push data source that it can provide back pressure on. The full data stream is quite large. To prove the point, lets say something ridiculous like 100 GB.
  2. This data source is exposed as a ReadableStream or ReadableByteStream. We'll call this stream1.
  3. Content script calls stream1.clone() or stream1.tee() to create stream2.
  4. The script then reads stream1 rapidly, but does not read stream2 at all.

At this point, the UA must start buffering the underlying data source for all data in-between stream1's read position and stream2's read position.

My question is, can the UA provide back pressure on the underlying data source because stream2 is not being read?

This might be surprising since it will in effect stall stream1 until stream2 starts getting consumed. The alternative, though, is to buffer until OOM, but that does not seem desirable.

Can the spec clarify that the UA has the freedom to provide back pressure if one of the tee'd streams is read too slowly?

@annevk
Copy link
Member

annevk commented Feb 18, 2015

This might be surprising since it will in effect stall stream1 until stream2 starts getting consumed. The alternative, though, is to buffer until OOM, but that does not seem desirable.

That's our usual way of doing things... Alternatively, can we notify the streams in some way that this is happening?

@domenic
Copy link
Member Author

domenic commented Feb 18, 2015

Sorry, which is our usual way? OOM or stall?

I agree this is the hard problem with teeing. Maybe it even needs to be an option to pick between the two behaviors? That would still leave the question of what the default is.

@annevk
Copy link
Member

annevk commented Feb 18, 2015

OOM. Platform specifications hardly ever deal with limits.

@wanderview
Copy link
Member

Our underlying gecko primitives default to providing back pressure in this particular scenario. I'd like to be able to default to back pressure to make implementation easier and more efficient.

Of course, an unfortunate side effect of using back pressure here is that it makes GC observable:

  1. stream2 = stream1.tee();
  2. Read stream1 until it stalls.
  3. Throw away stream2.
  4. When GC collects stream2, then stream1 unblocks.

@wanderview
Copy link
Member

Or rather, I'd like the default to be weasel words to the effect that "the UA may provide back pressure to all clones if one of the peer streams is not being read".

@domenic
Copy link
Member Author

domenic commented Feb 18, 2015

Well, the plan was to define an actual TeeStream with a normative specification of how it operates on its single input and multiple output streams. (See old design here, but don't take it for anything serious.) Maybe the conceptual "tee a stream" could be more weasel-wordy.

@wanderview
Copy link
Member

Understood. I agree that pure js stream behavior need to be exactly defined.

In the fetch Request/Response clone() case, gecko uses "infinite" buffer size already to match XHR behavior of not doing any back pressure to the network. So this won't come into play for us there.

I understand blink does do back pressure for fetch(), but I don't know how blink's Request/Response clone() works. Back pressure on peer streams might be an issue there.

@yutakahirano
Copy link
Member

Currently Blink's fetch doesn't have the backpressure mechanism. When we implement it, I think the OOM behavior is the right way.

@tyoshino
Copy link
Member

Domenic created PR #302 to define "teeing" clearly. Discussion happening there now.

@domenic domenic modified the milestone: Week of 2015-03-30 Mar 30, 2015
domenic added a commit that referenced this issue Mar 31, 2015
Closes #271; supercedes #302.

Includes an abstract operation, TeeReadableStream(stream, shouldClone) which is meant for use by other specs, plus a method ReadableStream.prototype.tee() (which does no cloning).
@domenic domenic closed this as completed in 3ed32a5 Apr 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

5 participants