Backpressure on fetch integrated with Streams #452

yutakahirano · 2014-09-10T01:54:02Z

Imagine ReadaleStream is integrated with fetch. The most naive way is to add stream() method to Body interface.

interface Body {
    ...
  Promise<ReadableStream> stream();
};

Streams is an API that enables the data producer to know that the consumer doesn't want the data and stop producing data. On the other hand, other data types (text, blob, ...) don't have such feature.
My question is, should the loader start loading body while the Body representation is undetermined?

fetch(request).then(function(res) {
  return wait(longtime).then(function() {
    return res.stream();
  });
}).then(function(stream) {
  // Here we have a readable stream.
});

The above code waits longtime and then demonstrates that it wants to get the body data via ReadableStream interface. Should the loader load the body data while waiting and use the internal buffer unlimitedly, or stop loading while the representation is undetermined?

We can provide that information explicitly when calling fetch, but I don't know if it's a good API.

The text was updated successfully, but these errors were encountered:

yutakahirano · 2014-09-10T01:59:53Z

I'm not sure this is the right place to discuss this issue, but I chose here because it could affect the fetch interface.

yutakahirano · 2014-09-10T02:01:29Z

@annevk @tyoshino

annevk · 2014-09-10T07:50:49Z

It would be good to get input from @domenic.

Note that I expect we would not expose a Stream through a promise. I would expect it to be returned from response.body synchronously. And then if you use one of the methods such as json() it would no longer work or not work predictably.

domenic · 2014-09-14T13:42:01Z

To explain @annevk's comment a bit more, I believe our thinking was that body would be an instance of RequestBodyStream, which is a ReadableStream subclass that also has methods like json() etc. The .stream() idea doesn't seem bad on first glance, but I haven't thought about it too hard; in what follows I'll assume we're going with RequestBodyStream design.

(Conceptually, .json()s implementation would look roughly like the readableStreamToArray example, except it would decode and concat the chunks into a string instead of putting them in an array, and it would do JSON.parse before returning the result. In reality, it would probably be done much more efficiently, e.g. pre-allocate an ArrayBuffer, read all the data into it, and then decode all at once.)

My expectation would then be that upon fetch() being called, the body created and data is buffered internally (or left in the kernel buffer, perhaps?) up to the stream's high water mark (HWM). Thus, if the user never calls .json() or .read() or .pipeTo(...) or anything else that would read the stream, the maximum memory consumed is equal to max(HWM, Content-Length). (Or perhaps slightly above the HWM if backpressure cannot be exerted fast enough.) If the user later calls .json(), it un-pauses the stream until all the data is read and converted into JSON. Similarly if they call .read(), they consume buffered chunks, until they consume all the chunks that were buffered at which point the stream becomes "waiting" and they need to call .wait().

The choice of HWM could in theory be given to users (e.g. as an option to fetch()), but in practice in Node.js we see that nobody uses such a capability and they just accept the default. Node.js chooses a default of 16 MiB, but implementations could use heuristics (e.g. use a lower number on mobile phones). A HWM of 0 would be fine too.

/cc @tyoshino in case there's something I am forgetting

annevk · 2014-09-15T15:01:41Z

@domenic json() and friends moved directly to Response/Request. So body would be a "pure" stream object I think.

yutakahirano · 2014-09-18T05:33:10Z

Thank you. My understanding is the following - Are those right?

Request / Response have 'body' property of type ReadableStream.
When json(), text(), etc are called the stream gets closed.
When read() is called on the stream bodyUsed becomes true.

annevk · 2014-09-18T06:49:49Z

Yeah, although I'm not sure if we can make the third bullet point work if we keep the stream object pure. Perhaps through some temporary observer? But even that would have to be synchronous for bodyUsed to return the correct value...

domenic · 2014-09-18T14:50:34Z

The third bullet point is definitely achievable since we'd control the creation of the stream. E.g.

var that = this;
this.body = new ReadableByteStream({
  readInto(/* ... */) {
    // actual work
    that.bodyUsed = true;
  }
});

tyoshino · 2014-09-19T07:32:35Z

@domenic #452 (comment)

max(HWM, Content-Length)

min?

The proposed back pressure mechanism looks correct. Regardless of where json() etc. are placed, we want to just start receiving and buffer some amount of data to reduce latency. The amount would be limited by some "high water mark". The buffering could be implemented not by ReadableByteStream but some platform (e.g. Blink) specific buffering library.

third bullet point

I agree that it works.

To simplify implementation, we would want to use ReadableByteStream not only for the body property but also as a backend for json(), etc. as @domenic proposed.

But even if we adopt this approach, it seems we can still choose to determine the body reading method to provide to the user at access on body property. Not on body.read() call.

The body attribute's getter must run the steps below:
1. If the used flag is set, return null.
2. Set used flag.
3. Return a ReadableByteStream representing the associated body.

Unless we want to allow the user to investigate body before deciding what body reading method to use. I guess there's no such need.

tyoshino · 2014-09-19T07:36:47Z

Fixed to return the ReadableByteStream for the second and later evaluation on the body property once we choose to use the stream.

Objects implementing the Body ... an associated byteStream (initially set to null), ...

If byteStream is not null, return byteStream.
If the used flag is set, return null.
Set used flag.
Set byteStream to a ReadableByteStream representing the associated body.
Return byteStream.

annevk · 2014-09-19T07:57:07Z

@tyoshino I guess that could work, but perhaps we should make it a method then given all the side effects. But instead of returning a promise it would be synchronous. Not sure though.

tyoshino · 2014-09-19T09:05:16Z

make it a method

@annevk Sounds good

domenic · 2014-09-19T22:47:43Z

I don't really like the side-effecting method. I'd rather just go with the approach from #452 (comment). It also makes the name bodyUsed a lie as it's more someoneOnceAccessedTheBodyPropertyForUnknownPurposes ;)

yutakahirano · 2014-10-21T08:21:27Z

Thanks, I will write a draft for the integration. @tyoshino's proposal looks fine. @domenic, can you tell me why you don't like it?

domenic · 2014-10-21T16:36:03Z

@yutakahirano because it's more complex (both in internal code and user-facing code/concepts) than #452 (comment) and doesn't gain anything.

tyoshino · 2014-10-27T19:16:56Z

@domenic

I see. json() and text() are only-once methods. So, we don't have to introduce a little complicated algorithm like #452 (comment).

In your idea, bodyUsed is set when json(), text() or any ReadableStream method is called, and are used to prohibit json() and text() but it must not disable the ReadableStream methods (we'll read() many times). Right? Do we want to prohibit touching ReadableStream methods as well when json() or text() takes place first? Then, we need something else than bodyUsed. Otherwise, we don't (bodyUsed suffices).

domenic · 2014-10-27T23:10:52Z

Do we want to prohibit touching ReadableStream methods as well when json() or text() takes place first?

Nah, if you do that you'll get what you asked for. (A mess.)

tyoshino · 2014-10-28T05:32:10Z

Right. To choose not to do that, I want to give a justification supporting that we're putting a guard against json()-after-read() but not against read()-after-json(). Both of them look unexpected usage. json() would be pumping on the stream. It sounds good to give json() exclusive access to the stream.

@yutakahirano said that it might be useful to keep cancel() available even after json() call as it allows us to abort json(). Is this included in your motivation to keep ReadableStream methods available after json() call?

Regarding guard against read()-after-json(), alternative proposed by @yutakahirano is making stream() only-once method. It's cumbersome that you need to save the returned stream to some variable. But we can reduce the complexity of algorithm inside.

domenic · 2014-10-28T22:35:41Z

We don't really have any concept of "exclusive access to a stream." If you have a stream object, you can read from it; this is similar to a file handle object. it would be possible to introduce a library or subclass that adds this concept, but I would prefer to let libraries prove the worth of that idea and if we see everyone building such tooling them consider standardizing it.

.cancel() is indeed a good reason to keep ReadableStream methods available.

That said, this kind of situation where a "C++ reader" and a "JS reader" might share a stream seems very reminiscent of the off-main-thread piping question, whatwg/streams#97. In fact you can view it as a subset of that if you define .json() using a concat-stream equivalent:

Response.prototype.json = function () {
  var concatenator = new ConcatStream();
  this.body.pipeTo(concatenator);
  return concatenator.asArrayBuffer.then(convertToString).then(JSON.parse);
};

(npm concat-stream is for Node streams and uses a callback; our hypothetical version uses a promise-returning .asArrayBuffer property.)

I will open a new issue on whatwg/streams and consider if maybe we want to lock the stream during piping.

tyoshino · 2014-10-29T02:14:18Z

We don't really have any concept of "exclusive access to a stream."

By "exclusive", I meant exclusive between arrayBuffer(), blob(), ... text() and direct access on the ReadableStream interface. I didn't mean exclusiveness between consumers who see the ReadableStream interface.

My idea in #452 (comment) realizes this by hiding the stream object itself when json(), etc. is called. The script can never touch any method of the ReadableStream interface. We can read from it only by waiting for the promise returned by json(), etc. which he/she called gets fulfilled.

.cancel() is ...

OK

... new issue ...

Good

yutakahirano · 2014-10-29T03:29:14Z

It seems that whatwg/streams#241 allows "read-and-then-pipeTo", but you don't want to allow "read-and-then-call-json" in this issue, right?
The code in #452 (comment) allows that, I think.

With stream() method, we can solve the problem, though.

domenic · 2014-10-29T03:50:59Z

We definitely want to allow read and then call json(), e.g. for a file format that has descriptive headers then JSON content

tyoshino · 2014-10-29T03:59:24Z

We definitely want to allow read and then call json(), e.g. for a file format that has descriptive headers then JSON content

Yeah, it's useful. But now there's no way to tell the stream how big the header is. Using ReadableByteStream.prototype.readInto()? Then, don't we want to provide json(num_bytes) than just jsonUntilEndOfStream() which we have now? Or, readOneJsonThenStopReading()?

yutakahirano · 2014-10-29T04:02:05Z

We definitely want to allow read and then call json(), e.g. for a file format that has descriptive headers then JSON content

Oh, sorry, I mistook your intention, but it is a bit different from what I said at #452 (comment) .

And for your use case, we need read(size) which we don't have now.

domenic · 2014-10-29T04:24:55Z

You would use readInto, yes. Not sure about the utility of json(num_bytes). In my view json() is just a stopgap until we have real streams, and/or a convenience for the 80% case.

yutakahirano · 2014-10-29T07:29:39Z

What is "readInto"? Is it part of the Streams API?

tyoshino · 2015-01-30T07:13:24Z

I got some offline comment about politically incorrectness of naming. I've renamed it to "body passed".

tyoshino · 2015-01-30T07:30:41Z

Copied #452 (comment) to
https://github.com/tyoshino/streams_integration/blob/master/FetchBodyPrecondition.md and added two more ideas (A)' and (C). Domenic, which is depicting your idea?

tyoshino · 2015-01-30T16:08:46Z

Sorry. (A)' was not depicting Domenic's #452 (comment). Added (A)'' which should be depicting it more correctly.

tyoshino · 2015-01-30T16:16:01Z

BTW, Yutaka questioned what we should do with the ReadableStream when e.g. .text() fails due to invalid UTF-8 data. We came up with the following options:

stays "readable" with unconsumed data (or stays "closed" if the UTF-8 decoder has already drained all bytes)
becomes "closed" after draining unconsumed data
becomes "errored"

domenic · 2015-01-30T20:49:14Z

I'd like to confirm if your opinion is the same as Anne regarding the following question:

fetch(req), new Request(req) and cache.put(req, res) fails if any of fetch(req), new Request(req) and cache.put(req, res) has happened, even after the operation is finished.

I agree with this.

Domenic, which is depicting your idea?

(A)'' depicts my idea.

BTW, Yutaka questioned what we should do with the ReadableStream when e.g. .text() fails due to invalid UTF-8 data. We came up with the following options:

I think 1 or 2 are reasonable and I don't have a strong preference between them. 1 would be text() doing nothing, and 2 would be text() calling this.body.cancel(new TypeError("couldn't decode chunk!")). Maybe since we already think of text() as saying "I want all of the remaining data", 2 is best.

annevk · 2015-01-31T08:00:22Z

utf-8 would not suddenly fail to decode if you hit a trailing byte. You would just get a U+FFFD. I don't think we want to use a fatal decoder.

tyoshino · 2015-01-31T08:37:28Z

OK. So, text() uses a replacement mode decoder, so it doesn't fail halfway. In json(), we may detect a fatal error in the middle of the data. Is such a case, maybe we could call cancel() as Domenic suggested. Similar handling for formData(), too?

annevk · 2015-01-31T09:49:21Z

I'm not sure how multipart/form-data is normally processed. I guess we don't parse it ourselves normally so making it a fatal error is fine if we keep doing that consistently.

yutakahirano · 2015-02-03T05:49:42Z

Just a nitpick, but you don't have to have an entry for fetch(req) in https://github.com/tyoshino/streams_integration/blob/master/FetchBodyPrecondition.md. It is included in new Request(req) case (at least in the current spec, and it seems nobody in this thread wants to change that).

tyoshino · 2015-02-03T08:25:58Z

Yutaka: Thanks. Done tyoshino/streams_integration@723bb07

tyoshino · 2015-02-03T10:46:54Z

Thanks, Anne, Domenic. Let's have json(), etc. cancel() the stream on any fatal error.

Re: Domenic (#452 (comment)),

OK. Let's proceed with that plan. I've moved (A)'' to the top in https://github.com/tyoshino/streams_integration/blob/master/FetchBodyPrecondition.md.

annevk · 2015-02-03T11:23:41Z

Except text(), right? That is, it should not cancel() on incorrect bytes.

tyoshino · 2015-02-03T11:26:52Z

Yes. As you said, text() just replaces incorrect bytes, so it never fails and therefore is never required to do cancel().

tyoshino · 2015-02-03T11:29:04Z

s/never fails/never fails as long as body doesn't become `"errored"/

yutakahirano · 2015-02-04T08:56:04Z

Thank you @tyoshino and others, I will update the draft.

wanderview · 2015-02-05T14:50:53Z

What is the type of the "chunk" provided by Fetch body ReadableStream objects? Is it an ArrayBuffer? The streams spec says a chunk can be of any type, but we should probably define what Fetch body streams return explicitly. (Sorry if this is already defined and I missed it. Its a bit hard to follow the overall effort.)

yutakahirano · 2015-02-05T15:07:55Z

Yes, read() returns an ArrayBuffer. I will describe that in the draft.

Used 'passed flag' instead of 'used flag' as discussed at w3c/ServiceWorker#452.

yutakahirano · 2015-03-26T07:17:21Z

Hi!

Sorry that it took so long, but I updated the draft.
Please let me know if it has any errors regarding bodyUsed, passed flag, locked flag, etc.

annevk · 2015-04-05T11:10:57Z

I suggest any further review is done in that repository until the merge happens (at which point we should switch to https://github.com/whatwg/fetch I think for anything new).

xgqfrms-GitHub · 2017-11-08T08:56:43Z

TypeError: body stream already read at FetchDatas.fetch.then ???

what.s wrong with this? Anybody can help?

KenjiBaheux added enhancement fetch labels Sep 26, 2014

tyoshino mentioned this issue Oct 28, 2014

Ensure the JS thread doesn't have to get hit when piping two streams to each other whatwg/streams#97

Closed

domenic mentioned this issue Oct 28, 2014

Should we "lock" readable streams while piping? whatwg/streams#241

Closed

yutakahirano mentioned this issue Feb 5, 2015

Specify stream chunk type. yutakahirano/fetch-with-streams#19

Closed

yutakahirano added a commit to yutakahirano/fetch-with-streams that referenced this issue Feb 12, 2015

Specify Body locking with ExclusiveStreamReader.

49ff0c8

Used 'passed flag' instead of 'used flag' as discussed at w3c/ServiceWorker#452.

yutakahirano mentioned this issue Feb 12, 2015

Specify Body locking with ExclusiveStreamReader. yutakahirano/fetch-with-streams#21

Merged

yutakahirano mentioned this issue Mar 13, 2015

Getting errored/closed reader, Auto-releasing reader, Forcible cancel() on a stream whatwg/streams#297

Closed

yutakahirano added a commit to yutakahirano/fetch-with-streams that referenced this issue Mar 25, 2015

Specify Body locking with ExclusiveStreamReader.

af03f8a

Used 'passed flag' instead of 'used flag' as discussed at w3c/ServiceWorker#452.

annevk closed this as completed Apr 5, 2015

yutakahirano mentioned this issue Apr 6, 2015

Preparation for merge yutakahirano/fetch-with-streams#25

Closed

yutakahirano mentioned this issue May 7, 2015

problem with .text() drain behavior and .bodyUsed yutakahirano/fetch-with-streams#37

Closed

yutakahirano mentioned this issue Jul 27, 2015

Use IsDisturbed(ReadableStream) yutakahirano/fetch-with-streams#48

Merged

annevk mentioned this issue Feb 1, 2021

Introspection: disturbed or locked whatwg/streams#1105

Open

Backpressure on fetch integrated with Streams #452

Backpressure on fetch integrated with Streams #452

Comments

yutakahirano commented Sep 10, 2014

yutakahirano commented Sep 10, 2014

yutakahirano commented Sep 10, 2014

annevk commented Sep 10, 2014

domenic commented Sep 14, 2014

annevk commented Sep 15, 2014

yutakahirano commented Sep 18, 2014

annevk commented Sep 18, 2014

domenic commented Sep 18, 2014

tyoshino commented Sep 19, 2014

tyoshino commented Sep 19, 2014

annevk commented Sep 19, 2014

tyoshino commented Sep 19, 2014

domenic commented Sep 19, 2014

yutakahirano commented Oct 21, 2014

domenic commented Oct 21, 2014

tyoshino commented Oct 27, 2014

domenic commented Oct 27, 2014

tyoshino commented Oct 28, 2014

domenic commented Oct 28, 2014

tyoshino commented Oct 29, 2014

yutakahirano commented Oct 29, 2014

domenic commented Oct 29, 2014

tyoshino commented Oct 29, 2014

yutakahirano commented Oct 29, 2014

domenic commented Oct 29, 2014

yutakahirano commented Oct 29, 2014

tyoshino commented Jan 30, 2015

tyoshino commented Jan 30, 2015

tyoshino commented Jan 30, 2015

tyoshino commented Jan 30, 2015

domenic commented Jan 30, 2015

annevk commented Jan 31, 2015

tyoshino commented Jan 31, 2015

annevk commented Jan 31, 2015

yutakahirano commented Feb 3, 2015

tyoshino commented Feb 3, 2015

tyoshino commented Feb 3, 2015

annevk commented Feb 3, 2015

tyoshino commented Feb 3, 2015

tyoshino commented Feb 3, 2015

yutakahirano commented Feb 4, 2015

wanderview commented Feb 5, 2015

yutakahirano commented Feb 5, 2015

yutakahirano commented Mar 26, 2015

annevk commented Apr 5, 2015

xgqfrms-GitHub commented Nov 8, 2017

TypeError: body stream already read at FetchDatas.fetch.then ???