Custom tee function #401

Open

martinthomson opened this issue Oct 26, 2015 · 22 comments
@martinthomson

At TPAC, we discussed the use of streams in fetch. The problem with having to rely on the default implementation of tee() is that it introduces the potential need for an infinite buffer. If JS were able to provide a tee implementation that didn't depend on creating a clone, that would be significantly more performant.
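The buffering cost can be illustrated with a small synchronous sketch (hypothetical code, not the spec algorithm; `naiveTee` and `pending` are invented names): a default tee must queue every chunk that the slower branch has not yet consumed.

```javascript
// Hypothetical sketch: a synchronous tee over an iterator, mirroring how
// a default tee() must queue chunks for whichever branch is lagging.
function naiveTee(iterator) {
  const buffers = [[], []]; // per-branch queues of unread chunks

  function makeBranch(i) {
    return {
      next() {
        if (buffers[i].length === 0) {
          // Pull one chunk from the source and enqueue it for BOTH branches.
          const { value, done } = iterator.next();
          if (done) return { value: undefined, done: true };
          buffers[0].push(value);
          buffers[1].push(value);
        }
        return { value: buffers[i].shift(), done: false };
      },
      // Expose the queue length so the buffering cost is observable.
      pending() { return buffers[i].length; },
    };
  }
  return [makeBranch(0), makeBranch(1)];
}

// If branch A reads far ahead of branch B, B's queue grows without bound.
const [a, b] = naiveTee([1, 2, 3, 4, 5][Symbol.iterator]());
a.next(); a.next(); a.next(); // A consumes three chunks...
console.log(b.pending());     // 3: ...so B is now holding three buffered chunks
```

With an unbounded consumption gap between the branches, that per-branch queue is the "infinite buffer" in question.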

@annevk commented Oct 26, 2015

(The only potential problem in using this for fetch would be that we might have to provide a consistent body across redirects. That hasn't yet been discussed with the security teams. But we could nevertheless use it for clone().)

@martinthomson

Having the custom tee() invoked on clone(), but not on the clones used for redirects, might be necessary, yes. The important point that @annevk raises is that we might get the performance gain in the places where it is safe (calling clone() on a Request or Response), but we have a separate choice to make about how to deal with the internal cloning needs.

@yutakahirano

#271

@wanderview

Note, the current tee() implementation does not actually duplicate the contents of the buffer chunks. We're really just talking about the memory savings of the array holding the pointers to the buffer chunks. This seems like pretty minimal savings to me for such added complexity to the API.

@martinthomson

@wanderview, I assume that you refer to the Firefox implementation? What happens when you have vastly different consumption rates for the two streams? If one reader is 1 gigabyte ahead of the other, do you exert back pressure? The view was that the actual chunks for the stream could be garbage collected once both teed streams have read them.

Oh, and the requirement for SameObject on the chunks would need to be relaxed to accomplish this. That assumes of course that the SameObject requirement wasn't needed for a reason that we couldn't think of during the meeting.

@tyoshino

This complicates pipeTo optimization. When IdentityTransformStreams (pipes) are involved, we need to propagate the tee() signal back to the first data producer in the pipeTo chain, and have every member of the chain convey the data flow to both the slow and the fast consumer. This needs careful investigation.

@wanderview

In my previous comment I meant the current tee() spec.

From your original post in this thread:

If the JS were able to provide a tee implementation that didn't depend on creating a clone

Which is already stated in the stream spec. See:

Note that the chunks seen in each branch will be the same object. If the chunks are not immutable, this could allow interference between the two branches. (Let us know if you think we should add an option to tee that creates structured clones of the chunks for each branch.)

The cloning issue is different from what you mention here, though:

What happens when you have vastly different consumption rates for the two streams? If one reader is 1 gigabyte ahead of the other, do you exert back pressure?

This is really the crux of the problem and AFAICT not optimizable in the general js case.

@martinthomson can you explain under what scenario this can actually be optimized so that it does not consume memory equal to the difference in consumption? Particularly for a js-defined stream (which you seemed to focus on in your original post).

The only optimization that seems possible is if it's a DOM-produced stream reading from a file. But I don't see how a js-defined custom tee() would help with this.

the requirement for SameObject on the chunks would need to be relaxed to accomplish this

I'm really confused what you want. The initial comment was asking for no clone, but this seems to be suggesting you do want a clone.

I do agree that allowing the tee branches to receive different objects would help with the from-disk optimization, at the expense of penalizing all other tees. Perhaps some kind of copy-on-write chunk mechanism could support both.

@martinthomson

If one reader is 1 gigabyte ahead of the other, do you exert back pressure?

This is really the crux of the problem and AFAICT not optimizable in the general js case.

The point is that it is not generally optimizable, but that there were cases where massive state commitment could be avoided. If, for example, you could quite reasonably produce the same stream again, then you could implement tee() by producing two different streams that happen to have the same content.
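The replayable-source idea might look like this sketch (hypothetical code; `replayableTee` is an invented name): instead of buffering for the slower branch, tee() just asks the producer for a second, independent stream over the same content.

```javascript
// Hypothetical sketch: if the producer can replay its data (a Blob, a
// file on disk, a deterministic generator), tee() needs no shared buffer.
// `replayableTee` takes a factory that builds a fresh iterator each call
// and returns two fully independent streams over the same content.
function replayableTee(makeIterator) {
  return [makeIterator(), makeIterator()];
}

// Example: a "replayable source" backed by an in-memory array.
const data = ['chunk-1', 'chunk-2', 'chunk-3'];
const [a, b] = replayableTee(() => data[Symbol.iterator]());

// Each branch reads at its own pace; no chunk is ever queued on behalf
// of the other branch, so memory use does not grow with the consumption gap.
console.log(a.next().value); // 'chunk-1'
console.log(b.next().value); // 'chunk-1' (independent cursor)
```

This only works when re-producing the content is cheap and deterministic, which is exactly the "not generally optimizable" caveat above.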

@wanderview

Ok. I'll wait to see a more concrete proposal. I'm happy with what the current stream spec says for tee() for a v1 implementation.

@martinthomson

Understood. I think that the general view was that tee() would be sufficient for the needs of fetch(), just not necessarily perfectly optimal.

@tyoshino

Hmm, I see. When saving a fetch result into the cache from a SW and responding to the controlled page, the source is the network, which is not replayable. It's possible that we'd want to issue a POST with a file on disk as the body, but I can't quickly come up with a situation where we'd tee() the data to two consumers in a SW. Issuing two or more requests to multiple destinations to distribute data for reliability (disaster recovery), for example?

Then, ideally all the identity transforms and communication between SW and the page should be optimized (skipped), and each of the requests would read from the file individually.

@wanderview

I think it would be nice to remove the same-object requirement for tee chunks. We probably won't be able to relax it later, once code starts depending on shared state between the streams.

Removing same object on the chunks creates a memory penalty with the simple implementation, but it seems like copy-on-write is a simple optimization that could be applied here. (I say "simple", but I'm not a JS engine guy...)
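A copy-on-write chunk might be sketched as follows (hypothetical code; `cowChunk` is an invented helper, not proposed spec text): both branches share the underlying chunk until one of them writes to it, at which point that branch gets a private copy.

```javascript
// Hypothetical copy-on-write sketch: reads go to the shared chunk until
// the first write, which lazily creates a branch-private copy.
function cowChunk(shared) {
  let local = null; // created lazily on first write
  return new Proxy({}, {
    get(_, prop) {
      return (local ?? shared)[prop];
    },
    set(_, prop, value) {
      if (local === null) local = { ...shared }; // copy on first write
      local[prop] = value;
      return true;
    },
  });
}

const underlying = { data: 'payload' };
const branchA = cowChunk(underlying);
const branchB = cowChunk(underlying);

branchA.processed = true;          // triggers A's private copy
console.log(branchA.processed);    // true
console.log(branchB.processed);    // undefined: B never sees A's write
console.log(underlying.processed); // undefined: the shared chunk is untouched
```

So the common read-only case pays no copy, while mutation is isolated per branch, which is the "support both" property mentioned above. (A real engine would do this below the JS level rather than with a Proxy.)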

@martinthomson

@wanderview, if we relax the SameObject requirement that doesn't mean that it has to be a different object. If it makes sense to return the same object, then you can.

@wanderview

I think we should spec same object or different object. Otherwise we will end up with compat issues between different stream implementations.

@martinthomson

As long as it is same value, why does it matter if the objects are same or different?

@wanderview

Because sites will do things like chunk.processed = true to communicate between the two sides of the tee() if we return same object.

If we don't spec same object or different object, a browser returning the same object may still lead sites to do this, which then breaks those sites in other browsers. I think it's better to be explicit.
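The hazard is easy to demonstrate with plain objects (no streams machinery needed): with a same-object tee, a property written on one branch's chunk is visible on the other.

```javascript
// A minimal illustration of the shared-state hazard: when both branches
// of a tee see the same chunk object, one side can smuggle state to the
// other via an expando property.
const chunk = { data: 'payload' };
const branchAChunk = chunk; // same-object tee: both branches get `chunk`
const branchBChunk = chunk;

branchAChunk.processed = true;       // side A marks the chunk...
console.log(branchBChunk.processed); // true: ...and side B observes it
```

A browser that instead handed each branch a copy would log `undefined` here, which is exactly the cross-implementation compat break being described.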

@martinthomson

Ugh. Yes, I can accept that reasoning. Of course, we can't make the same guarantee for custom tee() functions, but the default behaviour will be the important thing.

@yutakahirano

Can we have the fetch algorithm throttle reading from a request body stream to N bytes until the HTTP status code arrives (with N given via a RequestInit parameter)? Then we wouldn't have to modify the tee function, I think.
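A sketch of what such a throttle might look like (hypothetical: `BodyThrottle`, `admit`, and `statusArrived` are invented names; a real version would live inside the fetch algorithm, not in page JS):

```javascript
// Hypothetical sketch: cap the number of request body bytes handed to
// the network at N until the response status has arrived, then lift it.
class BodyThrottle {
  constructor(limit) {
    this.limit = limit;     // N bytes allowed pre-status
    this.sent = 0;
    this.unlimited = false; // flips once the status code arrives
  }
  // Returns how many bytes of a chunk may be sent right now.
  admit(chunkLength) {
    if (this.unlimited) return chunkLength;
    const allowed = Math.min(chunkLength, this.limit - this.sent);
    this.sent += allowed;
    return allowed;
  }
  statusArrived() { this.unlimited = true; }
}

const throttle = new BodyThrottle(1024);
console.log(throttle.admit(800));  // 800: within budget
console.log(throttle.admit(800));  // 224: only the remaining budget is admitted
throttle.statusArrived();
console.log(throttle.admit(800));  // 800: cap lifted after the status
```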

@annevk commented Nov 2, 2015

Pretty sure that in the normal case we will not get the status code until the server receives all bytes, so I don't think that works, unless I'm missing something.

@yutakahirano

@annevk

fetch(new Request(url, {body: rs}))
  • if rs is a user-constructed stream:
    A user might want to limit the number of in-flight bytes until the response status arrives (especially for infinite streams).
  • if rs is replayable (e.g. made from a blob):
    An implementer may want to skip the default implementation and use an efficient algorithm instead.

Sorry I mixed these two issues.

For the latter, I would add a shouldClone parameter to tee(). If ReadableByteStream (or ReadableStream for bytes) allows an implementer to coalesce buffers, the implementer can optimize the operation without an observable behavior change, I think.
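A sketch of how a shouldClone option could behave, using a synchronous-iterator analogy (hypothetical: `teeWithClone` is an invented name; structuredClone is a real global in modern browsers and Node 17+):

```javascript
// Hypothetical sketch: tee with an opt-in shouldClone flag. When set,
// each branch receives a structured clone of every chunk instead of the
// same object, so mutations cannot leak between branches.
function teeWithClone(iterator, { shouldClone = false } = {}) {
  const queues = [[], []];

  function pull() {
    const { value, done } = iterator.next();
    if (done) return false;
    for (const q of queues) {
      q.push(shouldClone ? structuredClone(value) : value);
    }
    return true;
  }

  const branch = (i) => ({
    next() {
      if (queues[i].length === 0 && !pull()) return { done: true };
      return { value: queues[i].shift(), done: false };
    },
  });
  return [branch(0), branch(1)];
}

const [a, b] = teeWithClone([{ n: 1 }][Symbol.iterator](), { shouldClone: true });
const chunkA = a.next().value;
chunkA.mutated = true;
console.log(b.next().value.mutated); // undefined: branches are isolated
```

With shouldClone omitted it degrades to the current same-object behavior, so the flag is purely opt-in.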

@annevk commented Nov 2, 2015

A user might want to limit the number of in-flight bytes until the response status arrives (especially for infinite streams).

How does that work in HTTP? That requires Expect: 100-continue, but even then I don't think we are guaranteed to not get a redirect?

@yutakahirano

If limiting the number of bytes is impossible, it looks dangerous to issue a fetch request with a stream body and "follow" redirect mode.
