Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streaming uploads #131

Closed
edofic opened this issue Nov 16, 2014 · 14 comments
Closed

Streaming uploads #131

edofic opened this issue Nov 16, 2014 · 14 comments

Comments

@edofic
Copy link
Contributor

edofic commented Nov 16, 2014

I believe it is currently impossible to handle streaming uploads with scotty.

Possible use case is to on the fly process data being uploaded while the upload process itself might be long and slow and also produce a big file.

This could with WAI directly by using requestBody that returns chunks of the body as they arive. But scotty consumes whole body before processing the request.

Changing this would also allow for support of responseRaw (see Util.hs) for direct work with the socket.

However the change seems non trivial to me. I see two possible solutions

  • using unsafeInterleaveIO for actions that consume body before handling the action. This leaves us with the same interface but makes it, well, unsafe.
  • changing Web.Scotty.Internal.Types.ActionEnv to hold IO actions instead of values. We can then wrap those actions with memoizing combinators. This is safer but breaks existing code.

I can implement either of these two but would like some feedback first.

@sol
Copy link
Contributor

sol commented Nov 17, 2014

Is the reason for reading the whole body before processing the request because we provide access to urlencoded parameters from HTML forms through Web.Scotty.params?

If yes, I think whether this is necessary can be decide by just looking at the headers. In other words can we optionally deffer body consumption and provide access to the body in the same way as WAI does? (read: if the Content-Type is not application/x-www-form-urlencoded defer body reading)

@xich
Copy link
Member

xich commented Nov 18, 2014

Yes... looking at this, it all needs to be rewritten. I never liked the idea of reading the whole body upfront, it was just a quick hack to get params working for forms.

I think params should only try to look in the body if the Content-Type header indicates something useful is there, and ideally should only call requestBody enough to find the param (and not necessarily eagerly read the whole body).

This probably means removing getBody from ActionEnv, changing body to strictly read the body (essentially what mkEnv does now), then change params to check for application/x-www-form-urlencoded and if so, call the new body to read the body and parse any params out, adding them to the getParams list. (This means ActionEnv will have to be in state instead of a reader, to avoid doing this over and over.)

Patches welcome... if not, I'll probably get some time to spend on this in about a month. :-P

@xich
Copy link
Member

xich commented Nov 18, 2014

Er, getBody will probably have to stay (but change types) so we can keep the body we've read so far around so we can do things like:

foo <- param "foo"
if foo then B.writeFile "bar" =<< body else return ()

(That is, we don't want to throw away chunks that param or body reads... if someone is doing streaming uploads they can use requestBody directly and decide whether to throw away the chunks.)

@edofic
Copy link
Contributor Author

edofic commented Nov 18, 2014

@xich I have a version of this almost implemented.

I implemented the check and body parsing when need. I kept body and added bodyReader that returns an IO ByteString that returns chunks. The problem is that the two are inherently incompatible. body consumes all the chunks so there is nothing left for the bodyReader. On the flip side we mustn't keep the chunks obtained from the bodyReader. My use case for streaming uploads is doing some logic and possibly forwarding the body to another service. And bodies might be few gigabytes in size. Add in some concurrent uploads and you just ran out of memory. This means that combining body and bodyReader in the same action must be unsafe. Sadly I don't see a way to encode this mutual exclusion in the type system.
Perhaps we put bodyReader into separate module hintin that it is a bit unsafe?

Anyway expect my version of this in a day or so.

@edofic
Copy link
Contributor Author

edofic commented Nov 18, 2014

Or I may have misunderstood. Are you suggesting that if somebody calls body and afterwards uses bodyReader the reader should return the already consumed chunks? I believe this is doable and in fact a good idea.

@xich
Copy link
Member

xich commented Nov 18, 2014

No what you describe is roughly what I had in mind (I think). Your bodyReader is just request >>= liftIO . WAI.requestBody, right?

I think the only thing we can do is document the mutual exclusion in the haddocks for body, or possibly deprecate it and rename to strictBody or something, which indicates what it does. The more subtle issue is with param/params. A user of bodyReader might not realize those two look into the body (when certain headers indicate they should).

In short, I think that if param/params/body reads the body, we need to keep it around in the ActionEnv so if we call those functions again we'll get the whole body back. (Hence ActionEnv must be state instead of a reader.) bodyReader on the other hand, uses the above definition, returning the chunk and not storing it anywhere.

@sol
Copy link
Contributor

sol commented Nov 18, 2014

I would assume param/params and bodyReader can live together without breaking any invariants (say if the body is url encoded we read and decode it, but also keep the unencoded body so that we can return it from bodyReader).

I see a conflict between bodyReader, body and jsonData, though.

What we could do is that all of them "consume" the body. Say once you call any of those, the others will return some indication of empty (JSON null, empty string, empty reader). I'm not sure whether I really like this approach, but it would at least be consistent (even though it's a breaking change).

@sol
Copy link
Contributor

sol commented Nov 18, 2014

Maybe an exception in the ActionM monad would be preferable over returning "some indication of empty" (it's a programmer error).

@edofic
Copy link
Contributor Author

edofic commented Nov 18, 2014

@sol
If we memoize body when consuming it via body then bodyReader, body and jsonData can also coexist. jsonData simply uses body and then parses it. This way just body needs to memoize. The trick is to store chunks for later streaming so we don't break bodyReader. Only open problem is what to do if bodyReader was already used and you make a call to body.

@sol
Copy link
Contributor

sol commented Nov 18, 2014

Yes, true. So maybe only raise exceptions if jsonData or body is called after bodyReader.

edofic added a commit to edofic/scotty that referenced this issue Nov 18, 2014
this is relevant to issue scotty-web#131
in current implementation it is still usafe to mix
body/jsonBody and bodyReader
@edofic
Copy link
Contributor Author

edofic commented Nov 18, 2014

I just pushed the initial implementation.
Planned additions

  • memoize internal io action for body
  • store chunks if calling body
  • raise exception when calling body if chunks were consumed via bodyReader

@sol
Copy link
Contributor

sol commented Nov 18, 2014

@edofic Is it actually necessary to "safe chunks"? I would assume that if body is called before bodyReader you have the memoized body, so you can construct a bodyReader from that (which returns a big chunk -- may feel wired, but I'm not aware of any issues).

@edofic
Copy link
Contributor Author

edofic commented Nov 18, 2014

@sol Of course. This is in fact a simpler solution.

edofic added a commit to edofic/scotty that referenced this issue Nov 18, 2014
this makes it safe to call body(or something that uses it) and
then use bodyReader. Calling bodyReader will cause subsequent calls
to body to result in a BodyPartiallyStreamed exception.

still a bit rough around the edges and might require some exception
handling when working with mvars (bracket).

fixes scotty-web#131
@edofic
Copy link
Contributor Author

edofic commented Nov 18, 2014

I implemented safer bodyReader using MVars.
Some exception handling (bracket) should probablly added for additional safety.
I believe this is even thread safe. Correct me if I'm wrong.

The code is a bit hairy though but I find it hard to refactor into smaller nice units.

edofic added a commit to edofic/scotty that referenced this issue Nov 18, 2014
this makes it safe to call body(or something that uses it) and
then use bodyReader. Calling bodyReader will cause subsequent calls
to body to result in a BodyPartiallyStreamed exception.

still a bit rough around the edges and might require some exception
handling when working with mvars (bracket).

fixes scotty-web#131
@xich xich closed this as completed in 0ef25c2 Dec 1, 2014
xich pushed a commit that referenced this issue Sep 18, 2015
this is relevant to issue #131
in current implementation it is still usafe to mix
body/jsonBody and bodyReader
xich pushed a commit that referenced this issue Sep 18, 2015
this makes it safe to call body(or something that uses it) and
then use bodyReader. Calling bodyReader will cause subsequent calls
to body to result in a BodyPartiallyStreamed exception.

still a bit rough around the edges and might require some exception
handling when working with mvars (bracket).

fixes #131
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants