Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propagate multipart exceptions and accept custom charsets #56

Merged
merged 4 commits into from
May 28, 2023

Conversation

vincentjames501
Copy link

  • In the previous implementation, when uploading a large file while the remote end hangs up it would throw an unhandled exception generally in the form "Read end dead"
  • In the previous implementation, when uploading a file using an InputStream for example and the underlying stream throws an exception, these exceptions would not be propagated to the HttpClient/HttpRequest and would often result in incomplete payloads arriving to the server.
  • Allow setting string charset types correctly where previously it would always use UTF-8 when copying to the underlying PipedOutputStream.

…riate

- In the previous implementation, when uploading a large file while the remote end hangs up it
would throw an unhandled exception generally in the form "Read end dead"
- In the previous implementation, when uploading a file using an InputStream for example and
the underlying stream throws an exception, these exceptions would not be propagated to the
HttpClient/HttpRequest and would often result in incomplete payloads arriving to the server.
- Allow setting string charset types correctly where previously it would always use UTF-8
when copying to the underlying PipedOutputStream.
@gnarroway
Copy link
Owner

Thanks will take a look. Does this have any impact on #55 ?

@vincentjames501
Copy link
Author

Thanks will take a look. Does this have any impact on #55 ?

@gnarroway , it doesn't attempt to fix #55 . We likely want to loop through the parts and do our best to determine the content length of each part similar to how OkHttp does when determining the content length header. We can only report the content length if we can accurately determine the size of each part so it's fine if it's each part is a File/String/ByteArray/etc.

https://github.com/square/okhttp/blob/f9901627431be098ad73abd725fbb3738747461c/okhttp/src/jvmMain/kotlin/okhttp3/MultipartBody.kt#L109
https://github.com/square/okhttp/blob/f9901627431be098ad73abd725fbb3738747461c/okhttp/src/jvmTest/java/okhttp3/MultipartBodyTest.java#L27

…the multipart body (where appropriate)

- This introduces a new protocol called MultipartParam which can be extended to support additional
content types. By default we're matching essentially what io/input-stream does by default to
minimize any backwards incompatible changes here
- When we know the content-length of the multipart body, supply it to the HttpClient
(-> req
(dissoc :multipart)
(assoc :body (multipart/body multipart b))
(assoc :body
(if (= -1 content-length)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are unable to determine the content length, leave the body as an input stream and use transfer encoding chunked. Otherwise, specify the content length so the appropriate content-length header is used.

content-type
(str (:mime-type content-type)
(when-let [charset (:charset content-type)]
(str "; charset=" charset))))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an issue today where if you supply a String, we essentially only ever allow UTF-8. This allows the content-type param to be either a string or map which can describe the mime type and charset. I could see an argument for this being another key but this approach more closely aligns with how libraries like Apache HttpClient represent ContentType.

@@ -55,12 +84,94 @@
(take 30)
(apply str "hatoBoundary")))

(defn body
"Returns an InputStream from the multipart inputs.
(defprotocol MultipartParam
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following protocol defines the possible multiple param content types (can be easily extended). We need to be able to coerce the param into an input stream and determine it's length (if possible). For types that have potentially unknown lengths (such as an InputStream), a client can set it by supplying :content-length in the multipart param map.

(extend-protocol MultipartParam
String
(input-stream [^String content opts]
(io/input-stream (.getBytes content (charset-encoding opts))))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to make a best guess to determine the charset for the string so we look at either the :content-type string/map and parse it.

@vincentjames501
Copy link
Author

@gnarroway , I took a crack at addressing #55 . Appears to be working for me but could certainly use some smoke testing.
CC @p-himik

@p-himik
Copy link

p-himik commented May 20, 2023

After a quick test, it seems to work just fine on my end, including the fix for #55.

@gnarroway gnarroway merged commit f8dd04f into gnarroway:master May 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants