Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarification of content encoding with multiple values #1251

Closed
bjmi opened this issue Sep 25, 2020 · 7 comments · Fixed by #1252
Closed

Clarification of content encoding with multiple values #1251

bjmi opened this issue Sep 25, 2020 · 7 comments · Fixed by #1252

Comments

@bjmi
Copy link
Contributor

bjmi commented Sep 25, 2020

Affects Version(s): 2.2.5.RELEASE

Can you please clarify why a colon : was used as separator if multiple content-encoding values are processed.
This happens if e.g. a message body is compressed with GZipPostProcessor. A previous UTF-8 value becomes gzip:UTF-8 when Spring AMQP was used.

Where does this : originates from? Are there any specification documents for that?
Following references advocate a comma ,

For example, messages with JSON payload should use application/json. If the payload is compressed with the LZ77 (GZip) algorithm, its content encoding should be gzip.

Multiple encodings can be specified by separating them with commas.

Content-Encoding: gzip, identity

Implementations SHOULD NOT specify multiple content-encoding values except as to be compatible with messages originally sent with other protocols, e.g. HTTP or SMTP.

Our existing AMQP implementation uses comma , as separator and this breaks interoperability with newly adopted spring applications that uses Spring AMQP in conjunction with compressed message bodies. What would the right fix for that?
It would be really useful if Spring PostProcessors could be configured with right separator.
Additionally if whitespaces (gzip : UTF-8 or gzip, UTF-8) are present then current implementation is broken too.

Thanks in advance for your thoughts.

Affected classes

  • AbstractCompressingPostProcessor
  • AbstractDecompressingPostProcessor
  • DelegatingDecompressingPostProcessor
@garyrussell
Copy link
Contributor

It's a bug, but we'll have to make it a property to avoid a breaking change. We can make comma the default in 2.3.

@garyrussell garyrussell self-assigned this Sep 25, 2020
@garyrussell garyrussell added this to the 2.3.RC1 milestone Sep 25, 2020
@bjmi
Copy link
Contributor Author

bjmi commented Sep 25, 2020

Thank you for your fast response.

One thing what was also observed. Some applications leave contentEncoding empty for "raw" data and only fill contentEncoding property if e.g. compression was applied. Media types can contain used charset.

Uncompressed

contentType: application/json    or    application/json; charset=utf-8
contentEncoding: UTF-8

vs.

contentType: application/json    or    application/json; charset=utf-8
contentEncoding: 

Compressed with gzip

contentType: application/json    or    application/json; charset=utf-8
contentEncoding: gzip, UTF-8

vs.

contentType: application/json    or    application/json; charset=utf-8
contentEncoding: gzip

garyrussell added a commit to garyrussell/spring-amqp that referenced this issue Sep 25, 2020
Resolves spring-projects#1251

Delimiter should be a comma and whitespace trimmeed.

Handle both delimiters in decompressors and add property for backwards compatibility.

**I will backport to 2.2.x with default `:`**
@garyrussell
Copy link
Contributor

contentType: application/json    or    application/json; charset=utf-8
contentEncoding: 

Are you saying contentEncoding is present, but with an empty String? We already check for null, but I will change it to handle an empty String too.

@bjmi
Copy link
Contributor Author

bjmi commented Sep 25, 2020

I meant contentEncoding is null.

@garyrussell
Copy link
Contributor

We already handled that case, but I have added code to treat an empty String as null as well.

artembilan pushed a commit that referenced this issue Sep 25, 2020
Resolves #1251

Delimiter should be a comma and whitespace trimmeed.

Handle both delimiters in decompressors and add property for backwards compatibility.

**I will backport to 2.2.x with default `:`**

* Do not add a delimiter if the original encoding is an empty String.
garyrussell added a commit that referenced this issue Sep 25, 2020
Resolves #1251

Delimiter should be a comma and whitespace trimmeed.

Handle both delimiters in decompressors and add property for backwards compatibility.

**I will backport to 2.2.x with default `:`**

* Do not add a delimiter if the original encoding is an empty String.
@bjmi
Copy link
Contributor Author

bjmi commented Sep 26, 2020

I wanted to suggest another handling of contentEncoding property but you were too fast :)

In case of "plain" data like JSON or XML don't set contentEncoding property at all. Jackson parser can determine (com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper#detectEncoding) the right encoding of a received byte[] which contains JSON or XML (e.g. with BOM if not UTF-8 or defined other conventions). The charset could be appended to the contentType: application/json; charset=utf-8 header additionally to specify used charset explicitly.
I.e. following line should just set null or do nothing.

And let message post processors like GZipPostProcessor use field contentEncoding to declare that the content was converted and have to be be converted back on the client side.

Maybe it is better to open another issue for that?

@garyrussell
Copy link
Contributor

There is no guarantee that the consumer uses Jackson to decode the content.

We can consider moving the charset to the contentType; please open a new issue for that.

garyrussell added a commit to garyrussell/spring-amqp that referenced this issue Mar 9, 2022
Resolves spring-projects#1420

- detect and use `charset` in `contentType` when preseny
- allow configuration of the `MimeType` to use, which can include a `charset` parameter

**cherry-pick to main - will require what's new fix**
garyrussell added a commit to garyrussell/spring-amqp that referenced this issue Mar 9, 2022
Resolves spring-projects#1420

- detect and use `charset` in `contentType` when present
- allow Jackson to determine the decode `charset` via `ByteSourceJsonBootstrapper.detectEncoding()`
- allow configuration of the `MimeType` to use, which can include a `charset` parameter

**cherry-pick to main - will require what's new fix**
garyrussell added a commit to garyrussell/spring-amqp that referenced this issue Mar 9, 2022
Resolves spring-projects#1420

- detect and use `charset` in `contentType` when present
- allow Jackson to determine the decode `charset` via `ByteSourceJsonBootstrapper.detectEncoding()`
- allow configuration of the `MimeType` to use, which can include a `charset` parameter

**cherry-pick to main - will require what's new fix**
artembilan pushed a commit that referenced this issue Mar 9, 2022
Resolves #1420

- detect and use `charset` in `contentType` when present
- allow Jackson to determine the decode `charset` via `ByteSourceJsonBootstrapper.detectEncoding()`
- allow configuration of the `MimeType` to use, which can include a `charset` parameter

**cherry-pick to main - will require what's new fix**

* Fix typo in doc.
artembilan pushed a commit that referenced this issue Mar 9, 2022
Resolves #1420

- detect and use `charset` in `contentType` when present
- allow Jackson to determine the decode `charset` via `ByteSourceJsonBootstrapper.detectEncoding()`
- allow configuration of the `MimeType` to use, which can include a `charset` parameter

**cherry-pick to main - will require what's new fix**

* Fix typo in doc.
# Conflicts:
#	src/reference/asciidoc/whats-new.adoc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants