Clarification of content encoding with multiple values #1251

bjmi · 2020-09-25T12:28:43Z

Affects Version(s): 2.2.5.RELEASE

Can you please clarify why a colon : is used as the separator when multiple content-encoding values are processed?
This happens when, e.g., a message body is compressed with GZipPostProcessor: with Spring AMQP, a previous UTF-8 value becomes gzip:UTF-8.

Where does this : originate from? Is there a specification document for it?
The following references advocate a comma ,

https://www.rabbitmq.com/publishers.html / https://www.rabbitmq.com/consumers.html

For example, messages with JSON payload should use application/json. If the payload is compressed with the LZ77 (GZip) algorithm, its content encoding should be gzip.

Multiple encodings can be specified by separating them with commas.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding

Content-Encoding: gzip, identity

(Maybe also interesting https://docs.oasis-open.org/amqp/core/v1.0/amqp-core-messaging-v1.0.html)

Implementations SHOULD NOT specify multiple content-encoding values except as to be compatible with messages originally sent with other protocols, e.g. HTTP or SMTP.

Our existing AMQP implementation uses a comma , as the separator, and this breaks interoperability with newly adopted Spring applications that use Spring AMQP in conjunction with compressed message bodies. What would be the right fix for that?
It would be really useful if the Spring post processors could be configured with the right separator.
Additionally, if whitespace is present (gzip : UTF-8 or gzip, UTF-8), the current implementation breaks too.

Thanks in advance for your thoughts.

Affected classes

AbstractCompressingPostProcessor
AbstractDecompressingPostProcessor
DelegatingDecompressingPostProcessor
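
To illustrate the separator and whitespace problem described above, here is a minimal sketch of tolerant parsing on the consuming side. It is not Spring AMQP code: the class `ContentEncodings` and its `split` method are hypothetical names chosen for this example, and it assumes that no individual encoding token contains a comma or colon itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spring AMQP.
final class ContentEncodings {

    private ContentEncodings() {
    }

    /**
     * Splits a contentEncoding value on either ',' or ':' and trims
     * whitespace around each token. A null or blank value yields an
     * empty list.
     */
    static List<String> split(String contentEncoding) {
        List<String> values = new ArrayList<>();
        if (contentEncoding == null) {
            return values;
        }
        for (String token : contentEncoding.split("[,:]")) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Both notations from this issue resolve to the same two tokens.
        System.out.println(split("gzip:UTF-8"));
        System.out.println(split("gzip , UTF-8"));
    }
}
```

Accepting both separators when reading keeps a consumer compatible with producers that write either form, whichever delimiter ends up being the default.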

garyrussell · 2020-09-25T14:18:03Z

It's a bug, but we'll have to make it a property to avoid a breaking change. We can make comma the default in 2.3.

bjmi · 2020-09-25T14:58:41Z

Thank you for your fast response.

One more thing that was observed: some applications leave contentEncoding empty for "raw" data and only fill the contentEncoding property if, e.g., compression was applied. The media type can carry the charset used.

Uncompressed

contentType: application/json    or    application/json; charset=utf-8
contentEncoding: UTF-8

vs.

contentType: application/json    or    application/json; charset=utf-8
contentEncoding:

Compressed with gzip

contentType: application/json    or    application/json; charset=utf-8
contentEncoding: gzip, UTF-8

vs.

contentType: application/json    or    application/json; charset=utf-8
contentEncoding: gzip

Resolves spring-projects#1251 Delimiter should be a comma and whitespace trimmed. Handle both delimiters in decompressors and add property for backwards compatibility. **I will backport to 2.2.x with default `:`**

garyrussell · 2020-09-25T15:15:37Z

contentType: application/json    or    application/json; charset=utf-8
contentEncoding:

Are you saying contentEncoding is present, but with an empty String? We already check for null, but I will change it to handle an empty String too.

bjmi · 2020-09-25T16:44:50Z

I meant contentEncoding is null.

garyrussell · 2020-09-25T17:04:25Z

We already handled that case, but I have added code to treat an empty String as null as well.
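
As a rough sketch of the rule discussed here (prefix the compression type, but add no delimiter when the original encoding is null or empty), the logic could look like the following. This is an illustration only, not the actual Spring AMQP implementation; `EncodingComposer` and `compose` are hypothetical names, and treating a whitespace-only value as empty is an assumption of this sketch.

```java
// Hypothetical helper, not part of Spring AMQP.
final class EncodingComposer {

    private EncodingComposer() {
    }

    /**
     * Builds the contentEncoding for a compressed message.
     *
     * @param compression      the compression type, e.g. "gzip"
     * @param originalEncoding the encoding set before compression; may be null or empty
     * @param delimiter        the separator, e.g. ", " or the legacy ":"
     */
    static String compose(String compression, String originalEncoding, String delimiter) {
        // Assumption: null, empty and whitespace-only values all mean "no original encoding".
        if (originalEncoding == null || originalEncoding.trim().isEmpty()) {
            return compression;
        }
        return compression + delimiter + originalEncoding.trim();
    }

    public static void main(String[] args) {
        System.out.println(compose("gzip", "UTF-8", ", "));
        System.out.println(compose("gzip", "", ", "));
    }
}
```

Passing the delimiter as a parameter mirrors the idea of making it a configurable property, so an application can keep the legacy `:` where it needs backwards compatibility.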

Resolves #1251 Delimiter should be a comma and whitespace trimmed. Handle both delimiters in decompressors and add property for backwards compatibility. **I will backport to 2.2.x with default `:`** * Do not add a delimiter if the original encoding is an empty String.

bjmi · 2020-09-26T13:59:06Z

I wanted to suggest a different way of handling the contentEncoding property, but you were too fast :)

In case of "plain" data like JSON or XML, don't set the contentEncoding property at all. The Jackson parser can determine (com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper#detectEncoding) the right encoding of a received byte[] which contains JSON or XML (e.g. via a BOM if it is not UTF-8, or by other defined conventions). The charset could additionally be appended to the contentType header (application/json; charset=utf-8) to specify the charset used explicitly.
I.e. the following line should just set null or do nothing.

spring-amqp/spring-amqp/src/main/java/org/springframework/amqp/support/converter/AbstractJackson2MessageConverter.java

Line 387 in 4f76b2b

messageProperties.setContentEncoding(getDefaultCharset());

And let message post processors like GZipPostProcessor use the contentEncoding field to declare that the content was converted and has to be converted back on the client side.

Maybe it is better to open another issue for that?

garyrussell · 2020-09-28T14:03:31Z

There is no guarantee that the consumer uses Jackson to decode the content.

We can consider moving the charset to the contentType; please open a new issue for that.
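
For readers following the idea of carrying the charset in the contentType, a minimal sketch of reading it back on the consumer side might look like this. It is a plain-Java illustration under stated assumptions, not the converter's actual code: `ContentTypeCharsets` and `charsetOf` are hypothetical names, and the parameter parsing is deliberately simplified (it does not handle semicolons inside quoted values).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Spring AMQP.
final class ContentTypeCharsets {

    private ContentTypeCharsets() {
    }

    /**
     * Returns the charset named by the "charset" parameter of a content type
     * such as "application/json; charset=utf-8", or the fallback when the
     * parameter is absent or names an unknown charset.
     */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String name = param.substring(eq + 1).trim();
                // Strip optional surrounding double quotes.
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                }
                catch (IllegalArgumentException ex) {
                    // Illegal or unsupported charset name: use the fallback.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("application/json; charset=utf-8", StandardCharsets.ISO_8859_1));
        System.out.println(charsetOf("application/json", StandardCharsets.UTF_8));
    }
}
```

With the charset carried in contentType, the contentEncoding property stays free to describe transformations such as gzip, and a consumer that does not use Jackson can still decode the payload.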

Resolves spring-projects#1420 - detect and use `charset` in `contentType` when present - allow configuration of the `MimeType` to use, which can include a `charset` parameter **cherry-pick to main - will require what's new fix**

Resolves spring-projects#1420 - detect and use `charset` in `contentType` when present - allow Jackson to determine the decode `charset` via `ByteSourceJsonBootstrapper.detectEncoding()` - allow configuration of the `MimeType` to use, which can include a `charset` parameter **cherry-pick to main - will require what's new fix**

Resolves #1420 - detect and use `charset` in `contentType` when present - allow Jackson to determine the decode `charset` via `ByteSourceJsonBootstrapper.detectEncoding()` - allow configuration of the `MimeType` to use, which can include a `charset` parameter **cherry-pick to main - will require what's new fix** * Fix typo in doc.

Resolves #1420 - detect and use `charset` in `contentType` when present - allow Jackson to determine the decode `charset` via `ByteSourceJsonBootstrapper.detectEncoding()` - allow configuration of the `MimeType` to use, which can include a `charset` parameter **cherry-pick to main - will require what's new fix** * Fix typo in doc. # Conflicts: # src/reference/asciidoc/whats-new.adoc

garyrussell self-assigned this Sep 25, 2020

garyrussell added the backport:2.2.x (obsolete) label Sep 25, 2020

garyrussell added this to the 2.3.RC1 milestone Sep 25, 2020

garyrussell mentioned this issue Sep 25, 2020

Fix Compressed contentEncoding Delimiter #1252

Merged

artembilan closed this as completed in #1252 Sep 25, 2020

garyrussell added the type: bug label Dec 18, 2020

bjmi mentioned this issue Feb 1, 2022

Determine charset from contentType if any #1420

Closed
