Do we still want to support 4 different content types? #22811

javanna · 2017-01-26T15:27:16Z

As part of #22691 we are removing content type auto-detection for incoming requests, in favour of reading the Content-Type header. Elasticsearch supports 4 different formats: json, yaml, smile and cbor. All 4 are supported with few exceptions: bulk, msearch and friends only support json or smile (actually multi line json or smile composed of one valid object per line, the whole request is never valid json nor smile).

Given that the majority of our users use json, that we barely test the other formats, especially the binary ones, and that handling 4 different formats complicates our code, not to mention in some places we may completely forget about this and just assume that we get json, I wonder if we still want to support all these 4 formats.

I think the outcome of this issue should be either deprecate and then remove support for yaml, smile and cbor, or invest in adding coverage for these formats so that we feel more comfortable about supporting them. All said also applies to both requests and responses.

The text was updated successfully, but these errors were encountered:

karmi · 2017-01-26T15:39:31Z

I vote for trimming down the number of supported formats, with the exception of YAML -- seems like some people do use that, eg. for mappings/etc, where they take an advantage of comment support.

javanna · 2017-01-26T16:02:34Z

@karmi we convert all the mappings to json before storing them in the cluster state, so once converted the comments are gone. They are not returned e.g. by the get mapping api. Is this what you'd expect?

drewr · 2017-01-26T16:12:22Z

I find myself using YAML quite a bit as well. I don't care about how it's serialized internally, the utility is on the user's end. And while I haven't used Smile and CBOR nearly as much, it seems like it would be a shame to limit the options just to clean the code up.

I'm +1000 on getting rid of the silly heuristics (like the ^---\n for YAML) in favor of Content-Type though. Seems like that alone would be the biggest contributor to cleaning up the code.

karmi · 2017-01-26T16:16:15Z

@javanna I think the fact that the payload is converted to JSON after it's received doesn't matter here -- people probably have their configuration in some kind of repository (scrips, configuration management code, application code), and that is the place where they care about the comments (and other things).

Of course, and argument can be always made, that people can use real code for that, Python, Ruby, ..., whatever, where they can have proper comments and other nice things.

(UPDATED)

javanna · 2017-01-26T16:31:50Z

just to clean the code up.

preventing future or fixing existing bugs is another way to say it which makes it sound much better :)

jasontedor · 2017-01-26T16:55:20Z

Our JSON parser violates specification and allows comments anyway. So if the argument is comments, there's no need to keep YAML for that reason.

clintongormley · 2017-01-27T11:03:47Z

I agree with keeping YAML - for some reasons, sysadmins seem to prefer it to JSON

The one I'm torn about is CBOR. It's just one datapoint but (eg) the Perl CBOR module (written by the same author as the fastest JSON module) says that encoding CBOR is twice as fast as JSON, and decoding about 15-30% as fast. CBOR encoding is about 20% smaller than JSON.

This could have a significant impact on bulk uploads. The only problem is that our CBOR implementation doesn't support the bulk API atm (see #8481)

Given that we've survived thus far without CBOR support for bulk, I'd be OK with removing CBOR support entirely if the code simplification warrants it. Just a shame to leave this performance boost on the table.

javanna · 2017-01-27T11:46:39Z

I do not think it buys us much to remove just CBOR if we keep around yaml and SMILE. With the abstractions we have in place, once we introduce tests for Smile we can easily do the same for CBOR. Maybe keeping only Yaml could simplify things a bit as that would leave us without binary content types, yet the XContent abstraction would still be needed.

dadoonet · 2017-01-27T12:04:05Z

Is there a way to generate a Json, Smile, CBOR, Yaml whatever content from our REST test suite so we can choose randomly with which implementation we want to tun our integration tests?

Would that make sense to make sure we cover always all our APIs without needing to duplicate the scenarios?

javanna · 2017-01-27T12:45:42Z

Is there a way to generate a Json, Smile, CBOR, Yaml whatever content from our REST test suite so we can choose randomly with which implementation we want to tun our integration tests?

Yes that is one of the options I had in mind. We would need to transparently convert requests before they get sent, and responses right after they get returned. That wouldn't cover 100% of what we need, but already a lot. We also have methods in our java api that allow to provide any content, which then gets converted to json internally, those would need to be tested separately.

clintongormley · 2017-01-27T12:53:43Z

btw re CBOR - I don't know how much difference it would make in the real world - I'd love to mock up a solution for #8481 and see if the difference is significant. If not, then we can can it.

javanna · 2017-01-27T14:20:26Z

Discussed in FixItFriday, we would like to remove support for CBOR, Smile and Yaml in the future, but we don't want to make this breaking change at the moment. We may reconsider deprecation in 6.x and removal in 7.0. We will investigate other ways to have better performance for _bulk, potentially a different endpoint accepting a different format that can be even faster than CBOR.

javanna · 2017-01-29T09:30:46Z

Closing for now. We will reconsider this in the future.

javanna · 2018-03-27T19:54:41Z

Re-opening, this is probably a good time to reconsider.

dynaxis · 2018-05-10T09:08:02Z

I reached here while considering use of CBOR or SMILE, since it was not clear from the docs whether they are supported in the Java REST clients.

Since I'm using my own binary encoding of docs in a separate stored field other than _source (FYI, it's similar to Google's FlatBuffers in spirit. I access some parts of the docs in the client side without parsing due to heavy client-side filtering and joining requirement) and preparing a switch to the Java REST client from the transport one, I thought either CBOR or SMILE may benefit my case at least due to more efficient encoding of the binary chunks.

Also recently I got across a project involving client-side processing of huge buckets of aggregation results, and am thinking the binary formats may be beneficial in those cases in terms of performance.

If CBOR/SMILE are to be deprecated for removal soon, please communicate the fact clearly as soon as possible. In my case, I prefer having anyone of them (that is, a binary encoding). But don't have any data yet on how much gain such a binary format can yield.

Thanks.

javanna · 2018-05-11T11:33:46Z

@dynaxis are you moving to the high-level REST client or low-level one?

dynaxis · 2018-05-11T16:10:01Z

@javanna the high level one.

javanna · 2018-05-14T09:55:16Z

@dynaxis thanks for your feedback. the Java high-level client supports storing and retrieving documents in any format supported on the server, namely json, yaml, cbor or smile. We don't document this though as we are considering removing support for binary formats on the server-side. cc @danielmitterdorfer maybe another idea for a benchmark .

dynaxis · 2018-05-14T23:55:01Z

@javanna Thank you for sharing it. If you are strongly considering removal of binary formats, then maybe it's safe not to try any of binary formats. Anyway I'll continue the migration to Java REST client since the current transport client is slated for removal. Thanks again.

jpountz · 2020-03-03T16:59:20Z

Closing it for now, like #32433. It's unlikely we'll move forward on this in the near future due to the breaking nature of this change. We can reopen if necessary.

azhuchkov · 2020-12-08T11:20:19Z

I suppose returning data in a non-native format requires extra conversion step at the server-side that makes supporting those formats less useful in terms of performance.

My observations show that ES uses JSON as a native format since using CBOR or Smile is less performant (overall request processing time increases).

I am looking for a way to improve performance, so I would appreciate your feedback.

javanna added :Core/Infra/REST API REST infrastructure and utilities discuss labels Jan 26, 2017

javanna removed the discuss label Jan 27, 2017

javanna closed this as completed Jan 29, 2017

javanna mentioned this issue Feb 6, 2017

Store XContentType of source in translog and index #22863

Open

scottsom mentioned this issue Jun 16, 2017

Support for Ion Serialization Format #25272

Closed

javanna added the discuss label Mar 27, 2018

javanna reopened this Mar 27, 2018

colings86 added the >breaking label Apr 24, 2018

tomcallahan assigned eskibars Jul 27, 2018

tomcallahan removed the discuss label Jul 27, 2018

jpountz mentioned this issue Jul 27, 2018

Remove support for CBOR and Smile on the REST layer #32433

Closed

jpountz closed this as completed Mar 3, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Do we still want to support 4 different content types? #22811

Do we still want to support 4 different content types? #22811

javanna commented Jan 26, 2017 •

edited

Loading

karmi commented Jan 26, 2017

javanna commented Jan 26, 2017

drewr commented Jan 26, 2017 •

edited

Loading

karmi commented Jan 26, 2017 •

edited

Loading

javanna commented Jan 26, 2017

jasontedor commented Jan 26, 2017

clintongormley commented Jan 27, 2017

javanna commented Jan 27, 2017

dadoonet commented Jan 27, 2017

javanna commented Jan 27, 2017

clintongormley commented Jan 27, 2017

javanna commented Jan 27, 2017

javanna commented Jan 29, 2017

javanna commented Mar 27, 2018

dynaxis commented May 10, 2018

javanna commented May 11, 2018

dynaxis commented May 11, 2018

javanna commented May 14, 2018

dynaxis commented May 14, 2018

jpountz commented Mar 3, 2020

azhuchkov commented Dec 8, 2020

Do we still want to support 4 different content types? #22811

Do we still want to support 4 different content types? #22811

Comments

javanna commented Jan 26, 2017 • edited Loading

karmi commented Jan 26, 2017

javanna commented Jan 26, 2017

drewr commented Jan 26, 2017 • edited Loading

karmi commented Jan 26, 2017 • edited Loading

javanna commented Jan 26, 2017

jasontedor commented Jan 26, 2017

clintongormley commented Jan 27, 2017

javanna commented Jan 27, 2017

dadoonet commented Jan 27, 2017

javanna commented Jan 27, 2017

clintongormley commented Jan 27, 2017

javanna commented Jan 27, 2017

javanna commented Jan 29, 2017

javanna commented Mar 27, 2018

dynaxis commented May 10, 2018

javanna commented May 11, 2018

dynaxis commented May 11, 2018

javanna commented May 14, 2018

dynaxis commented May 14, 2018

jpountz commented Mar 3, 2020

azhuchkov commented Dec 8, 2020

javanna commented Jan 26, 2017 •

edited

Loading

drewr commented Jan 26, 2017 •

edited

Loading

karmi commented Jan 26, 2017 •

edited

Loading