Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do we still want to support 4 different content types? #22811

Closed
javanna opened this issue Jan 26, 2017 · 21 comments
Closed

Do we still want to support 4 different content types? #22811

javanna opened this issue Jan 26, 2017 · 21 comments
Assignees
Labels
>breaking :Core/Infra/REST API REST infrastructure and utilities

Comments

@javanna
Copy link
Member

javanna commented Jan 26, 2017

As part of #22691 we are removing content type auto-detection for incoming requests, in favour of reading the Content-Type header. Elasticsearch supports 4 different formats: json, yaml, smile and cbor. All 4 are supported with few exceptions: bulk, msearch and friends only support json or smile (actually multi line json or smile composed of one valid object per line, the whole request is never valid json nor smile).

Given that the majority of our users use json, that we barely test the other formats, especially the binary ones, and that handling 4 different formats complicates our code, not to mention in some places we may completely forget about this and just assume that we get json, I wonder if we still want to support all these 4 formats.

I think the outcome of this issue should be either deprecate and then remove support for yaml, smile and cbor, or invest in adding coverage for these formats so that we feel more comfortable about supporting them. All said also applies to both requests and responses.

@javanna javanna added :Core/Infra/REST API REST infrastructure and utilities discuss labels Jan 26, 2017
@karmi
Copy link
Contributor

karmi commented Jan 26, 2017

I vote for trimming down the number of supported formats, with the exception of YAML -- seems like some people do use that, eg. for mappings/etc, where they take an advantage of comment support.

@javanna
Copy link
Member Author

javanna commented Jan 26, 2017

@karmi we convert all the mappings to json before storing them in the cluster state, so once converted the comments are gone. They are not returned e.g. by the get mapping api. Is this what you'd expect?

@drewr
Copy link
Contributor

drewr commented Jan 26, 2017

I find myself using YAML quite a bit as well. I don't care about how it's serialized internally, the utility is on the user's end. And while I haven't used Smile and CBOR nearly as much, it seems like it would be a shame to limit the options just to clean the code up.

I'm +1000 on getting rid of the silly heuristics (like the ^---\n for YAML) in favor of Content-Type though. Seems like that alone would be the biggest contributor to cleaning up the code.

@karmi
Copy link
Contributor

karmi commented Jan 26, 2017

@javanna I think the fact that the payload is converted to JSON after it's received doesn't matter here -- people probably have their configuration in some kind of repository (scrips, configuration management code, application code), and that is the place where they care about the comments (and other things).

Of course, and argument can be always made, that people can use real code for that, Python, Ruby, ..., whatever, where they can have proper comments and other nice things.

(UPDATED)

@javanna
Copy link
Member Author

javanna commented Jan 26, 2017

just to clean the code up.

preventing future or fixing existing bugs is another way to say it which makes it sound much better :)

@jasontedor
Copy link
Member

Our JSON parser violates specification and allows comments anyway. So if the argument is comments, there's no need to keep YAML for that reason.

@clintongormley
Copy link
Contributor

I agree with keeping YAML - for some reasons, sysadmins seem to prefer it to JSON

The one I'm torn about is CBOR. It's just one datapoint but (eg) the Perl CBOR module (written by the same author as the fastest JSON module) says that encoding CBOR is twice as fast as JSON, and decoding about 15-30% as fast. CBOR encoding is about 20% smaller than JSON.

This could have a significant impact on bulk uploads. The only problem is that our CBOR implementation doesn't support the bulk API atm (see #8481)

Given that we've survived thus far without CBOR support for bulk, I'd be OK with removing CBOR support entirely if the code simplification warrants it. Just a shame to leave this performance boost on the table.

@javanna
Copy link
Member Author

javanna commented Jan 27, 2017

I do not think it buys us much to remove just CBOR if we keep around yaml and SMILE. With the abstractions we have in place, once we introduce tests for Smile we can easily do the same for CBOR. Maybe keeping only Yaml could simplify things a bit as that would leave us without binary content types, yet the XContent abstraction would still be needed.

@dadoonet
Copy link
Member

Is there a way to generate a Json, Smile, CBOR, Yaml whatever content from our REST test suite so we can choose randomly with which implementation we want to tun our integration tests?

Would that make sense to make sure we cover always all our APIs without needing to duplicate the scenarios?

@javanna
Copy link
Member Author

javanna commented Jan 27, 2017

Is there a way to generate a Json, Smile, CBOR, Yaml whatever content from our REST test suite so we can choose randomly with which implementation we want to tun our integration tests?

Yes that is one of the options I had in mind. We would need to transparently convert requests before they get sent, and responses right after they get returned. That wouldn't cover 100% of what we need, but already a lot. We also have methods in our java api that allow to provide any content, which then gets converted to json internally, those would need to be tested separately.

@clintongormley
Copy link
Contributor

btw re CBOR - I don't know how much difference it would make in the real world - I'd love to mock up a solution for #8481 and see if the difference is significant. If not, then we can can it.

@javanna
Copy link
Member Author

javanna commented Jan 27, 2017

Discussed in FixItFriday, we would like to remove support for CBOR, Smile and Yaml in the future, but we don't want to make this breaking change at the moment. We may reconsider deprecation in 6.x and removal in 7.0. We will investigate other ways to have better performance for _bulk, potentially a different endpoint accepting a different format that can be even faster than CBOR.

@javanna javanna removed the discuss label Jan 27, 2017
@javanna
Copy link
Member Author

javanna commented Jan 29, 2017

Closing for now. We will reconsider this in the future.

@javanna
Copy link
Member Author

javanna commented Mar 27, 2018

Re-opening, this is probably a good time to reconsider.

@javanna javanna reopened this Mar 27, 2018
@dynaxis
Copy link

dynaxis commented May 10, 2018

I reached here while considering use of CBOR or SMILE, since it was not clear from the docs whether they are supported in the Java REST clients.

Since I'm using my own binary encoding of docs in a separate stored field other than _source (FYI, it's similar to Google's FlatBuffers in spirit. I access some parts of the docs in the client side without parsing due to heavy client-side filtering and joining requirement) and preparing a switch to the Java REST client from the transport one, I thought either CBOR or SMILE may benefit my case at least due to more efficient encoding of the binary chunks.

Also recently I got across a project involving client-side processing of huge buckets of aggregation results, and am thinking the binary formats may be beneficial in those cases in terms of performance.

If CBOR/SMILE are to be deprecated for removal soon, please communicate the fact clearly as soon as possible. In my case, I prefer having anyone of them (that is, a binary encoding). But don't have any data yet on how much gain such a binary format can yield.

Thanks.

@javanna
Copy link
Member Author

javanna commented May 11, 2018

@dynaxis are you moving to the high-level REST client or low-level one?

@dynaxis
Copy link

dynaxis commented May 11, 2018

@javanna the high level one.

@javanna
Copy link
Member Author

javanna commented May 14, 2018

@dynaxis thanks for your feedback. the Java high-level client supports storing and retrieving documents in any format supported on the server, namely json, yaml, cbor or smile. We don't document this though as we are considering removing support for binary formats on the server-side. cc @danielmitterdorfer maybe another idea for a benchmark .

@dynaxis
Copy link

dynaxis commented May 14, 2018

@javanna Thank you for sharing it. If you are strongly considering removal of binary formats, then maybe it's safe not to try any of binary formats. Anyway I'll continue the migration to Java REST client since the current transport client is slated for removal. Thanks again.

@jpountz
Copy link
Contributor

jpountz commented Mar 3, 2020

Closing it for now, like #32433. It's unlikely we'll move forward on this in the near future due to the breaking nature of this change. We can reopen if necessary.

@jpountz jpountz closed this as completed Mar 3, 2020
@azhuchkov
Copy link

I suppose returning data in a non-native format requires extra conversion step at the server-side that makes supporting those formats less useful in terms of performance.

My observations show that ES uses JSON as a native format since using CBOR or Smile is less performant (overall request processing time increases).

I am looking for a way to improve performance, so I would appreciate your feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>breaking :Core/Infra/REST API REST infrastructure and utilities
Projects
None yet
Development

No branches or pull requests