-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do we still want to support 4 different content types? #22811
Comments
I vote for trimming down the number of supported formats, with the exception of YAML -- seems like some people do use that, eg. for mappings/etc, where they take an advantage of comment support. |
@karmi we convert all the mappings to json before storing them in the cluster state, so once converted the comments are gone. They are not returned e.g. by the get mapping api. Is this what you'd expect? |
I find myself using YAML quite a bit as well. I don't care about how it's serialized internally, the utility is on the user's end. And while I haven't used Smile and CBOR nearly as much, it seems like it would be a shame to limit the options just to clean the code up. I'm +1000 on getting rid of the silly heuristics (like the |
@javanna I think the fact that the payload is converted to JSON after it's received doesn't matter here -- people probably have their configuration in some kind of repository (scrips, configuration management code, application code), and that is the place where they care about the comments (and other things). Of course, and argument can be always made, that people can use real code for that, Python, Ruby, ..., whatever, where they can have proper comments and other nice things. (UPDATED) |
preventing future or fixing existing bugs is another way to say it which makes it sound much better :) |
Our JSON parser violates specification and allows comments anyway. So if the argument is comments, there's no need to keep YAML for that reason. |
I agree with keeping YAML - for some reasons, sysadmins seem to prefer it to JSON The one I'm torn about is CBOR. It's just one datapoint but (eg) the Perl CBOR module (written by the same author as the fastest JSON module) says that encoding CBOR is twice as fast as JSON, and decoding about 15-30% as fast. CBOR encoding is about 20% smaller than JSON. This could have a significant impact on bulk uploads. The only problem is that our CBOR implementation doesn't support the bulk API atm (see #8481) Given that we've survived thus far without CBOR support for bulk, I'd be OK with removing CBOR support entirely if the code simplification warrants it. Just a shame to leave this performance boost on the table. |
I do not think it buys us much to remove just CBOR if we keep around yaml and SMILE. With the abstractions we have in place, once we introduce tests for Smile we can easily do the same for CBOR. Maybe keeping only Yaml could simplify things a bit as that would leave us without binary content types, yet the XContent abstraction would still be needed. |
Is there a way to generate a Json, Smile, CBOR, Yaml whatever content from our REST test suite so we can choose randomly with which implementation we want to tun our integration tests? Would that make sense to make sure we cover always all our APIs without needing to duplicate the scenarios? |
Yes that is one of the options I had in mind. We would need to transparently convert requests before they get sent, and responses right after they get returned. That wouldn't cover 100% of what we need, but already a lot. We also have methods in our java api that allow to provide any content, which then gets converted to json internally, those would need to be tested separately. |
btw re CBOR - I don't know how much difference it would make in the real world - I'd love to mock up a solution for #8481 and see if the difference is significant. If not, then we can can it. |
Discussed in FixItFriday, we would like to remove support for CBOR, Smile and Yaml in the future, but we don't want to make this breaking change at the moment. We may reconsider deprecation in 6.x and removal in 7.0. We will investigate other ways to have better performance for |
Closing for now. We will reconsider this in the future. |
Re-opening, this is probably a good time to reconsider. |
I reached here while considering use of CBOR or SMILE, since it was not clear from the docs whether they are supported in the Java REST clients. Since I'm using my own binary encoding of docs in a separate stored field other than Also recently I got across a project involving client-side processing of huge buckets of aggregation results, and am thinking the binary formats may be beneficial in those cases in terms of performance. If CBOR/SMILE are to be deprecated for removal soon, please communicate the fact clearly as soon as possible. In my case, I prefer having anyone of them (that is, a binary encoding). But don't have any data yet on how much gain such a binary format can yield. Thanks. |
@dynaxis are you moving to the high-level REST client or low-level one? |
@javanna the high level one. |
@dynaxis thanks for your feedback. the Java high-level client supports storing and retrieving documents in any format supported on the server, namely json, yaml, cbor or smile. We don't document this though as we are considering removing support for binary formats on the server-side. cc @danielmitterdorfer maybe another idea for a benchmark . |
@javanna Thank you for sharing it. If you are strongly considering removal of binary formats, then maybe it's safe not to try any of binary formats. Anyway I'll continue the migration to Java REST client since the current transport client is slated for removal. Thanks again. |
Closing it for now, like #32433. It's unlikely we'll move forward on this in the near future due to the breaking nature of this change. We can reopen if necessary. |
I suppose returning data in a non-native format requires extra conversion step at the server-side that makes supporting those formats less useful in terms of performance. My observations show that ES uses JSON as a native format since using CBOR or Smile is less performant (overall request processing time increases). I am looking for a way to improve performance, so I would appreciate your feedback. |
As part of #22691 we are removing content type auto-detection for incoming requests, in favour of reading the
Content-Type
header. Elasticsearch supports 4 different formats: json, yaml, smile and cbor. All 4 are supported with few exceptions: bulk, msearch and friends only support json or smile (actually multi line json or smile composed of one valid object per line, the whole request is never valid json nor smile).Given that the majority of our users use json, that we barely test the other formats, especially the binary ones, and that handling 4 different formats complicates our code, not to mention in some places we may completely forget about this and just assume that we get json, I wonder if we still want to support all these 4 formats.
I think the outcome of this issue should be either deprecate and then remove support for yaml, smile and cbor, or invest in adding coverage for these formats so that we feel more comfortable about supporting them. All said also applies to both requests and responses.
The text was updated successfully, but these errors were encountered: