http.response.status_code as long instead of keyword or integer #564
Comments
You're right that codes like HTTP status codes aren't meant to have arithmetic operations done on them. We're making them As for |
Then why is |
@andrewthad Oh no, you win! You found an inconsistency! Kidding, there's lots of inconsistencies in ECS 😂We're doing our best to avoid them, but additional eyes on the PRs are welcome. Seems like you have Please let us know if either If the questions were just being posed out of curiosity, that's fine too. But please close the issue, once the curiosity is satisfied :-) |
That's a wonderful gif. The types aren't causing any problems at the moment. I'm currently using ECS for semi-structured logs in a SIEM I'm working on, and when I pass records between services, I'm trying to make sure their representation is compact and supports efficient testing against common predicates. Numbers are better for this kind of thing. So I currently deviate from the standard in a few ways, mostly by making things numbers that are supposed to be text. A thought that I've had is that it might be useful if the types weren't just Elasticsearch's types. Meaning that it could be useful to have a layer above ES types that maps down onto ES types. Something like:
That's just a vague sketch, but then ECS could become more useful as an efficient representation of data for applications other than Elasticsearch. |
Yes, that's something we could consider. I'll run that by a few other folks who may feel the same way, to get their take on this. |
Having the status code as an integer causes problems for machine learning jobs, because you can only use such fields as a "metric field", not as a field to partition the analysis by.
Thanks for chiming in again. This had fallen through the cracks of the floor :-) I took a note on our list of possible breaking changes to fix at the next major (#839), and I'm also checking with colleagues on the ML team on what the possibilities are. |
You're correct that the main ML UI doesn't expose numeric fields as being available to partition by. But if you go to the JSON editor in the advanced section, you can use them as such. |
There is a nice way to partition numeric fields (up to 100 buckets, which exceeds the 41 HTTP response codes) to generate a histogram and/or other visualizations using Lens, see #839 (comment)
Closing due to lack of update, please feel free to reopen |
I would like to re-open this issue, as I have many customers who require the use of keyword for this error code. |
@webmat I'm quite surprised we need to discuss this issue If only we could add a keyword nested field under a long... Thank you in advance. |
I'm adding a quick erratum: the runtime field to convert to keyword is:
Still this won't scale |
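The snippet posted with this comment did not survive in this copy of the thread. As a rough sketch of the kind of runtime field being discussed, and not necessarily what the commenter wrote, a search request can declare a keyword runtime field computed from the long-mapped value. The runtime field name `http.response.status_code_keyword` and the helper function are hypothetical. The request body is built as a Python dict so it can be inspected or sent with any client.

```python
import json

# Hypothetical runtime field name; it is not part of ECS.
RUNTIME_FIELD = "http.response.status_code_keyword"
SOURCE_FIELD = "http.response.status_code"


def build_search_body(source_field=SOURCE_FIELD, runtime_field=RUNTIME_FIELD):
    """Build a search body that exposes the long status code as a keyword."""
    # Painless script: skip documents without a value, otherwise emit the
    # number rendered as a string.
    script = (
        f"if (doc['{source_field}'].size() != 0) "
        f"{{ emit(String.valueOf(doc['{source_field}'].value)); }}"
    )
    return {
        "size": 0,
        "runtime_mappings": {
            runtime_field: {"type": "keyword", "script": {"source": script}}
        },
        # A terms aggregation over the runtime field, as an example of
        # keyword-style analytics on the status code.
        "aggs": {"by_status": {"terms": {"field": runtime_field}}},
    }


body = build_search_body()
print(json.dumps(body, indent=2))
```

Runtime fields are evaluated at query time for every matching document, which is consistent with the concern above that this approach will not scale to large indices.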
Hi @blookot @iamhowardtheduck this is a long running discussion, which predates ECS. For the most recent history, please see top of this post. There are certainly arguments for both alternatives. Also, there is an existing user base whose historical and current data is linked to the type which has been applied. If you need to run keyword analytics on this field, then rather than a runtime field perhaps consider: |
@blookot @iamhowardtheduck as to the root of all this, that's really down to whoever decided, in 1991 or thereabouts, to encode what has so far turned out to be 41 discrete values using 3 digits. I'm sure they had their reasons. For the record, I would generally expect to map any emerging field with the suffix "code" to keyword, assuming there is minimal code and/or historical business data that I would need to rewrite or reprocess.
Hi @djptek |
Hi @blookot
this would work - you'd need to add e.g. some Painless in the index pipeline to populate that field and/or do a reindex to add this to your legacy data if required
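The field definition under discussion is not preserved in this copy of the thread, so the following is only a minimal sketch of the "Painless in the index pipeline" idea. It assumes a hypothetical sibling field `http.response.status_code_keyword` that has been mapped as keyword; the field name and the helper function are illustrative, not part of ECS.

```python
import json

# Hypothetical name of the keyword copy stored next to the long field.
TARGET_KEY = "status_code_keyword"


def build_pipeline(target_key=TARGET_KEY):
    """Build an ingest pipeline body with a single script processor."""
    # Painless script: read the numeric code with null-safe navigation and,
    # if present, write its string form under the target key.
    script = (
        "def code = ctx.http?.response?.status_code; "
        "if (code != null) { "
        "ctx.http.response[params.target] = String.valueOf(code); "
        "}"
    )
    return {
        "description": "Copy http.response.status_code into a keyword sibling",
        "processors": [
            {
                "script": {
                    "lang": "painless",
                    "source": script,
                    "params": {"target": target_key},
                }
            }
        ],
    }


pipeline = build_pipeline()
print(json.dumps(pipeline, indent=2))
```

A pipeline like this only affects documents indexed after it is in place, which is why the comment above mentions a reindex for legacy data.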
This really is strange - I had a look at https://github.com/elastic/integrations/blob/main/packages/nginx/kibana/ml_module/nginx-Logs-ml.json and wanted to rebuild the Having a quick look at the pipeline-config from the integration and https://docs.elastic.co/en/integrations/nginx, it seems it's also ingested as long - so how would that included ML-job work? Is it possible to create such jobs anyway via the API if you know what you do? |
Hello,
I did not find any discussion on this topic, so I am asking the question.
In ECS, http.response.status_code is mapped as long. Why? Using an integer has the advantage of taking up less space, but we are not supposed to do numeric operations (sum/avg...) on an HTTP status, because all codes are between 100 and 599.
Using a keyword would allow running an aggregation without specifying a null_value.
The keyword type would also allow range queries if necessary (the comparison would be in alphanumeric order).
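The example that originally followed is missing from this copy. As an illustration of the alphanumeric-order point, here is a small sketch showing that string comparison and numeric comparison agree for three-digit codes. It assumes that a range query on a keyword field compares terms lexicographically, and it uses plain Python string comparison as a stand-in, which matches for ASCII digits. The helper name `keyword_range` is made up for this example.

```python
def keyword_range(values, gte, lt):
    """Select values by string comparison, as a stand-in for a keyword range."""
    return [v for v in values if gte <= v < lt]


# Every status code in the range mentioned above, as numbers and as strings.
as_numbers = list(range(100, 600))
as_keywords = [str(code) for code in as_numbers]

# Server errors selected by string comparison...
server_errors_kw = keyword_range(as_keywords, "500", "600")
# ...and by numeric comparison.
server_errors_num = [code for code in as_numbers if 500 <= code < 600]

# Both selections contain the same codes because every value has 3 digits.
assert [int(v) for v in server_errors_kw] == server_errors_num

# The equivalence depends on the fixed width: a 2-digit string sorts after
# a 3-digit one that starts with a smaller digit.
assert "99" > "100"
```

The equivalence holds only because all values have the same number of digits. A field holding codes of varying width would not sort numerically as a keyword.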
ECS source code