http.response.status_code as long instead of keyword or integer #564

GuillaumeDuf · 2019-09-17T15:45:24Z

Hello,
I did not find any discussion on this topic, so I am asking the question.
In ECS, http.response.status_code is mapped as long. Why?

Using an integer would have the advantage of taking up less space, but we are not supposed to do arithmetic operations (sum/avg...) on an HTTP status, since all codes are between 100 and 599.

Using a keyword would allow running an aggregation without specifying a null_value.
The keyword type would also allow range queries if necessary (the comparison would be in alphanumeric order).
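
As an illustration of the keyword variant described above, the following sketch assumes a hypothetical index `status-demo` in which `http.response.status_code` is mapped as `keyword` (this is not the ECS mapping). The range query compares strings lexicographically, which only matches numeric order here because every status code has exactly three digits.

```console
PUT /status-demo
{
  "mappings": {
    "properties": {
      "http.response.status_code": { "type": "keyword" }
    }
  }
}

GET /status-demo/_search
{
  "size": 0,
  "query": {
    "range": {
      "http.response.status_code": { "gte": "400", "lt": "500" }
    }
  },
  "aggs": {
    "by_status": {
      "terms": { "field": "http.response.status_code" }
    }
  }
}
```

Range queries on keyword fields count as expensive queries, so a request like this would be rejected on a cluster where `search.allow_expensive_queries` is set to `false`.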
ECS source code:

- name: response.status_code
  format: string
  level: extended
  type: long
  description: >
    HTTP response status code.
  example: 404


webmat · 2019-09-23T15:38:28Z

You're right that codes like HTTP status codes aren't meant to have arithmetic operations done on them. We're making them keyword as much as possible in general. For HTTP we went with long because it was so well established that this wasn't going to change, and nobody would try to map another type of code to it (as opposed to event.code, which may be used to contain numeric or alphanum codes).

As for null_value, this hadn't come up in the discussion for HTTP. Can you expand on the problem you see here?

andrewthad · 2019-10-03T00:01:47Z

> For HTTP we went with long because it was so well established that this wasn't going to change, and nobody would try to map another type of code to it.

Then why is network.iana_number a keyword instead of a long?

webmat · 2019-10-03T12:22:50Z

@andrewthad Oh no, you win! You found an inconsistency!

Kidding, there's lots of inconsistencies in ECS 😂 We're doing our best to avoid them, but additional eyes on the PRs are welcome. Seems like you have

Please let us know if either network.iana_number being keyword or response.status_code being long are causing a specific problem. We'll consider addressing those if that's the case.

If the questions were just being posed out of curiosity, that's fine too. But please close the issue, once the curiosity is satisfied :-)

andrewthad · 2019-10-03T12:42:30Z

That's a wonderful gif. The types aren't causing any problems at the moment. I'm currently using ECS for semi-structured logs in a SIEM I'm working on, and when I pass records between services, I'm trying to make sure their representation is compact and supports efficient testing against common predicates. Numbers are better for this kind of thing. So I currently deviate from the standard in a few ways, mostly by making things numbers that are supposed to be text.

A thought that I've had is that it might be useful if the types weren't just Elasticsearch's types. Meaning that it could be useful to have a layer above ES types that maps down onto ES types. Something like:

8-bit unsigned number -> long
16-bit unsigned number -> long
64-bit signed -> long
64-bit unsigned identifier (number that only supports equality) -> keyword
term -> keyword

That's just a vague sketch, but then ECS could become more useful as an efficient representation of data for applications other than Elasticsearch.

webmat · 2019-10-03T13:18:44Z

Yes, that's something we could consider. I'll run that by a few other folks who may feel the same way, to get their take on this.

GuillaumeDuf · 2020-10-14T10:37:59Z

Having the status code as an integer causes problems for machine learning jobs, because you can only use these fields as a "metric field" but not as a field to partition the analysis by...

webmat · 2020-10-29T18:04:57Z

Thanks for chiming in again. This had fallen through the cracks of the floor :-)

I took a note on our list of possible breaking changes to fix at the next major (#839), and I'm also checking with colleagues on the ML team on what the possibilities are.

webmat · 2020-10-29T19:24:29Z

You're correct that the main ML UI doesn't expose numeric fields as being available to partition by.

But if you go to the JSON editor in the advanced section, you can use them as such.
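
A job configured that way might look roughly like the sketch below. The job name `status-code-partition-demo`, the 15-minute bucket span, and the `count` detector are assumptions chosen for illustration and do not come from this thread; the only point being shown is the numeric field used in `partition_field_name`.

```console
PUT _ml/anomaly_detectors/status-code-partition-demo
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "count",
        "partition_field_name": "http.response.status_code"
      }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
```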

djptek · 2021-03-04T11:02:14Z

There is a nice way to partition numeric fields (up to 100 values, which exceeds the 41 HTTP response codes) to generate a histogram and/or other visualizations using Lens, see #839 (comment)

djptek · 2021-03-04T11:38:41Z

Here's an example of partition using programmatically generated response codes (long) for every possible value using Lens

djptek · 2021-07-13T13:41:12Z

Closing due to lack of update, please feel free to reopen

iamhowardtheduck · 2022-01-04T16:24:10Z

I would like to re-open this issue, as I have many customers who require the use of keyword for this error code.

blookot · 2022-05-25T08:46:53Z

@webmat I'm quite surprised we need to discuss this issue.
How was the status code defined as long in the first place...?
The status code must be a keyword. It's an issue throughout Kibana as well as in ML, as mentioned before.
Please update ECS.
Even the logs from the Kibana sample dataset set a "response.keyword" field.

If only we could add a keyword nested field under a long...
Or if we had a chance to define a runtime field converting the long to string...
But both won't work.

Thank you in advance.

blookot · 2022-05-25T08:53:22Z

I'm adding a quick erratum: the runtime field to convert to keyword is:

PUT /test-mapping/_mapping
{
  "runtime": {
    "statuscode_keyword": {
      "type": "keyword",
      "script": {
        "source": "emit(doc['statuscode'].value.toString());"
      }
    }
  }
}

Still this won't scale

djptek · 2022-05-25T09:59:29Z

Hi @blookot @iamhowardtheduck, this is a long-running discussion, which predates ECS. For the most recent history, please see the top of this post. There are certainly arguments for both alternatives. Also, there is an existing user base whose historical and current data is linked to the type which has been applied.

If you need to run keyword analytics on this field, then rather than a runtime field perhaps consider:

- using Painless in an ingest node to create a keyword representation of the status code in a new field as the target for your analytics
- if space is at a premium, you also have the mapping parameters index and store
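
A minimal sketch of the first suggestion follows. The pipeline name `status-code-keyword`, the index name `my-index`, and the target field name `http.response.status_code_keyword` are hypothetical, and the script assumes that incoming documents carry the status code as nested objects (`http` → `response` → `status_code`) rather than as a single dotted key.

```console
PUT _ingest/pipeline/status-code-keyword
{
  "description": "Copy http.response.status_code into a string field for keyword analytics",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "def code = ctx.http?.response?.status_code; if (code != null) { ctx.http.response.status_code_keyword = String.valueOf(code); }"
      }
    }
  ]
}

PUT /my-index/_mapping
{
  "properties": {
    "http.response.status_code_keyword": { "type": "keyword" }
  }
}
```

The explicit mapping is there because a string field left to dynamic mapping would otherwise typically be mapped as `text` with a `keyword` sub-field.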

djptek · 2022-05-25T13:02:16Z

@blookot @iamhowardtheduck as to the root of all this, that's really down to whoever decided to encode what so far turned out to be 41 discrete values using 3 digits in 1991 or thereabouts. I'm sure they had their reasons.

For the record, I would generally expect to map any emerging field with the suffix "code" to keyword, assuming minimal code and/or historical business data that I would need to rewrite/reprocess.

blookot · 2022-06-01T13:54:22Z

Hi @djptek
thank you for your answers.
You mean I could add the "index" mapping parameter to the http.response.status_code field, and that would make it aggregatable (in visualizations and ML wizards)?
Which means index mappings (set from beats) could be customized to solve this...
Or simply add a second http.response.status_code_keyword field?

djptek · 2022-06-02T08:18:07Z

Hi @blookot

> add a second http.response.status_code_keyword field

This would work. You'd need to add e.g. some Painless in the index pipeline to populate that field and/or do a reindex to add this to your legacy data if required.
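
As a rough sketch of both halves, assume an ingest pipeline that populates the new field already exists under the hypothetical name `status-code-keyword`, and that `my-index` and `my-legacy-index` are placeholder index names. The first request makes the pipeline the default for newly indexed documents; the second runs existing documents through the same pipeline in place.

```console
PUT /my-index/_settings
{
  "index.default_pipeline": "status-code-keyword"
}

POST /my-legacy-index/_update_by_query?pipeline=status-code-keyword
```

Reindexing into a new index through the pipeline would be an alternative to the in-place update, for example if the mapping of the legacy index needs to change at the same time.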

BBQigniter · 2023-09-20T09:08:40Z

This really is strange - I had a look at https://github.com/elastic/integrations/blob/main/packages/nginx/kibana/ml_module/nginx-Logs-ml.json and wanted to rebuild the status_code_rate_nginx ML-job for one of our own indices that keeps to ECS, only to figure out that I cannot use http.response.status_code in the "Split field" drop-down for a "Multi-metric" job.

Having a quick look at the pipeline-config from the integration and https://docs.elastic.co/en/integrations/nginx, it seems it's also ingested as long - so how would that included ML-job work? Is it possible to create such jobs anyway via the API if you know what you're doing?

GuillaumeDuf changed the title ~~http.response.status_code as long instead of keyword~~ http.response.status_code as long instead of keyword or integer Sep 17, 2019

webmat added the discuss label Oct 3, 2019

webmat mentioned this issue Oct 29, 2020

This issue is meant to collect breaking changes we want to do for ECS 8.0 #839

Closed

djptek closed this as completed Jul 13, 2021

iamhowardtheduck reopened this Jan 4, 2022
