Verify mapping problems after migrating to ecs@mappings #10848

Closed
zmoog opened this issue Aug 22, 2024 · 6 comments
Labels
bug Something isn't working, use only for issues Integration:All Applies to all integrations [Integration not found in source]


@zmoog
Contributor

zmoog commented Aug 22, 2024

A few users reported mapping errors on a few integrations. We suspect these problems may be related to integrations that migrated to ecs@mappings with recent updates.

Here is the list of fields with mapping issues:

| Field | Previous mapping | Current mapping | Data stream | Status | Root cause | Notes | Issues / PRs |
|---|---|---|---|---|---|---|---|
| client.geo.location | geo_point | object | logs-azure.graphactivitylogs-* | Reproduced | integration-mapping-problem | I believe this is an integration-mapping-problem | #11102 |
| source.geo.location | geo_point | object | logs-azure.graphactivitylogs-* | Reproduced | integration-mapping-problem | I believe this is an integration-mapping-problem | #11102 |
| destination.port | long | keyword | logs-cisco_aironet.log-* | Needs sample docs | ecs@mappings+type-coercion | Probably an ecs@mappings+type-coercion issue, but I don't have a sample document to double-check; probable, but still a hypothesis. | #11103 |
| event.duration | long | keyword | logs-azure.activitylogs.log-* | Needs sample docs | ecs@mappings+type-coercion | Probably an ecs@mappings+type-coercion issue, but I don't have a sample document to double-check; probable, but still a hypothesis. | #11104 |
| dns.authorities | | | | | | | |
| dns.id | long | keyword | logs-logstash.tpot-* | | | | |
| error.code | keyword | long | logs-system.security-* | Unclear | integration-update | PR https://github.com/elastic/integrations/pull/10529/files changed the value type to string; the expected ECS field mapping is keyword: https://www.elastic.co/guide/en/ecs/current/ecs-error.html#field-error-code | |
| event.severity | long | keyword | logs-cisco_aironet.log-* | Reproduced | ecs@mappings+type-coercion | ecs@mappings+type-coercion; I found the following sample doc in the integration test files: {"event.severity": "4"}; the expected ECS type is long: https://www.elastic.co/guide/en/ecs/current/ecs-event.html#field-event-severity | #11105 |
| http.request.body | object | flattened | logs-apm.error-* | | | | |
| http.request.headers | flattened | object | logs-apm.error-* | | | | |
| http.response.headers | flattened | object | logs-apm.error-* | | | | |
| input | object | keyword | logs-logstash.tpot-* | | | | |
| log.offset | long | keyword | logs-microsoft_exchange_server.httpproxy-* | Reproduced | integration-update | PR https://github.com/elastic/integrations/pull/9560/files added an explicit mapping to keyword | |
| observer.ip | ip | keyword | logs-ti_abusech_latest.dest_malware-* | | | | |
| request | text | object | logs-logstash.tpot-* | | | | |
| response | text | object | logs-logstash.tpot-* | | | | |
| session | | | | | | | |
| sip.uri | | | | | | | |
| status | keyword | long | logs-logstash.tpot-* | | | | |
| threat.indicator.first_seen | date | keyword | logs-ti_abusech.malware-* | Reproduced | ecs@mappings+date_detection:false | | elastic/elasticsearch#112444 |
| threat.indicator.last_seen | date | keyword | logs-ti_abusech.malwarebazaar-* | Reproduced | ecs@mappings+date_detection:false | | elastic/elasticsearch#112444 |
| timestamp | | | | | | | |
| user_agent | object | keyword | logs-cisco_asa.log-* | Unclear | | object is the expected mapping for user_agent; see https://www.elastic.co/guide/en/ecs/current/ecs-user_agent.html | |
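A rough way to fill in the "Previous mapping" and "Current mapping" columns is to flatten two mapping responses (for example, the `properties` object returned by `GET <data-stream>/_mapping` before and after an integration upgrade) and compare field types. This Python sketch assumes the mappings have already been fetched and decoded into dicts; `flatten_mapping` and `diff_mappings` are hypothetical helpers, not part of any Elastic tooling. Object fields without an explicit `type` are reported as `object`.

```python
from typing import Any, Dict, Tuple


def flatten_mapping(properties: Dict[str, Any], prefix: str = "") -> Dict[str, str]:
    """Flatten an Elasticsearch `properties` tree into {dotted.field: type}."""
    flat: Dict[str, str] = {}
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        # A field without an explicit type but with sub-properties is an object.
        flat[path] = spec.get("type", "object")
        if "properties" in spec:
            # Recurse into child fields, extending the dotted path.
            flat.update(flatten_mapping(spec["properties"], prefix=f"{path}."))
    return flat


def diff_mappings(
    previous: Dict[str, Any], current: Dict[str, Any]
) -> Dict[str, Tuple[str, str]]:
    """Return {field: (previous_type, current_type)} for fields whose type changed."""
    old = flatten_mapping(previous)
    new = flatten_mapping(current)
    return {
        field: (old[field], new[field])
        for field in sorted(old.keys() & new.keys())
        if old[field] != new[field]
    }
```

For instance, feeding it the `client.geo.location` mappings discussed below reports `{"client.geo.location": ("geo_point", "object")}`. It only compares fields present in both mappings; added or removed fields would need a separate set difference.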

Root Causes

| Cause | Summary | Solution |
|---|---|---|
| ecs@mappings+type-coercion | Mapping changed because ecs@mappings does not perform type coercion | Set the right value type in the input/pipeline, or restore the explicit mapping in the fields/ecs.yml file |
| ecs@mappings+date_detection:false | Setting date_detection: false causes some fields not to be mapped as date | Set date_detection: true or update ecs@mappings |
| integration-mapping-problem | Incorrect mapping in the integration | Review the change and fix the mapping, if necessary |
| integration-update | Explicit change in the integration | Probably deal with this breaking change if the outcome is in line with ECS |
| fieldless-search | On 8.13.x, ECS fields are no longer included in index.query.default_field, so they are not available in fieldless search | As of today, restoring fields/ecs.yml for 8.13.x users can restore fieldless searches. Users on > 8.14.x are not affected since index.query.default_field is * |

ecs@mappings+type-coercion

The ecs@mappings component template does not perform type coercion: if the value is a JSON string, Elasticsearch maps it as keyword, even when ECS expects a numeric type.

For example, if I run the following requests in Dev Tools:

DELETE _data_stream/logs-whatever-sdh5075
POST logs-whatever-sdh5075/_doc
{
  "@timestamp": "2024-08-20T16:58:01+02:00",
  "destination": {
    "port": "8080"
  }
}
GET logs-whatever-sdh5075/_mapping/field/destination.port

I get the following result:

{
  ".ds-logs-whatever-sdh5075-2024.08.22-000001": {
    "mappings": {
      "destination.port": {
        "full_name": "destination.port",
        "mapping": {
          "port": {
            "type": "keyword",
            "ignore_above": 1024
          }
        }
      }
    }
  }
}
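To illustrate why the string "8080" ends up as keyword, here is a deliberately simplified Python model in which the JSON type of the value alone decides the mapping; it is an assumption-laden toy, not Elasticsearch's actual dynamic-mapping logic. It also sketches the first remedy from the Root Causes table, emitting the right value type before indexing. `dynamic_type`, `coerce_to_ecs`, and `ECS_EXPECTED_LONG` are hypothetical names; in a real integration this conversion would live in the input or ingest pipeline.

```python
from typing import Any, Dict

# Hypothetical subset of fields that ECS defines as long, taken from this issue.
ECS_EXPECTED_LONG = {"destination.port", "event.severity", "event.duration"}


def dynamic_type(value: Any) -> str:
    """Toy model: the JSON type decides the mapping, and strings are not coerced."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "keyword"
    return "object"


def coerce_to_ecs(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a flat (dotted-key) document in which numeric strings
    in long-typed ECS fields are converted to int before indexing."""
    out = dict(doc)
    for field in ECS_EXPECTED_LONG:
        value = out.get(field)
        if isinstance(value, str):
            try:
                out[field] = int(value)
            except ValueError:
                pass  # not a number: leave it alone rather than guess
    return out
```

With this sketch, `coerce_to_ecs({"destination.port": "8080"})` yields `{"destination.port": 8080}`, which the toy model types as long instead of keyword. Leaving unparseable strings untouched keeps the helper from silently dropping data.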

ecs@mappings+date_detection:false

When date_detection is disabled, the following fields aren’t mapped correctly:

threat.indicator.first_seen
threat.indicator.modified_at
threat.enrichments.indicator.modified_at
threat.enrichments.matched.occurred
threat.enrichments.indicator.first_seen 
threat.enrichments.indicator.last_seen
threat.indicator.last_seen 
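A simplified Python model of this effect: with date detection on, a date-like string such as a `first_seen` timestamp is typed as date; with it off, the same string falls through to keyword. This is an assumption about the general mechanism, not the actual ecs@mappings templates, and ISO 8601 parsing via `datetime.fromisoformat` is only a rough stand-in for Elasticsearch's dynamic date formats. The function names are hypothetical.

```python
from datetime import datetime


def looks_like_date(value: str) -> bool:
    """Rough stand-in for dynamic date detection: accept ISO 8601 strings."""
    try:
        # fromisoformat() before Python 3.11 rejects a trailing "Z", so normalize it.
        datetime.fromisoformat(value.replace("Z", "+00:00"))
        return True
    except ValueError:
        return False


def dynamic_string_type(value: str, date_detection: bool) -> str:
    """Toy model: date-like strings map to date only when detection is enabled."""
    if date_detection and looks_like_date(value):
        return "date"
    return "keyword"
```

For example, `dynamic_string_type("2024-08-20T16:58:01Z", date_detection=False)` returns `"keyword"`, which matches the symptom reported for the threat.* fields.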

integration-mapping-problem

We probably need to change the mappings in the integration to something like the following (as most other integrations do):

- name: client.geo.location
  external: ecs
- name: source.geo.location
  external: ecs

Or remove these mappings and only use ecs@mappings.

integration-update

Mapping changed due to integration updates.

@zmoog zmoog self-assigned this Aug 22, 2024
@zmoog
Contributor Author

zmoog commented Aug 22, 2024

Checking the first field, client.geo.location, in the logs-azure.graphactivitylogs-* data stream: this field has an explicit mapping in the integration:

# packages/azure/data_stream/graphactivitylogs/fields/ecs.yml
- name: client.geo.location.lat
  external: ecs
- name: client.geo.location.lon
  external: ecs
- name: source.geo.location.lat
  external: ecs
- name: source.geo.location.lon
  external: ecs

This leads to the following mapping:

"client": {
  "properties": {
    "geo": {
      "properties": {
        "continent_name": {
          "type": "keyword",
          "ignore_above": 1024
        },
        "country_iso_code": {
          "type": "keyword",
          "ignore_above": 1024
        },
        "country_name": {
          "type": "keyword",
          "ignore_above": 1024
        },
        "location": {
          "properties": {
            "lat": {
              "type": "geo_point"
            },
            "lon": {
              "type": "geo_point"
            }
          }
        }
      }
    },
    "ip": {
      "type": "ip"
    }
  }
}

So client.geo.location here is an object.

Paradoxically, if I index the same document into a generic logs-*-* data stream, I get the correct mapping from ecs@mappings:

GET logs-whatever-sdh5075/_mapping/field/client.geo.location
{
  ".ds-logs-whatever-sdh5075-2024.08.22-000001": {
    "mappings": {
      "client.geo.location": {
        "full_name": "client.geo.location",
        "mapping": {
          "location": {
            "type": "geo_point"
          }
        }
      }
    }
  }
}

This seems to be a choice in the logs-azure.graphactivitylogs-* data stream that does not align with ECS or other data streams.
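A small check for this pattern could flag any *.geo.location field that is defined only through its .lat/.lon sub-fields. The Python sketch below assumes the fields/ecs.yml entries have already been parsed into a list of dicts (for example with a YAML library; parsing is omitted to keep it stdlib-only). `find_split_geo_points` is a hypothetical helper, not an elastic-package command.

```python
from typing import Dict, List

# Sub-field suffixes that indicate a geo_point was split into lat/lon entries.
SUSPECT_SUFFIXES = (".geo.location.lat", ".geo.location.lon")


def find_split_geo_points(fields: List[Dict[str, str]]) -> List[str]:
    """Return parent *.geo.location names that are defined via .lat/.lon entries."""
    parents = set()
    for entry in fields:
        name = entry.get("name", "")
        if name.endswith(SUSPECT_SUFFIXES):
            # Drop the trailing ".lat"/".lon" to recover the geo_point field.
            parents.add(name.rsplit(".", 1)[0])
    return sorted(parents)


def suggested_entries(parents: List[str]) -> List[Dict[str, str]]:
    """Build replacement entries that reference the ECS definition directly."""
    return [{"name": parent, "external": "ecs"} for parent in parents]
```

Run against the four entries from packages/azure/data_stream/graphactivitylogs/fields/ecs.yml, it reports client.geo.location and source.geo.location, and `suggested_entries` produces the `external: ecs` definitions recommended in the integration-mapping-problem section.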

@andrewkroh andrewkroh added Integration:All Applies to all integrations [Integration not found in source] bug Something isn't working, use only for issues labels Aug 23, 2024
@lucabelluccini
Contributor

lucabelluccini commented Aug 28, 2024

We're seeing a potential issue with host.os.version, which is no longer defined in the System integration / processor dataset.

While in most cases it is mapped as keyword (as it was in the past), for some users it sporadically gets mapped as float.
Values such as "7.9" and "7.9 (Maipo)" are correctly mapped as keyword, but that doesn't happen if Beats sends an unquoted 7.9.
I'm gathering a sample and I'll update the comment.

We have a sample document where we clearly see Beats / Elastic Agent can send "version": 7.2 (the version is not wrapped in quotes, so it is mapped as float instead of keyword).
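The difference comes down to the JSON type on the wire. This Python sketch shows how the quoted and unquoted forms decode, using the same simplified type model as above (an assumption, not Elasticsearch internals), plus a hypothetical sender-side `normalize_version` that always emits the version as a string.

```python
import json
from typing import Any, Dict


def mapped_type_of(raw_json: str, key: str) -> str:
    """Simplified: report the type a dynamic mapping would infer for `key`."""
    value = json.loads(raw_json)[key]
    if isinstance(value, str):
        return "keyword"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, float):
        return "float"
    if isinstance(value, int):
        return "long"
    return "object"


def normalize_version(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sender-side fix: always emit `version` as a string."""
    out = dict(doc)
    if "version" in out and not isinstance(out["version"], str):
        out["version"] = str(out["version"])
    return out
```

Note the limit of normalizing after the fact: once a version has been decoded as a number, information can be lost (`json.loads('{"version": 7.10}')` yields `7.1`). That is a good reason to emit a string where the value originates, as the commit below recommends.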

@zmoog
Contributor Author

zmoog commented Sep 16, 2024

Adding fieldless search as an undesired side-effect of switching to ecs@mappings on 8.13.x.

zmoog added a commit that referenced this issue Sep 17, 2024
Align `client|source.geo.location` fields to ECS mapping.

Users reported mapping exceptions due to Elasticsearch mapping the `client|source.geo.location` fields as `object` instead of `geo_point`. See #10848 for more.
zmoog added a commit that referenced this issue Sep 18, 2024
Add ECS mapping for the `host.os.version` field (`keyword` type).

Users reported mapping exceptions due to `host.os.version` numeric values causing field mapping as `float` instead of `keyword`. See #10848 (comment) for more.

Elasticsearch maps a field as a `float` if it has a numeric value. This happens even on stack versions 8.13+ because ecs@mappings does not perform type coercion. For example, Elasticsearch maps `7.9` as `float`, while it maps `"7.9"` or `"7.9 (Maipo)"` as `keyword`.

By adding the `host.os.version` field to the file `fields/ecs.yml`, we ensure Elasticsearch uses the expected ECS field mapping as a `keyword` even when the value is a number. 

IMPORTANT: To fully resolve the issue, the input/integration owner should update it to emit the right value type to leverage ecs@mappings.
@willemdh

So this is why event.duration in the azure.eventhub integration is mapped as keyword in 8.13.4. This is a very unfortunate and concerning issue. ECS-compliant mappings are so important. When will this issue be fixed? Or is it already fixed in 8.16?

andrewkroh added a commit that referenced this issue Dec 6, 2024
The problem is that the `ecs@mappings` component template (introduced in >=8.13.0) 
does not perform type coercion to long when the value was a string. In this 
specific integration scenario, the `event.severity` value was provided as a 
string. So the dynamic mapping never matched, which left `event.severity` with 
the default 'keyword' mapping type, which does not comply with ECS.

This change adds back the static mapping for `event.severity` and modifies the 
grok pattern to perform conversion to long.

Relates: #10848

---------

Co-authored-by: Andrew Kroh <andrew.kroh@elastic.co>
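The fix converts the severity to a number at parse time instead of relying on the mapping. As a rough Python analogue of that grok change: the regex, the sample line, and the non-ECS keys `facility` and `mnemonic` are assumptions for illustration, loosely modeled on the Cisco IOS `%FACILITY-SEVERITY-MNEMONIC:` message header; the integration's actual grok pattern may differ.

```python
import re
from typing import Any, Dict, Optional

# Hypothetical stand-in for a grok pattern: %FACILITY-SEVERITY-MNEMONIC:
HEADER = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+):"
)


def parse_header(message: str) -> Optional[Dict[str, Any]]:
    """Extract the header fields, emitting event.severity as an int (long)."""
    match = HEADER.search(message)
    if match is None:
        return None
    return {
        "facility": match.group("facility"),
        # Convert here so the document carries a number, not the string "4".
        "event.severity": int(match.group("severity")),
        "mnemonic": match.group("mnemonic"),
    }
```

With a made-up line such as `%DOT11-4-MAXRETRIES: Packet to client reached max retries`, the sketch returns `event.severity` as the integer `4`, so a dynamic template expecting a numeric value can match.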
@zmoog
Contributor Author

zmoog commented Jan 20, 2025

So this is why event.duration in the azure.eventhub integration is mapped as keyword in 8.13.4. This is a very unfortunate and concerning issue. ECS-compliant mappings are so important.
When will this issue be fixed? Or is it already fixed in 8.16?

@willemdh, I'm sorry for the late response.

This is a problem on the integration side; you mentioned that the generic azure.eventhub integration maps event.duration as keyword instead of long.

I created the GH issue #12400 so we can follow this problem there.

@zmoog
Copy link
Contributor Author

zmoog commented Jan 20, 2025

After reviewing the issues tracked here, all problems have either been addressed or have dedicated tracking issues.

Please feel free to reopen and mention me if you see something that needs attention.

@zmoog zmoog closed this as completed Jan 20, 2025
harnish-elastic pushed a commit to harnish-elastic/integrations that referenced this issue Feb 4, 2025
harnish-elastic pushed a commit to harnish-elastic/integrations that referenced this issue Feb 4, 2025
harnish-elastic pushed a commit to harnish-elastic/integrations that referenced this issue Feb 4, 2025
qcorporation pushed a commit that referenced this issue Feb 4, 2025
harnish-elastic pushed a commit to harnish-elastic/integrations that referenced this issue Feb 5, 2025
harnish-elastic pushed a commit to harnish-elastic/integrations that referenced this issue Feb 5, 2025
harnish-elastic pushed a commit to harnish-elastic/integrations that referenced this issue Feb 5, 2025