[EEM] Remove duplicates from latest data set #187699
...y_solution/entity_manager/server/lib/entities/ingest_pipeline/generate_history_processors.ts
...ty_solution/entity_manager/server/lib/entities/ingest_pipeline/generate_latest_processors.ts
...rvability_solution/entity_manager/server/lib/entities/transform/generate_latest_transform.ts
This is the right direction. Can we also remove `displayName` from the history docs (since it should use only one identity field, and maybe not the one in the current doc) and add it to the latest docs?
I'm going to split this PR in two:
Updated from 74b1e91 to e5c3f8b.
@tommyers-elastic Are we sure we won't need the display name for the history documents? In the UI I guess not, because we'll likely enter the history from the latest documents, so we can hang on to the display name from there. But what if more than one value is found in the latest transform? I'm still not sure how to handle that. Also, this PR is ready for review again!
@miltonhultgren The reason for keeping `displayName` in the latest documents only is this: say you have an entity definition with two fields that both contain the user identifier, so you set the display name to just one of those fields, like
LGTM, but do we also need to update the component templates? The changes I think we need are to remove `displayName` from the base template and include it only in the latest template, and to map `identityFields` as a keyword in the base template. (While we're at it, we should also remove `firstSeenTimestamp` from the shared mapping and include it alongside `displayName` in the latest template only.)
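A minimal sketch of the mapping split proposed above, written as TypeScript objects in the shape of an Elasticsearch component template body (`template.mappings.properties`). The constant names `baseTemplate` and `latestTemplate`, and the `keyword`/`date` types chosen for `displayName` and `firstSeenTimestamp`, are assumptions for illustration, not the entity manager's actual templates.

```typescript
// Hypothetical sketch of the suggested mapping split; names and the
// keyword/date types for the latest-only fields are assumptions.
interface ObjectMapping {
  properties: { [name: string]: FieldMapping };
}
type FieldMapping = { type: string } | ObjectMapping;

interface ComponentTemplateBody {
  template: { mappings: ObjectMapping };
}

// Shared (base) template used by both history and latest indices.
// displayName and firstSeenTimestamp are intentionally absent here.
const baseTemplate: ComponentTemplateBody = {
  template: {
    mappings: {
      properties: {
        entity: {
          properties: {
            // Holds a list of identity field *names*, so keyword is enough.
            identityFields: { type: 'keyword' },
          },
        },
      },
    },
  },
};

// Latest-only template: fields that only make sense once per entity.
const latestTemplate: ComponentTemplateBody = {
  template: {
    mappings: {
      properties: {
        entity: {
          properties: {
            displayName: { type: 'keyword' },
            firstSeenTimestamp: { type: 'date' },
          },
        },
      },
    },
  },
};
```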
```ts
  },
},
{
  // This must happen AFTER we lift the identity fields into the root of the document
```
👍
💚 Build Succeeded
```ts
expect(initializePathScript('someField')).toMatchInlineSnapshot(`
  "

  if (ctx.someField == null) {
    ctx.someField = new HashMap();
  }
  "
`);
});
```
If it's a single-level field like `tags`, you don't need to initialize a `new HashMap()`. When you refactored this code, you dropped the check for `currentIndex + 1 === parts.length`. If `parts.length` is `1` and `currentIndex` is `0`, you're already at the end of the path and can just assign the value; there is no need to instantiate a `new HashMap()`.
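The fix can be sketched as follows. This is a hypothetical re-implementation of `initializePathScript` for illustration only, assuming it receives a dotted field path and returns a Painless snippet; the real helper may differ. Only the parent segments of the path get a `HashMap`, and the last segment is left for the caller to assign.

```typescript
// Hypothetical sketch, not the entity manager's actual helper.
// Emits Painless null-checks for every *parent* segment of a dotted path.
function initializePathScript(field: string): string {
  const parts = field.split('.');
  return parts
    .map((_, currentIndex) => {
      // Last segment: we're at the end of the path, so the caller just
      // assigns the value and no HashMap is needed.
      if (currentIndex + 1 === parts.length) {
        return '';
      }
      const path = `ctx.${parts.slice(0, currentIndex + 1).join('.')}`;
      return `if (${path} == null) {\n  ${path} = new HashMap();\n}\n`;
    })
    .join('');
}
```

With this shape, `initializePathScript('tags')` returns an empty string, and `initializePathScript('some.nested.field')` initializes only `ctx.some` and `ctx.some.nested`.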
```ts
if (ctx.some.nested.field == null) {
  ctx.some.nested.field = new HashMap();
```
You're at the end of the path, there is no need for this.
```
"entity.identity.event.category": Object {
  "terms": Object {
    "field": "event.category",
    "size": 1,
  },
},
"entity.identity.log.logger": Object {
  "terms": Object {
    "field": "log.logger",
    "size": 1,
  },
},
```
Since these are single-value fields, should we have used `top_metrics` here instead, and then the same `set` processor as the history pipeline?
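A sketch of what the suggested alternative might look like: one `top_metrics` aggregation per identity field, taking the value from the most recent document instead of a `terms` bucket with `size: 1`. The helper name `buildIdentityAggregations` and the `@timestamp` sort field are assumptions; before adopting this, check that the Elasticsearch version in use supports `top_metrics` on keyword fields inside transform pivots.

```typescript
// Hypothetical builder: one top_metrics aggregation per identity field,
// keeping the value from the newest document (sorted by timestamp, desc).
type TopMetricsAgg = {
  top_metrics: {
    metrics: Array<{ field: string }>;
    sort: Record<string, 'asc' | 'desc'>;
  };
};

function buildIdentityAggregations(
  identityFields: string[],
  timestampField = '@timestamp'
): Record<string, TopMetricsAgg> {
  const aggs: Record<string, TopMetricsAgg> = {};
  for (const field of identityFields) {
    aggs[`entity.identity.${field}`] = {
      top_metrics: {
        metrics: [{ field }],
        sort: { [timestampField]: 'desc' },
      },
    };
  }
  return aggs;
}
```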
By grouping only on `entity.id`, we should be able to remove duplicates from the latest indices. This PR also replaces the values found in `entity.identityFields` with a list of those field names, and lifts the values of the identity fields to the root of the document. Finally, it removes `displayName` from the history documents.
How to test
Source data:
Entity definition:
Change in the format of the resulting documents
=>
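The grouping and document-format changes described in this PR can be modeled in memory as follows. This is a hypothetical illustration only: the real work is done by the latest transform and its ingest pipeline, `buildLatestDocs` is a made-up name, and documents are assumed to be flat objects keyed by dotted field names with ISO-8601 `@timestamp` strings.

```typescript
// Hypothetical in-memory model; field names other than entity.id and
// entity.identityFields are invented for the example.
type HistoryDoc = Record<string, string>; // includes '@timestamp' and 'entity.id'
type LatestDoc = Record<string, string | string[]>;

function buildLatestDocs(history: HistoryDoc[], identityFields: string[]): LatestDoc[] {
  // Group only on entity.id, keeping the most recent document per entity.
  const newest: Record<string, HistoryDoc> = {};
  for (const doc of history) {
    const id = doc['entity.id'];
    const current = newest[id];
    // ISO-8601 timestamps in the same format compare correctly as strings.
    if (current === undefined || doc['@timestamp'] > current['@timestamp']) {
      newest[id] = doc;
    }
  }
  return Object.keys(newest).map((id) => {
    const doc = newest[id];
    const latest: LatestDoc = {
      'entity.id': id,
      // The identity field *names*, not their values.
      'entity.identityFields': identityFields.slice(),
    };
    // Lift each identity field's value to the root of the document.
    for (const field of identityFields) {
      latest[field] = doc[field];
    }
    return latest;
  });
}

// Two history docs for the same entity differ in a non-identity field.
const latest = buildLatestDocs(
  [
    { '@timestamp': '2024-05-01T00:00:00Z', 'entity.id': 'a1', 'event.category': 'web', 'host.name': 'h1' },
    { '@timestamp': '2024-05-02T00:00:00Z', 'entity.id': 'a1', 'event.category': 'web', 'host.name': 'h2' },
  ],
  ['event.category']
);
```

Because the grouping key is `entity.id` alone, the two history documents collapse into a single latest document even though the hypothetical `host.name` field differs between them.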