-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ILM policy setting to data_stream manifest file #90
Conversation
@ycombinator @mtojek I am getting a build failure, the check and test works locally, what am I missing? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you look at the CI output? I failed with an actionable error:
[2020-12-03T20:05:37.508Z] Specs are not up to date. Please run make update.
Run make update
to generate Go code with embedded schema.
examples: | ||
- diagnotics | ||
hidden: | ||
description: Specifies if a datastream is hidden |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hidden where?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in elastic search
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what @mtojek is suggesting is to include this clarification as part of the description
.
description: Specifies if a datastream is hidden | |
description: Specifies if a datastream is hidden in Elasticsearch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we definitely need more context here. Could you please justify why does the ILM policy need to be hidden?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hide the data stream from views
Is it hidden in Elasticsearch or Kibana? I'm afraid I'm a bit lost here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mtojek this field will correspond with this change in ES to give the option to make a data stream
hidden. elastic/elasticsearch#63987
Our particular use case will be to define a data stream
that will collect documents for the sole purpose of telemetry. As a result, we want the data stream
to be hidden to reduce user confusion.
@@ -87,6 +87,14 @@ spec: | |||
- beta | |||
examples: | |||
- beta | |||
ilm_policy: | |||
description: The name of an existing ILM (Index Lifecycle Management) policy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a part of the package (is it an ordinary file)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it is under elastic/ilm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what Marcin is getting at here: Do we expect here that it only references a file which is under data_stream/*/elastic/ilm_policy/*
or can it be any name? This is relevant for validation.
I think we also need to have a conversation around how we name these ILM policies which are specific to a data_stream. Suggestions?
Last, will the same ILM policy apply for all data_streams of this type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what Marcin is getting at here: Do we expect here that it only references a file which is under data_stream//elastic/ilm_policy/ or can it be any name? This is relevant for validation.
I could see it both ways. I'm thinking it's better to start with the more strict validation (the ilm_policy is provided in the same data stream definition) and relax it later if we see the need for multiple to reference the same file. But I don't know if that adds to the complexity.
Last, will the same ILM policy apply for all data_streams of this type?
In this particular use case, I don't really forsee having more than the default
namespace, so for simplicity, I think it's ok to apply it to all to start with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
An other interesting discussion we had in a meeting is that the specific ILM policies for a data_stream should be overwritten on upgrade in contrast to the ILM policies we load on the global level.
@@ -87,6 +87,14 @@ spec: | |||
- beta | |||
examples: | |||
- beta | |||
ilm_policy: | |||
description: The name of an existing ILM (Index Lifecycle Management) policy | |||
type: string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is a string property, does it need a pattern definition here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no I do not believe it needs, it is the name of the ilm policy stored which is equivalent to a valid filename. I think this is guard enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We introduced a strict validation for files and directories. If the ilm policy is part of the package, it's stored in some directory. You need to update/define a spec for that directory as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a custom validation that needs to be written or it exist today?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already defined, e.g.: https://github.com/elastic/package-spec/blob/master/versions/1/data_stream/spec.yml
Please review also the CONTRIBUTING doc.
description: The name of an existing ILM (Index Lifecycle Management) policy | ||
type: string | ||
examples: | ||
- diagnotics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit typo
@@ -0,0 +1,7 @@ | |||
title: Hidden datastream and ilm policy overrride |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The correct spelling is data stream
not datastream
. Please adjust the entire PR.
Yes I noticed that I needed to check in the spec update. |
@nnamdifrankie I updated the title of this PR a bit to make it more descriptive. It will also need a changelog entry. |
examples: | ||
- diagnostics | ||
hidden: | ||
description: Specifies if a data stream is hidden in ElasticSearch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Elasticsearch
examples: | ||
- diagnostics | ||
hidden: | ||
description: Specifies if a data stream is hidden in ElasticSearch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
description: Specifies if a data stream is hidden in ElasticSearch | |
description: Specifies if a data stream is hidden in Elasticsearch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad to see this moving. I see we mix two spec changes here into 1. hidden and ilm_policy. Any chance we could make 2 PR's out of this? I'm aware that on the usage side they are related but I would prefer to get them in 1-1.
Especially the hidden
should be quick and we can roll it out. for ilm we might have a few more discussions so we can decouple it.
@ycombinator @mtojek Can you take another look? I want to see if we can release the elastic-package too as this is used in our build to lint the package. And I think it is also used in the registry to check for bad packages. |
Hey @nnamdifrankie, could you please add a bit of context to this PR's description - most importantly, why this change is needed / what use case is it for? That would help me quite a bit in understanding what's going on in this PR. Or if there's an issue where all of this already described, could you please link to it from the PR description? Thanks! [EDIT] Also, I'm ++ to @ruflin's suggestion about to split this PR into two: one for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Shaunak already mentioned, this PR description is a big vague. Let's try to add more context to clarify why do users need this feature. Maybe also describe a user scenario?
examples: | ||
- diagnotics | ||
hidden: | ||
description: Specifies if a datastream is hidden |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we definitely need more context here. Could you please justify why does the ILM policy need to be hidden?
@@ -0,0 +1,7 @@ | |||
title: Hidden data stream and ilm policy overrride | |||
type: metrics | |||
ilm_policy: diagnostics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding test case. If I understand the idea correctly, the diagnostics should be present in this package somewhere? If so, we can try to enable validation for the property (mentioned in other comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mtojek the diagnostics is referring to a use case for a particular ilm_policy
that we want to create.
Our use case is to enable ourselves to add a custom ilm_policy
for a data stream
in a package. We then want to define that ilm_policy
to be fairly aggressive for a data stream
that is intended to be for documents used for diagnostics/telemetry. A separate telemetry server will query for the docs stored in this data stream
.
Let me know if that helps answer your question. I'm happy to explain further
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Kevin for the explanation! It's more clear now. Just a follow-up question: based on this example (diagnostics) do you plan to include it in the package or it will be preinstalled/predefined/distributed with ES or Kibana? If it's distributed together with the package, then we can consider enabling a validation for the field (I'm fine with leaving it for later improvement).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM!
@mtojek @ycombinator thanks for the feedback. I updated the description to shed some light on the PR. We can still split the fields out to a separate PR. @nnamdifrankie could you take the |
@mtojek @ycombinator @kevinlog please find the new PR #91 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mtojek @ycombinator thanks for the feedback. I updated the description to shed some light on the PR. We can still split the fields out to a separate PR.
Thanks! Splitting this PR into two separate ones is a good idea. I approved the hidden
part.
@nnamdifrankie could you take the hidden changes and open up that PR separately from this one? You could copy over the description specific to the hidden field.
Please also update the PR description in this PR (remove the paragraph about hidden
).
Speaking about next steps after PRs to package-spec are merged, I think you can come back to the contribution to the package-registry.
EDIT:
Keep in mind that once the other PR is merged, you need to rebase this one and update Go code to include both changes (make update
).
# Conflicts: # code/go/internal/spec/statik.go # versions/1/changelog.yml
@nnamdifrankie Nice! I think the only missing part is the sample ILM policy in the Would you mind adding a sample one? I would be a ground for the validation issue (see: #92). EDIT: Heads up, as there is another PR open (#93). If it got merged first, you'll have to adjust your PR also against it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only small nit-picks, but in general LGTM. Thanks for all adjustments and tests!
versions/1/data_stream/spec.yml
Outdated
name: ilm | ||
additionalContents: false | ||
contents: | ||
- description: Supporting ilm policy definitions in YAML |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please make sure you're using consistent abbrs in terms of capital letters (ilm vs ILM vs Ilm)
@@ -0,0 +1,15 @@ | |||
{ | |||
"policy": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can provide the schema for this file. I don't know if Elasticsearch team plans to extend this file in the future or if it's rather never-changing, concise schema.
We haven't defined the schema for ingest pipelines, so I'm to leave it as it (without schema definition for ILM policy).
@mtojek Thanks. How does this get into the elastic package tool github.com/elastic/elastic-package? |
Issue: https://github.com/elastic/security-team/issues/548
This PR adds the ilm_policy fields to data stream manifest file. The ilm_policy is required to specify an alternate ilm policy to override the policy specified by the type field e.g. logs, metrics.
ilm_policy
fieldThis feature is needed so that we can get more flexibility when defining
data streams
within our packages, specifically when it comes toilm_policy
. Currently, theilm_policy
is tied to thetype
of thedata stream
. We want to add a field where we can specify anilm_policy
to override this behavior. There will be a subsequent PR to the Fleet code that will install the components according to this newilm_policy
field.Our specific use case that is driving this is to define a
data stream
that is meant to collect documents that are used for diagnostics/telemetry. There will be a separate telemetry server that will query this particulardata stream
and collect the documents. As a result, we want theilm_policy
to be fairly aggressive as we do not need the documents to stay in storage for very long.