Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ILM policy setting to data_stream manifest file #90

Merged
merged 12 commits into from
Dec 8, 2020

Conversation

nnamdifrankie
Copy link
Contributor

@nnamdifrankie nnamdifrankie commented Dec 3, 2020

Issue: https://github.com/elastic/security-team/issues/548

This PR adds the ilm_policy fields to data stream manifest file. The ilm_policy is required to specify an alternate ilm policy to override the policy specified by the type field e.g. logs, metrics.

ilm_policy field

This feature is needed so that we can get more flexibility when defining data streams within our packages, specifically when it comes to ilm_policy. Currently, the ilm_policy is tied to the type of the data stream. We want to add a field where we can specify an ilm_policy to override this behavior. There will be a subsequent PR to the Fleet code that will install the components according to this new ilm_policy field.

Our specific use case that is driving this is to define a data stream that is meant to collect documents that are used for diagnostics/telemetry. There will be a separate telemetry server that will query this particular data stream and collect the documents. As a result, we want the ilm_policy to be fairly aggressive as we do not need the documents to stay in storage for very long.

  • add properties ilm_policy and hidden fields to the datastream manifest.

@elasticmachine
Copy link

elasticmachine commented Dec 3, 2020

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: Pull request #90 updated

  • Start Time: 2020-12-08T20:21:54.706+0000

  • Duration: 2 min 28 sec

@nnamdifrankie
Copy link
Contributor Author

@ycombinator @mtojek I am getting a build failure, the check and test works locally, what am I missing?

@nnamdifrankie nnamdifrankie requested a review from mtojek December 3, 2020 20:10
Copy link
Contributor

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you look at the CI output? I failed with an actionable error:

[2020-12-03T20:05:37.508Z] Specs are not up to date. Please run make update.

Run make update to generate Go code with embedded schema.

examples:
- diagnotics
hidden:
description: Specifies if a datastream is hidden
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hidden where?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in elastic search

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what @mtojek is suggesting is to include this clarification as part of the description.

Suggested change
description: Specifies if a datastream is hidden
description: Specifies if a datastream is hidden in Elasticsearch

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we definitely need more context here. Could you please justify why does the ILM policy need to be hidden?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hide the data stream from views

Is it hidden in Elasticsearch or Kibana? I'm afraid I'm a bit lost here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtojek this field will correspond with this change in ES to give the option to make a data stream hidden. elastic/elasticsearch#63987

Our particular use case will be to define a data stream that will collect documents for the sole purpose of telemetry. As a result, we want the data stream to be hidden to reduce user confusion.

@@ -87,6 +87,14 @@ spec:
- beta
examples:
- beta
ilm_policy:
description: The name of an existing ILM (Index Lifecycle Management) policy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a part of the package (is it an ordinary file)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it is under elastic/ilm

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what Marcin is getting at here: Do we expect here that it only references a file which is under data_stream/*/elastic/ilm_policy/* or can it be any name? This is relevant for validation.

I think we also need to have a conversation around how we name these ILM policies which are specific to a data_stream. Suggestions?

Last, will the same ILM policy apply for all data_streams of this type?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what Marcin is getting at here: Do we expect here that it only references a file which is under data_stream//elastic/ilm_policy/ or can it be any name? This is relevant for validation.

I could see it both ways. I'm thinking it's better to start with the more strict validation (the ilm_policy is provided in the same data stream definition) and relax it later if we see the need for multiple to reference the same file. But I don't know if that adds to the complexity.

Last, will the same ILM policy apply for all data_streams of this type?

In this particular use case, I don't really forsee having more than the default namespace, so for simplicity, I think it's ok to apply it to all to start with?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

An other interesting discussion we had in a meeting is that the specific ILM policies for a data_stream should be overwritten on upgrade in contrast to the ILM policies we load on the global level.

@@ -87,6 +87,14 @@ spec:
- beta
examples:
- beta
ilm_policy:
description: The name of an existing ILM (Index Lifecycle Management) policy
type: string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is a string property, does it need a pattern definition here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no I do not believe it needs, it is the name of the ilm policy stored which is equivalent to a valid filename. I think this is guard enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We introduced a strict validation for files and directories. If the ilm policy is part of the package, it's stored in some directory. You need to update/define a spec for that directory as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a custom validation that needs to be written or it exist today?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already defined, e.g.: https://github.com/elastic/package-spec/blob/master/versions/1/data_stream/spec.yml

Please review also the CONTRIBUTING doc.

description: The name of an existing ILM (Index Lifecycle Management) policy
type: string
examples:
- diagnotics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit typo

@@ -0,0 +1,7 @@
title: Hidden datastream and ilm policy overrride
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The correct spelling is data stream not datastream. Please adjust the entire PR.

@nnamdifrankie
Copy link
Contributor Author

Run make update to generate Go code with embedded schema.

Yes I noticed that I needed to check in the spec update.

@ruflin ruflin changed the title add-new-properties-for-manifest Add ILM policy setting to data_stream manifest file Dec 4, 2020
@ruflin
Copy link
Collaborator

ruflin commented Dec 4, 2020

@nnamdifrankie I updated the title of this PR a bit to make it more descriptive. It will also need a changelog entry.

examples:
- diagnostics
hidden:
description: Specifies if a data stream is hidden in ElasticSearch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Elasticsearch

examples:
- diagnostics
hidden:
description: Specifies if a data stream is hidden in ElasticSearch
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
description: Specifies if a data stream is hidden in ElasticSearch
description: Specifies if a data stream is hidden in Elasticsearch

Copy link
Collaborator

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to see this moving. I see we mix two spec changes here into 1. hidden and ilm_policy. Any chance we could make 2 PR's out of this? I'm aware that on the usage side they are related but I would prefer to get them in 1-1.

Especially the hidden should be quick and we can roll it out. for ilm we might have a few more discussions so we can decouple it.

@nnamdifrankie
Copy link
Contributor Author

nnamdifrankie commented Dec 4, 2020

@ycombinator @mtojek Can you take another look? I want to see if we can release the elastic-package too as this is used in our build to lint the package. And I think it is also used in the registry to check for bad packages.

@ycombinator
Copy link
Contributor

ycombinator commented Dec 7, 2020

Hey @nnamdifrankie, could you please add a bit of context to this PR's description - most importantly, why this change is needed / what use case is it for? That would help me quite a bit in understanding what's going on in this PR. Or if there's an issue where all of this already described, could you please link to it from the PR description? Thanks!

[EDIT] Also, I'm ++ to @ruflin's suggestion about to split this PR into two: one for hidden and one for ilm_policy. The hidden change seems pretty straightforward so let's try to get that in first while we discuss the ilm_policy change in a separate PR. Thanks!

Copy link
Contributor

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Shaunak already mentioned, this PR description is a big vague. Let's try to add more context to clarify why do users need this feature. Maybe also describe a user scenario?

examples:
- diagnotics
hidden:
description: Specifies if a datastream is hidden
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we definitely need more context here. Could you please justify why does the ILM policy need to be hidden?

@@ -0,0 +1,7 @@
title: Hidden data stream and ilm policy overrride
type: metrics
ilm_policy: diagnostics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding test case. If I understand the idea correctly, the diagnostics should be present in this package somewhere? If so, we can try to enable validation for the property (mentioned in other comment).

Copy link

@kevinlog kevinlog Dec 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtojek the diagnostics is referring to a use case for a particular ilm_policy that we want to create.

Our use case is to enable ourselves to add a custom ilm_policy for a data stream in a package. We then want to define that ilm_policy to be fairly aggressive for a data stream that is intended to be for documents used for diagnostics/telemetry. A separate telemetry server will query for the docs stored in this data stream.

Let me know if that helps answer your question. I'm happy to explain further

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Kevin for the explanation! It's more clear now. Just a follow-up question: based on this example (diagnostics) do you plan to include it in the package or it will be preinstalled/predefined/distributed with ES or Kibana? If it's distributed together with the package, then we can consider enabling a validation for the field (I'm fine with leaving it for later improvement).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtojek we'll include it with the package. As such, I think we should eventually add a validation for it. We see it as fairly similar to the ingest_pipeline. I opened a follow up ticket for adding that validation - #92 (I can fill it out with more details)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM!

@kevinlog
Copy link

kevinlog commented Dec 7, 2020

@mtojek @ycombinator thanks for the feedback. I updated the description to shed some light on the PR. We can still split the fields out to a separate PR.

@nnamdifrankie could you take the hidden changes and open up that PR separately from this one? You could copy over the description specific to the hidden field.

FYI @ruflin @tsg

@nnamdifrankie
Copy link
Contributor Author

@mtojek @ycombinator @kevinlog please find the new PR #91

@mtojek mtojek self-requested a review December 7, 2020 20:06
Copy link
Contributor

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtojek @ycombinator thanks for the feedback. I updated the description to shed some light on the PR. We can still split the fields out to a separate PR.

Thanks! Splitting this PR into two separate ones is a good idea. I approved the hidden part.

@nnamdifrankie could you take the hidden changes and open up that PR separately from this one? You could copy over the description specific to the hidden field.

Please also update the PR description in this PR (remove the paragraph about hidden).

Speaking about next steps after PRs to package-spec are merged, I think you can come back to the contribution to the package-registry.

EDIT:

Keep in mind that once the other PR is merged, you need to rebase this one and update Go code to include both changes (make update).

# Conflicts:
#	code/go/internal/spec/statik.go
#	versions/1/changelog.yml
@nnamdifrankie
Copy link
Contributor Author

@mtojek @kevinlog I have updated the PR

@mtojek
Copy link
Contributor

mtojek commented Dec 8, 2020

@nnamdifrankie Nice! I think the only missing part is the sample ILM policy in the good package (test). We talked about it here: https://github.com/elastic/package-spec/tree/add-new-manifest-properties/code/go/internal/validator/test/packages/good/data_stream/ilm_policy

Would you mind adding a sample one? I would be a ground for the validation issue (see: #92).

EDIT:

Heads up, as there is another PR open (#93). If it got merged first, you'll have to adjust your PR also against it.

@nnamdifrankie
Copy link
Contributor Author

@mtojek @kevinlog I have added the tests and updated the spec to allow for ilm under elasticsearch

Copy link
Contributor

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only small nit-picks, but in general LGTM. Thanks for all adjustments and tests!

name: ilm
additionalContents: false
contents:
- description: Supporting ilm policy definitions in YAML
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please make sure you're using consistent abbrs in terms of capital letters (ilm vs ILM vs Ilm)

@@ -0,0 +1,15 @@
{
"policy": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can provide the schema for this file. I don't know if Elasticsearch team plans to extend this file in the future or if it's rather never-changing, concise schema.

We haven't defined the schema for ingest pipelines, so I'm to leave it as it (without schema definition for ILM policy).

@nnamdifrankie
Copy link
Contributor Author

nnamdifrankie commented Dec 8, 2020

@mtojek Thanks. How does this get into the elastic package tool github.com/elastic/elastic-package?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants