-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure template exists when creating data stream #56888
Conversation
Limit the creation of data streams only for namespaces that have a composable template with a data stream definition. This way we ensure that mappings/settings have been specified and will be used at data stream creation and data stream rollover. Relates to elastic#53100
Pinging @elastic/es-core-features (:Core/Features/Data streams) |
if (indexTemplateV2.getDataStreamTemplate() == null) { | ||
throw new IllegalArgumentException("matching index template [" + v2Template + "] for data stream [" + dataStreamName + | ||
"] has no data stream template"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we enforce that the timestamp field on the create data stream request is the same as the timestamp field in the data stream template? This can be different if a user provided a different timestamp field via the create data stream api.
If we do enforce this then I think that we should remove the timestamp field from the create data stream request and always use the timestamp field provided in the data stream template inside a itv2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should at least make specifying the timestamp field non-mandatory. In the future, it should only be necessary to specify if there is no matching template.
Also, the requirement to have the template have a valid mapping for the timestamp field makes it questionable if specifying a timestamp field on the data stream has any real value. Once/if we add the ability to also specify settings and mappings when creating data streams (i.e., create a data stream without any template requirement), it makes more sense to also be able to specify the timestamp field.
I would opt to remove the ability to specify the timestamp field for the first iteration of data streams.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Henning, I will remove the timestamp_field parameter from the create data stream request as part of this PR.
@martijnvg so with this change, it's possible that a datastream that was working can stop working (rolling over) if the index template is deleted. Do you think we should add a warning (and/or an error) when deleting an index template? We could enumerate the datastreams and see if deleting the index meant that one or more datastreams would no longer have a matching template, and warn/error if that is the case. Curious what your thoughts are here |
let the create data stream api resolve the timestamp field from the data stream template snippet inside an itv2.
@dakrone Good point. I think we would want to throw an error if an index template is 'used' by a data stream and it about the be deleted. However I think this change should be as a follow up in order to keep this pr small. |
I've updated this PR and removed timestamp_field from create data stream request and let the create data stream api always resolve the timestamp field from the data stream template snippet inside an itv2. |
Totally, I don't think it's part of this at all, I was thinking of it as a follow-up as well (doesn't have to be you that does it either). Can you open an issue so we don't forget though? |
I've opened: #57004 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. I added a number of smaller comments.
|
||
//// | ||
[source,console] | ||
----------------------------------- | ||
DELETE /_data_stream/my-data-stream | ||
DELETE /_index_template/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to just delete the specific template?
-------------------------------------------------- | ||
PUT _data_stream/my-data-stream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The text further down explains that there is a timestamp field in the body, I think that should be removed now?
Maybe we also need a short description stating that you can only create data-streams into namespaces where a template exists with a data_stream
declaration on it? Maybe this is special enough that we should include creating the template into the text too, in order to ensure that the first thing users try out with data-streams works?
//// | ||
[source,console] | ||
----------------------------------- | ||
DELETE /_index_template/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to just delete the specific template.
@@ -21,6 +26,7 @@ PUT /_data_stream/my-data-stream | |||
[source,console] | |||
----------------------------------- | |||
DELETE /_data_stream/my-data-stream | |||
DELETE /_index_template/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more...
@@ -225,12 +225,23 @@ The API returns the following response: | |||
[[rollover-data-stream-ex]] | |||
===== Roll over a data stream | |||
|
|||
//// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should include creating the template into the text in order to make the example more complete?
allowed_warnings: | ||
- "index template [my-template2] has index patterns [simple-data-stream2*] matching patterns from existing older templates [global] with patterns (global => [*]); this template [my-template2] will take precedence during new index creation" | ||
indices.put_index_template: | ||
name: my-template2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need a test verifying the behavior when attempting to create a data-stream into a name where there is no template and also into a name where the template is not a data-stream?
version: " - 7.8.99" | ||
reason: "data streams only supported in 7.9+" | ||
version: " - 7.99.99" | ||
reason: "mute bwc until backported" | ||
|
||
- do: | ||
catch: bad_request | ||
indices.create_data_stream: | ||
name: invalid-data-stream#-name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should use a name that matches one of the templates (but is still illegal) to ensure that the error is really due to the name check and not the template/data-stream check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check whether a template with the same exists happens before validating the name of the template. I can check the error description, so that we ensure that this request fails because of an invalid name.
Settings settings = Settings.builder().put(IndexMetadata.SETTING_NUMBER_OF_REPLICAS, 0).build(); | ||
|
||
PutIndexTemplateV2Action.Request createTemplateRequest = new PutIndexTemplateV2Action.Request("logs-foo"); | ||
createTemplateRequest.indexTemplate( | ||
new IndexTemplateV2( | ||
List.of("logs-foo*"), | ||
new Template(settings, null, null), | ||
null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes settings
unused? But I am also uncertain of why this change was made? Just cleanup or a necessity. If the latter, I would like to understand why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was just a cleanup. I will remove the settings variable.
(copy paste from testAutoCreateV1TemplateNoDataStream()
which ensures that a template applied, but that is not needed here, because if the data stream is created then the template has been applied)
@@ -72,13 +77,19 @@ | |||
|
|||
public class DataStreamIT extends ESIntegTestCase { | |||
|
|||
@After |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, but I wonder if we should follow-up with generically removing all v2 templates like we do for regular templates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I noticed that component templates were not automatically removed after each test. I don't know why. @dakrone Is there a reason why component templates are automatically removed after each test?
Metadata.Builder builder = Metadata.builder(); | ||
builder.put("_id", template); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: seems like a strange secret choice of _id
, maybe just use template
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little concerned that we're moving away from an easy-to-use abstraction by requiring a matching template before a data stream can be created or rolled over. In the simplest case, it's a composable template that adds little value by specifying only the name of the timestamp field. I haven't tried it, but I also wonder what would happen on rollover if the timestamp_field
was changed in the template. I think it would be preferable if that were an immutable property on a data stream.
Also added a few minor notes on some of the documentation changes.
A data stream can only be created if the namespace it targets has a component | ||
template exists with a `data_stream` definition. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: ...has a component template that exists...
or ...has an existing component template...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, are we calling these composable templates now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A component template cannot have a data_stream definition, only an index template can, I would recommend changing it to:
"if the namespace it targets has a composable index template with a data_stream
definition."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoops, I mixed the terms composable and component up in my head... I meant composable.
`timestamp_field`:: | ||
(Required, string) The name of the timestamp field. This field must be present | ||
in all documents indexed into the data stream and must be of type | ||
<<date, `date`>> or <<date_nanos, `date_nanos`>>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this documentation on timestamp_field
should be preserved somewhere else, perhaps in the documentation on creating composable templates.
Once a data stream is created from a composable template then that data stream keeps using the timestamp_field that was specified at the time the data stream was created. If the timestamp_field property is changed in a composable template then that change would only apply for new data streams that are created after that change. The data streams already created keep using the same timestamp_field property, also when doing a rollover.
I think there is a different issue and that has to do with the mapping of the timestamp_field. A rollover only uses the mappings and settings when doing a rollover to create a new backing index. If the mapping of the timestamp_field has changed then this could be problematic. For example the mapping of a timestamp_field may no longer exist for previous created data streams. I think in order to prevent this from happening is adding validation to the put composable template api that prevents changing the mapping for a field, if that field is used as timestamp_field in existing data streams. We're going to have to add more validation to the put composable api. For example, timestamp field validation that #56815 tries to add or prevent removing a composable template that is used by a data stream (#57004). |
Something else that crossed my mind is if we make the mapping of a timestamp field part of the So currently someone would need to configure something like this:
This requires that we check that the
When a new data stream is created from a template then it will carry over this timestamp_field config over from the template's data stream definition. This is immutable and rollover and create data stream logic can use that to always set the timestamp_field's mapping to what has been configured in the data steam definition. |
@elasticmachine run elasticsearch-ci/1 |
That sounds good to me. |
LGTM pending successful CI. |
Backporting elastic#56888 to 7.x branch. Limit the creation of data streams only for namespaces that have a composable template with a data stream definition. This way we ensure that mappings/settings have been specified and will be used at data stream creation and data stream rollover. Also remove `timestamp_field` parameter from create data stream request and let the create data stream api resolve the timestamp field from the data stream definition snippet inside a composable template. Relates to elastic#53100
Backporting #56888 to 7.x branch. Limit the creation of data streams only for namespaces that have a composable template with a data stream definition. This way we ensure that mappings/settings have been specified and will be used at data stream creation and data stream rollover. Also remove `timestamp_field` parameter from create data stream request and let the create data stream api resolve the timestamp field from the data stream definition snippet inside a composable template. Relates to #53100
Limit the creation of data streams only for namespaces that have a composable template with a data stream definition.
This way we ensure that mappings/settings have been specified and will be used at data stream creation and data stream rollover.
Relates to #53100