Protect replicated data streams against local rollovers #64710

martijnvg · 2020-11-06T13:45:29Z

When a data stream is being auto-followed, a rollover in the local cluster can break auto-following.
If the local cluster performs a rollover, it creates a new write index. If the remote cluster later
rolls over as well, its new write index can't be replicated, because it has the same name as the
write index that was created earlier in the local cluster.
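
The collision described above can be sketched in a few lines of Java. This is only an illustration, not code from this PR: the class, the helper names and the exact backing index naming pattern are assumptions chosen for the example. The one property that matters is that both clusters derive the name of the next write index from the same inputs.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; none of these names come from the Elasticsearch code base.
final class RolloverCollisionSketch {

    // Assumed naming scheme: the name depends only on the data stream name and its generation.
    static String backingIndexName(String dataStream, long generation) {
        return String.format(Locale.ROOT, ".ds-%s-%06d", dataStream, generation);
    }

    // A follower index can only be created if no local index already uses that name.
    static boolean canReplicate(Set<String> localIndices, String remoteWriteIndex) {
        return !localIndices.contains(remoteWriteIndex);
    }

    public static void main(String[] args) {
        Set<String> localIndices = new LinkedHashSet<>();
        // Generation 1 was replicated from the remote cluster.
        localIndices.add(backingIndexName("logs-http", 1));
        // The local cluster rolls over on its own and creates generation 2.
        localIndices.add(backingIndexName("logs-http", 2));
        // Later the remote cluster rolls over and also produces generation 2.
        String remoteWriteIndex = backingIndexName("logs-http", 2);
        // prints "false": the name is already taken locally
        System.out.println(canReplicate(localIndices, remoteWriteIndex));
    }
}
```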

If a data stream is managed by CCR, the local cluster should not roll it over. The data stream
should be rolled over in the remote cluster, and that change should then replicate to the local
cluster. Performing a rollover in the local cluster is an operation that should be left to the
data stream support in CCR.

To protect against rolling over a replicated data stream, this PR adds a replicated field to the DataStream class.
The rollover API fails with an error when the targeted data stream is a replicated data stream. When the put
follow API creates a data stream in the local cluster, the replicated flag is set to true. There should be a way
to turn a replicated data stream into a regular data stream, for example during disaster recovery. The API newly
added in this PR (the promote data stream API) does that. After a replicated data stream is promoted to a regular
data stream, the local data stream can be rolled over, so that the new write index is no longer a follower index.
Also, if the put follow API then attempts to update this data stream (for example to resume auto-following), that
will fail, because the data stream is no longer a replicated data stream.
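
As a rough illustration of the behaviour described above, here is a minimal, self-contained Java sketch. It is not the DataStream class from this PR: the class name, constructor, naming pattern and error message are all assumptions made for the example. Only promoteDataStream() mirrors a method name visible in the diff further down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified model; not the actual Elasticsearch DataStream class.
final class DataStreamSketch {
    private final String name;
    private final List<String> backingIndices;
    private final long generation;
    private final boolean replicated;

    DataStreamSketch(String name, List<String> backingIndices, long generation, boolean replicated) {
        this.name = name;
        this.backingIndices = List.copyOf(backingIndices);
        this.generation = generation;
        this.replicated = replicated;
    }

    boolean isReplicated() { return replicated; }
    long getGeneration() { return generation; }
    List<String> getBackingIndices() { return backingIndices; }

    // Returns a copy of this data stream with the replicated flag cleared.
    DataStreamSketch promoteDataStream() {
        return new DataStreamSketch(name, backingIndices, generation, false);
    }

    // Rejects the rollover for a replicated data stream; otherwise adds a new write index.
    DataStreamSketch rollover() {
        if (replicated) {
            // The message text is an assumption, not the real error message.
            throw new IllegalArgumentException(
                "data stream [" + name + "] is a replicated data stream and cannot be rolled over");
        }
        long next = generation + 1;
        List<String> indices = new ArrayList<>(backingIndices);
        // Assumed naming pattern, used only for illustration.
        indices.add(String.format(Locale.ROOT, "%s-%06d", name, next));
        return new DataStreamSketch(name, indices, next, false);
    }
}
```

In this sketch a follower created with replicated set to true rejects rollover(), while the copy returned by promoteDataStream() accepts it and gets a new write index that is an ordinary local index.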

Today, with time-based indices behind an alias, the is_write_index property isn't replicated from the remote
cluster to the local cluster, so an attempt to roll over the alias in the local cluster fails, because the
alias doesn't have a write index. The added replicated field in the DataStream class and the added validation
achieve the same kind of protection, but in a more robust way.
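
For comparison, the alias-based protection mentioned above can be sketched as follows. This is a simplified assumption of how a rollover resolves its target, not the actual Elasticsearch logic: the class and method names are hypothetical, and only an explicit is_write_index flag is considered.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of alias rollover resolution; names are not from Elasticsearch.
final class AliasRolloverSketch {

    // Maps each index behind the alias to its is_write_index flag and returns the write index, if any.
    static Optional<String> writeIndex(Map<String, Boolean> aliasMembers) {
        return aliasMembers.entrySet().stream()
            .filter(entry -> Boolean.TRUE.equals(entry.getValue()))
            .map(Map.Entry::getKey)
            .findFirst();
    }

    // A rollover needs a write index to roll over from; without one it fails.
    static String rolloverSource(String alias, Map<String, Boolean> aliasMembers) {
        return writeIndex(aliasMembers).orElseThrow(() ->
            new IllegalArgumentException("alias [" + alias + "] does not point to a write index"));
    }
}
```

In the follower cluster every flag is false, because is_write_index is not replicated, so rolloverSource throws. The protection is therefore a side effect of missing metadata, whereas the replicated flag makes the same restriction explicit.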

A followup from #61993.

(labelling this PR as non-issue, since data stream support for CCR hasn't been released yet)


martijnvg · 2020-11-06T13:48:17Z

...c/main/java/org/elasticsearch/xpack/datastreams/action/PromoteDataStreamTransportAction.java

+            throw new ResourceNotFoundException("data stream [" + request.getName() + "] does not exist");
+        }
+
+        DataStream promotedDataStream = dataStream.promoteDataStream();


Should we embed this in the unfollow API? I'm not sure: that API is centered around unfollowing a regular index, this operation is different, and embedding it would add ambiguity to the unfollow API. However, the reverse argument can also be made, since this is a related operation.

dnhatn · 2020-11-09T02:06:02Z

Thanks Martijn. I wonder how we use bi-directional replication with data streams. Today CCR does not require manual intervention for bi-directional replication use cases.


martijnvg · 2020-11-09T10:07:45Z

I wonder how we use bi-directional replication with data streams.

As far as I know, that doesn't require manual intervention either. Auto-following a data stream is similar to following an index pattern. Auto-following a data stream also updates the data stream in the local cluster, whereas auto-following an index pattern only adds or updates IndexMetadata.


dnhatn · 2020-12-07T03:42:14Z

x-pack/plugin/ccr/qa/multi-cluster/src/test/java/org/elasticsearch/xpack/ccr/AutoFollowIT.java

+        }
+    }
+
+    public void testDataStreamsBiDirectionalReplication() throws Exception {


@martijnvg Sorry, I think I didn't explain my concern about supporting bi-directional replication well. What I meant was JasonZ's blog. In his setup, we can send the same indexing request (using the same write alias) to either cluster. In this test, we use different data streams for indexing requests. That means users can't simply reroute all indexing to a single cluster when the other is not available.

Currently an alias can't point to a data stream or its backing indices, so the test can't completely follow the bi-directional scenario. I did add a few logs-http* searches to mimic reading from a logs-http alias, as a best-effort replacement for the fact that aliases can't point to data streams.

We're planning to add alias support to data streams. These aliases would only be able to point to data streams and not to a data stream's backing indices, other indices or other aliases. Like aliases defined on indices, aliases on data streams could also have a write flag, which indicates the data stream that write requests are redirected to. I will add a TODO here, noting that these searches on the logs-http* pattern should be replaced with searches and writes via aliases once alias support for data streams has landed.

We're planning to add alias support to data streams.

Thanks for explaining + adding the TODO.

@dnhatn I've opened: #66163

Thanks @martijnvg.


dnhatn · 2020-12-07T15:28:34Z

x-pack/plugin/src/test/resources/rest-api-spec/api/indices.promote_data_stream.json

+  "indices.promote_data_stream":{
+    "documentation":{
+      "url":"https://www.elastic.co/guide/en/elasticsearch/reference/master/data-streams.html",
+      "description":"Promotes a data stream from a replicate data stream managed by CCR to a regular data stream"


nit: s/replicate/replicated

dnhatn · 2020-12-07T15:30:00Z

x-pack/plugin/ccr/qa/multi-cluster/src/test/java/org/elasticsearch/xpack/ccr/AutoFollowIT.java

+        }
+    }
+
+    public void testDataStreamsBiDirectionalReplication() throws Exception {


We're planning to add alias support to data streams.

Thanks for explaining + adding the TODO.

dnhatn

LGTM. Thanks so much for all iterations @martijnvg!


Backporting elastic#64710 to the 7.x branch.



martijnvg added 2 commits November 6, 2020 12:20

Added test that fails now, but tests that rolling over a data stream that follows a remote data stream fails.

f0d9076

Added replicate flag to data stream and promote data stream api.

10d7ffc

martijnvg added :Distributed Indexing/CCR Issues around the Cross Cluster State Replication features :Data Management/Data streams Data streams and their lifecycles labels Nov 6, 2020

martijnvg requested a review from dnhatn November 6, 2020 13:45

martijnvg commented Nov 6, 2020


martijnvg added 3 commits November 9, 2020 10:59

Merge remote-tracking branch 'es/master' into ccr_data_stream_support_part_2

5514ae8

fix precommit

461773d

nit

ef71529

martijnvg added 16 commits November 9, 2020 14:25

fixed tests

7237930

fixed tests

e9b6627

fixed npe

e031d72

Merge remote-tracking branch 'es/master' into ccr_data_stream_support_part_2

b1c57d1

fixed test

0c22ce1

Merge remote-tracking branch 'es/master' into ccr_data_stream_support_part_2

73877d6

Merge remote-tracking branch 'es/master' into ccr_data_stream_support_part_2

5cbaf3d

Merge remote-tracking branch 'es/master' into ccr_data_stream_support_part_2

c9d410f

added ccr bi-directional test with data streams

5efa943

Merge remote-tracking branch 'es/master' into ccr_data_stream_support_part_2

312a788

added docs

56074c3

Merge remote-tracking branch 'es/master' into ccr_data_stream_support_part_2

57941ac

Added a test, which verifies that an alias in follow cluster can't be rolled over.

1b49909

Merge remote-tracking branch 'es/master' into ccr_data_stream_support_part_2

5700b67

fix checkstyle

5528da4

added rest spec and renamed rest action

a21454e

martijnvg marked this pull request as ready for review November 23, 2020 10:48

elasticmachine added Team:Data Management Meta label for data/management team Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. labels Nov 23, 2020

dimitris-athanasiou added a commit that referenced this pull request Dec 2, 2020

[7.x][ML] Mute failing MlDistributedFailureIT test (#65747) (#65750)

a4cbf12

Relates #64710 Backport of #65747

dnhatn reviewed Dec 7, 2020


martijnvg added 4 commits December 7, 2020 10:03

Merge remote-tracking branch 'es/master' into ccr_data_stream_support_part_2

5051eaf

added TODOs

676c7a8

varify

8e21fc7

mark promote ds api as non operator api

9c7a071

dnhatn reviewed Dec 7, 2020


dnhatn approved these changes Dec 7, 2020


martijnvg added 2 commits December 8, 2020 07:36

Merge remote-tracking branch 'es/master' into ccr_data_stream_support_part_2

23c59b8

fixed typo

a80f777

martijnvg merged commit 52afaf2 into elastic:master Dec 8, 2020

martijnvg added the backport pending label Dec 8, 2020

martijnvg mentioned this pull request Dec 8, 2020

Protect replicated data streams against local rollovers #65999

Merged

martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Dec 8, 2020

Enable bwc tests and adjust versioning.

9a9d0f7

Relates to elastic#64710

martijnvg mentioned this pull request Dec 8, 2020

Enable bwc tests and adjust versioning. #66004

Merged

martijnvg added a commit that referenced this pull request Dec 8, 2020

Enable bwc tests and adjust versioning. (#66004)

c52f7f3

Relates to #64710

martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Dec 8, 2020

Enable bwc tests and adjust versioning. (elastic#66004)

27c4785

Relates to elastic#64710

martijnvg mentioned this pull request Dec 8, 2020

Enable bwc tests and adjust versioning #66016

Merged

martijnvg removed the backport pending label Dec 8, 2020

martijnvg added a commit that referenced this pull request Dec 8, 2020

Enable bwc tests and adjust versioning. (#66016)

279bf21

Backporting #66004 to 7.x branch. Relates to #64710

martijnvg mentioned this pull request Dec 10, 2020

Introduce aliases for data streams #66163

Open

7 tasks

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
