Protect replicated data streams against local rollovers #64710
Conversation
…that follows a remote data stream fails.
throw new ResourceNotFoundException("data stream [" + request.getName() + "] does not exist");
}

DataStream promotedDataStream = dataStream.promoteDataStream();
Should we embed this in the unfollow api? I'm not sure, because that api is centered around unfollowing a regular index, and this operation is different; embedding it would add ambiguity to the unfollow api. However, the argument can also be made in reverse, in that this is a related operation.
Thanks Martijn. I wonder how we use bi-directional replication with data streams. Today CCR does not require manual intervention for bi-directional replication use cases.
As far as I know, that doesn't require manual intervention either. Auto following a data stream is similar to following an index pattern. Auto following a data stream also updates the data stream in the local cluster, whereas when auto following an index pattern only IndexMetadata is added or updated.
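For context, the auto following Martijn describes is configured with an auto-follow pattern. Below is a minimal sketch of such a request, assuming a remote cluster alias named `leader` has already been configured; the pattern name and index pattern are made up for illustration:

```
PUT /_ccr/auto_follow/logs-pattern
{
  "remote_cluster": "leader",
  "leader_index_patterns": ["logs-*"]
}
```

With data stream support, matching a remote data stream through such a pattern would also create and update the corresponding data stream in the local cluster, not just the backing index metadata.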
    }
}

public void testDataStreamsBiDirectionalReplication() throws Exception {
@martijnvg Sorry, I don't think I explained my concern about supporting bi-directional replication well. What I meant was JasonZ's blog: in his setup, we can send the same indexing request (using the same write alias) to either cluster. In this test, we use different data streams for indexing requests. That means users can't simply reroute all indexing to a single cluster when the other is not available.
Currently an alias can't point to a data stream or its backing indices, so the test can't completely follow the bi-directional scenario. I did add a few logs-http* searches to mimic reading from a logs-http alias, as a best-effort replacement for the fact that aliases can't point to data streams.

We're planning to add alias support to data streams. These aliases would only be able to point to data streams, not to a data stream's backing indices, other indices or other aliases. Like aliases defined on indices, aliases on data streams could also have a write flag, which indicates to which data stream write requests are redirected. I will add a TODO here that these searches on the logs-http* pattern should be replaced with searches and writes via aliases when alias support for data streams has landed.
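The logs-http* searches mentioned above would look roughly like the request below. This is a sketch of a console-style search request; the concrete data stream names come from the test, not from this comment:

```
GET /logs-http*/_search
{
  "query": { "match_all": {} }
}
```

Because the wildcard pattern matches both the local and the replicated data stream, such a search reads from both directions of the replication setup, which is what a logs-http alias would eventually provide.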
We're planning to add alias support to data streams.
Thanks for explaining + adding the TODO.
Thanks @martijnvg.
"indices.promote_data_stream":{
  "documentation":{
    "url":"https://www.elastic.co/guide/en/elasticsearch/reference/master/data-streams.html",
    "description":"Promotes a data stream from a replicate data stream managed by CCR to a regular data stream"
nit: s/replicate/replicated
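For reference, the API described by this spec file is invoked as shown below. This is a sketch of a console-style request; `my-logs` is a hypothetical data stream name:

```
POST /_data_stream/_promote/my-logs
```

After this call succeeds, the data stream is no longer marked as replicated, so a subsequent local rollover is allowed.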
LGTM. Thanks so much for all iterations @martijnvg!
Backporting #64710 to the 7.x branch.
When a data stream is being auto followed, a rollover in the local cluster can break auto following: if the local cluster performs a rollover, it creates a new write index, and if the remote cluster later rolls over as well, that new write index can't be replicated, because it has the same name as the write index in the local cluster, which was created earlier.

If a data stream is managed by CCR, then the local cluster should not roll over those data streams. The data stream should be rolled over in the remote cluster, and that change should replicate to the local cluster. Performing a rollover in the local cluster is an operation that the data stream support in CCR should perform.

To protect against rolling over a replicated data stream, this PR adds a replicated field to the DataStream class. The rollover api will fail with an error in case a data stream is being rolled over and the targeted data stream is a replicated data stream. When the put follow api creates a data stream in the local cluster, the replicated flag is set to true. There should be a way to turn a replicated data stream into a regular data stream, for example during disaster recovery. The newly added api in this PR (the promote data stream api) does that. After a replicated data stream is promoted to a regular data stream, the local data stream can be rolled over, so that the new write index is no longer a follower index. Also, if the put follow api attempts to update this data stream (for example to resume auto following), that will fail, because the data stream is no longer a replicated data stream.

Today, with time based indices behind an alias, the is_write_index property isn't replicated from the remote cluster to the local cluster, so when attempting to roll over the alias in the local cluster the rollover fails, because the alias doesn't have a write index. The added replicated field in the DataStream class and the added validation achieve the same kind of protection, but in a more robust way.

A followup from #61993.
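The replicated-flag guard described above can be sketched with a stripped-down model. This is a hypothetical simplification, not the actual Elasticsearch class: the real DataStream carries generations, timestamp fields, and far more validation. Only the rollover/promote interaction from this PR is modeled here.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for Elasticsearch's DataStream; illustrates only the
// replicated-flag guard and promotion described in the PR description.
final class DataStream {
    private final String name;
    private final List<String> backingIndices;
    private final boolean replicated; // true while managed by CCR

    DataStream(String name, List<String> backingIndices, boolean replicated) {
        this.name = name;
        this.backingIndices = backingIndices;
        this.replicated = replicated;
    }

    boolean isReplicated() {
        return replicated;
    }

    // Local rollovers are rejected while the data stream is replicated,
    // preventing the write-index name clash with the remote cluster.
    DataStream rollover(String newWriteIndex) {
        if (replicated) {
            throw new IllegalStateException(
                "data stream [" + name + "] cannot be rolled over, "
                    + "because it is a replicated data stream");
        }
        List<String> indices = new ArrayList<>(backingIndices);
        indices.add(newWriteIndex);
        return new DataStream(name, indices, false);
    }

    // Promotion clears the replicated flag; afterwards rollover is allowed.
    DataStream promoteDataStream() {
        return new DataStream(name, backingIndices, false);
    }
}
```

A rollover attempt on a replicated instance throws, while the same call succeeds after promoteDataStream(), which mirrors the disaster-recovery flow the promote data stream api enables.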
(labelling this PR as non-issue, since data stream support for CCR hasn't been released yet)