Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Fixes for stop datafeed edge cases #49191

Merged
merged 2 commits into from
Nov 19, 2019

Conversation

droberts195
Copy link
Contributor

The following edge cases were fixed:

  1. A request to force-stop a stopping datafeed is no longer
    ignored. Force-stop is an important recovery mechanism
    if normal stop doesn't work for some reason, and needs
    to operate on a datafeed in any state other than stopped.
  2. If the node that a datafeed is running on is removed from
    the cluster during a normal stop then the stop request is
    retried (and will likely succeed on this retry by simply
    cancelling the persistent task for the affected datafeed).
  3. If there are multiple simultaneous force-stop requests for
    the same datafeed we no longer fail the one that is
    processed second. The previous behaviour was wrong as
    stopping a stopped datafeed is not an error, so stopping
    a datafeed twice simultaneously should not be either.

Fixes #43670
Fixes #48931

The following edge cases were fixed:

1. A request to force-stop a stopping datafeed is no longer
   ignored.  Force-stop is an important recovery mechanism
   if normal stop doesn't work for some reason, and needs
   to operate on a datafeed in any state other than stopped.
2. If the node that a datafeed is running on is removed from
   the cluster during a normal stop then the stop request is
   retried (and will likely succeed on this retry by simply
   cancelling the persistent task for the affected datafeed).
3. If there are multiple simultaneous force-stop requests for
   the same datafeed we no longer fail the one that is
   processed second.  The previous behaviour was wrong as
   stopping a stopped datafeed is not an error, so stopping
   a datafeed twice simultaneously should not be either.

Fixes elastic#43670
Fixes elastic#48931
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

} else {
listener.onFailure(e);
}
});

super.doExecute(task, request, finalListener);
}

private void forceStopDatafeed(final StopDatafeedAction.Request request, final ActionListener<StopDatafeedAction.Response> listener,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method seems like it no longer needs to distinguish started and stopping datafeeds. Should we pass it a single list of datafeeds and do the concatenation a level up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that's done in the second commit.

While I was doing this I noticed that although we have a starting state we never use it. So I also added a TODO about using that for 8.0. (Looking back through the history, starting was added in 5.5, but couldn't be used until 6.x because 5.4 wouldn't understand it. Now we're beyond that we can use it, but we should only do this in a major version just in case somebody is relying on stopped, started and stopping being the only 3 states that exist.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we need to use it. I recall adding it so we had it available without BWC in case we needed it. But surely, if it improves things we can definitely use it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think not using it creates a potential race condition where if you start and stop a datafeed in very quick succession then the stop will be ignored. It seems that none of our current tests do this.

Also added a TODO, as while doing this I noticed that the
"starting" state is never used.
Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@droberts195 droberts195 merged commit 8bbbe28 into elastic:master Nov 19, 2019
@droberts195 droberts195 deleted the fix_stop_datafeed_edge_cases branch November 19, 2019 09:26
droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Nov 19, 2019
The following edge cases were fixed:

1. A request to force-stop a stopping datafeed is no longer
   ignored.  Force-stop is an important recovery mechanism
   if normal stop doesn't work for some reason, and needs
   to operate on a datafeed in any state other than stopped.
2. If the node that a datafeed is running on is removed from
   the cluster during a normal stop then the stop request is
   retried (and will likely succeed on this retry by simply
   cancelling the persistent task for the affected datafeed).
3. If there are multiple simultaneous force-stop requests for
   the same datafeed we no longer fail the one that is
   processed second.  The previous behaviour was wrong as
   stopping a stopped datafeed is not an error, so stopping
   a datafeed twice simultaneously should not be either.

Backport of elastic#49191
droberts195 added a commit that referenced this pull request Nov 19, 2019
The following edge cases were fixed:

1. A request to force-stop a stopping datafeed is no longer
   ignored.  Force-stop is an important recovery mechanism
   if normal stop doesn't work for some reason, and needs
   to operate on a datafeed in any state other than stopped.
2. If the node that a datafeed is running on is removed from
   the cluster during a normal stop then the stop request is
   retried (and will likely succeed on this retry by simply
   cancelling the persistent task for the affected datafeed).
3. If there are multiple simultaneous force-stop requests for
   the same datafeed we no longer fail the one that is
   processed second.  The previous behaviour was wrong as
   stopping a stopped datafeed is not an error, so stopping
   a datafeed twice simultaneously should not be either.

Backport of #49191
@jimczi jimczi added v7.5.0 and removed v7.5.1 labels Nov 19, 2019
@droberts195 droberts195 added v7.5.1 and removed v7.5.0 labels Nov 20, 2019
droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Nov 20, 2019
If a datafeed is stopped normally and force stopped at the same
time then it is possible that the force stop removes the
persistent task while the normal stop is performing actions.
Currently this causes the normal stop to error, but since
stopping a stopped datafeed is not an error this doesn't make
sense. Instead the force stop should just take precedence.

This is a followup to elastic#49191 and should really have been
included in the changes in that PR.
droberts195 added a commit that referenced this pull request Nov 20, 2019
If a datafeed is stopped normally and force stopped at the same
time then it is possible that the force stop removes the
persistent task while the normal stop is performing actions.
Currently this causes the normal stop to error, but since
stopping a stopped datafeed is not an error this doesn't make
sense. Instead the force stop should just take precedence.

This is a followup to #49191 and should really have been
included in the changes in that PR.
droberts195 added a commit that referenced this pull request Nov 20, 2019
If a datafeed is stopped normally and force stopped at the same
time then it is possible that the force stop removes the
persistent task while the normal stop is performing actions.
Currently this causes the normal stop to error, but since
stopping a stopped datafeed is not an error this doesn't make
sense. Instead the force stop should just take precedence.

This is a followup to #49191 and should really have been
included in the changes in that PR.
droberts195 added a commit that referenced this pull request Nov 20, 2019
The following edge cases were fixed:

1. A request to force-stop a stopping datafeed is no longer
   ignored.  Force-stop is an important recovery mechanism
   if normal stop doesn't work for some reason, and needs
   to operate on a datafeed in any state other than stopped.
2. If the node that a datafeed is running on is removed from
   the cluster during a normal stop then the stop request is
   retried (and will likely succeed on this retry by simply
   cancelling the persistent task for the affected datafeed).
3. If there are multiple simultaneous force-stop requests for
   the same datafeed we no longer fail the one that is
   processed second.  The previous behaviour was wrong as
   stopping a stopped datafeed is not an error, so stopping
   a datafeed twice simultaneously should not be either.

Backport of #49191
droberts195 added a commit that referenced this pull request Nov 20, 2019
The following edge cases were fixed:

1. A request to force-stop a stopping datafeed is no longer
   ignored.  Force-stop is an important recovery mechanism
   if normal stop doesn't work for some reason, and needs
   to operate on a datafeed in any state other than stopped.
2. If the node that a datafeed is running on is removed from
   the cluster during a normal stop then the stop request is
   retried (and will likely succeed on this retry by simply
   cancelling the persistent task for the affected datafeed).
3. If there are multiple simultaneous force-stop requests for
   the same datafeed we no longer fail the one that is
   processed second.  The previous behaviour was wrong as
   stopping a stopped datafeed is not an error, so stopping
   a datafeed twice simultaneously should not be either.

Backport of #49191
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants