Kill job on task state reset from submitted or running #2621

Closed
matthewrmshin opened this issue Apr 12, 2018 · 9 comments

@matthewrmshin
Contributor

matthewrmshin commented Apr 12, 2018

From #2528.

When a user resets the state of a submitted or running task to ready without first killing the original job, multiple jobs of the same task can end up existing at once. This should also be handled correctly. We should have the suite make an automatic attempt to kill the original job when a user resets the state of a submitted or running task.

Somewhat related (although not a submit number issue): careless (but common) use of suicide triggers can result in removing an active task proxy. Currently we just log a warning about this; we should probably kill the active job as well.

See:
#2528 (comment)
#2528 (comment)
#2528 (comment)

See also: #2199 #2394 #2506 #2618

@matthewrmshin matthewrmshin added the bug Something is wrong :( label Apr 12, 2018
@matthewrmshin matthewrmshin added this to the soon milestone Apr 12, 2018
@matthewrmshin matthewrmshin self-assigned this Apr 12, 2018
@hjoliver
Member

Note that #2600 disallows manual reset to "ready", although I suppose simply retriggering a running task, or resetting it to "waiting", will have exactly the same effect!

@hjoliver
Member

(It's arguable that this is a bug IMO, although I agree that attempting to kill the original job is preferable anyway).

@matthewrmshin
Contributor Author

To further address the issue reported in #2528, the logic for job 2 submission should check that job 1 is no longer running. It can then decide to either:

  • fail job 2 submission, or
  • kill job 1, then submit job 2.
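
The two options above could be sketched roughly as follows. This is only an illustration, not Cylc's actual code: `TaskProxy`, `Job`, `OnActiveJob`, `SubmitRefused` and `submit_next_job` are hypothetical names invented for the example, and refusing the submission when the kill fails is an assumption, since the thread leaves that case open.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class OnActiveJob(Enum):
    """Hypothetical policy for an earlier job that is still active."""
    FAIL = "fail"        # refuse to submit job 2
    KILL_FIRST = "kill"  # kill job 1, then submit job 2


@dataclass
class Job:
    submit_num: int
    active: bool = True  # True while the job is submitted or running


@dataclass
class TaskProxy:
    name: str
    jobs: List[Job] = field(default_factory=list)

    def active_job(self) -> Optional[Job]:
        """Return the most recent job that is still submitted or running."""
        for job in reversed(self.jobs):
            if job.active:
                return job
        return None


class SubmitRefused(Exception):
    """Raised when a new job must not be submitted."""


def submit_next_job(
    task: TaskProxy,
    policy: OnActiveJob,
    kill: Callable[[Job], bool],
) -> Job:
    """Submit the next job only once no earlier job is still active."""
    previous = task.active_job()
    if previous is not None:
        if policy is OnActiveJob.FAIL:
            raise SubmitRefused(
                f"{task.name}: job {previous.submit_num} is still active"
            )
        # KILL_FIRST: assumption, a failed kill also blocks the new job.
        if not kill(previous):
            raise SubmitRefused(
                f"{task.name}: could not kill job {previous.submit_num}"
            )
        previous.active = False
    new_job = Job(submit_num=len(task.jobs) + 1)
    task.jobs.append(new_job)
    return new_job
```

Here `kill` stands in for whatever mechanism would actually kill the job through the batch system; it returns True on success.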

@dpmatthews
Contributor

To be safe, I think that if you try to trigger or reset the state of a submitted or running task then, by default, this should fail. We would then need a force mode to override this.

@dwsutherland
Member

dwsutherland commented Nov 16, 2018

Perhaps a warning prompt/message could be issued/logged (just in the GUI? the interactive CLI?) on task reset/trigger, before killing any running/submitted job(s) that are found.

This could be achieved via an optional request argument, defaulting to 'cancel_job=True' (which kills the existing running/submitted job). When set to False, the response would include a warning message and the job would not be killed...
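
As a rough sketch of how such an argument might behave (not Cylc's actual API: only the `cancel_job` name comes from the suggestion above, while `reset_task_state`, `kill_job` and the shape of the response are assumptions made for illustration):

```python
from typing import Callable, Dict, List


def reset_task_state(
    active_job_ids: List[str],
    kill_job: Callable[[str], bool],
    cancel_job: bool = True,
) -> Dict[str, List[str]]:
    """Handle active jobs of a task that is being reset or re-triggered.

    With cancel_job=True (the default) each active job is killed.
    With cancel_job=False nothing is killed and a warning is returned.
    """
    response: Dict[str, List[str]] = {"killed": [], "warnings": []}
    for job_id in active_job_ids:
        if not cancel_job:
            response["warnings"].append(
                f"job {job_id} is still submitted/running and was not killed"
            )
        elif kill_job(job_id):
            response["killed"].append(job_id)
        else:
            # Kill was attempted but failed; report it rather than hide it.
            response["warnings"].append(f"failed to kill job {job_id}")
    return response
```

A client (GUI or CLI) could then display anything in `warnings` to the user.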

What to do on failure to kill job 1?
If job 1 is stuck and you continue with job 2, then messages from job 1 could still be received by the suite once job 1 becomes unstuck or is manually killed (unless this behavior has been changed since 7.5.0).

@matthewrmshin
Contributor Author

In 7.7+ messages from old jobs are ignored.

@matthewrmshin
Contributor Author

The only problem left is that job 1 may continue to occupy the same computing resource that job 2 will require - causing job 2 to fail eventually.

@dwsutherland
Member

The only problem left is that job 1 may continue to occupy the same computing resource that job 2 will require - causing job 2 to fail eventually.

True, but not really a Cylc issue. And given that the reset/re-trigger is done manually, the user will have to be confident that their batch system can handle resource contention. I guess Cylc's only responsibility is to notify the user of the already running/submitted job (hence the warning prompt).

@matthewrmshin matthewrmshin removed their assignment Aug 28, 2019
@matthewrmshin matthewrmshin modified the milestones: soon, cylc-8.0.0 Aug 28, 2019
@oliver-sanders
Member

Also closed by #3515
