Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli: play|pause|stop #3897

Closed
oliver-sanders opened this issue Oct 27, 2020 · 35 comments · Fixed by #4040 or #4076
Closed

cli: play|pause|stop #3897

oliver-sanders opened this issue Oct 27, 2020 · 35 comments · Fixed by #4040 or #4076
Assignees
Milestone

Comments

@oliver-sanders
Copy link
Member

oliver-sanders commented Oct 27, 2020

Rename the existing workflow lifecycle commands to match the rose suite-run migration proposal and implement the other CLI changes.

I.E. do what it says in the proposal under the following sections:

@oliver-sanders oliver-sanders added this to the cylc-8.0b0 milestone Oct 27, 2020
This was referenced Oct 27, 2020
@hjoliver
Copy link
Member

hjoliver commented Oct 31, 2020

Not sure why this didn't occur to us before, but cylc resume makes more sense than release, as a counterpart to cylc pause.

  • cylc hold and cylc release for tasks
  • cylc pause and cylc resume for the flow

@oliver-sanders
Copy link
Member Author

oliver-sanders commented Jan 6, 2021

Update: the proposal is now a little out of date here is the current proposal for cylc play:

  • Rename cylc run => cylc play
  • Remove the START_POINT optional positional argument.
  • Start the flow in "restart" mode if the database is present & delete the cylc restart command.
  • Unpause the flow if already running and exit(0).

There are two loose ends to tie up:

  1. The proposal suggests erroring if a flow is started that has already run to completion.

    It is tricky to determine whether a flow has run to completion or not. The simplest solution is just to restart the flow anyway, if it has already run to completion it will just shut down.

  2. --ignore-*-cycle-point

    cylc run has these four options:

    --initial-cycle-point
    --final-cycle-point
    --start-cycle-point
    --stop-cycle-point
    

    And cylc restart has these corresponding options:

    --ignore-initial-cycle-point
    --ignore-final-cycle-point
    --ignore-start-cycle-point
    --ignore-stop-cycle-point
    

    Suggest removing the --ignore* opts and instead using --*-cycle-point=ignore (this is safe as it will fail for a cold-start as it is an invalid cycle point). Setting the --final-cycle-point on a restart should implicitly ignore the old one.

    Would also suggest rationalising the many aliases for the initial and final cycle point:

    --initial-cycle-point, --icp
    --final-cycle-point, --fcp
    --start-cycle-point, --stopcp
    --stop-cycle-point, --startcp
    

@MetRonnie
Copy link
Member

Do we want to keep an alias run -> play or restart -> play?

@MetRonnie MetRonnie self-assigned this Jan 14, 2021
@oliver-sanders
Copy link
Member Author

Nope, but we should add a "dead end" (see cylc.flow.scripts.cylc)

@MetRonnie
Copy link
Member

MetRonnie commented Jan 14, 2021

What happens to warm/cold starts? Am I right in thinking the concept of warm start disappears?

Also, should the alias start -> run be updated to start -> play or removed?

@oliver-sanders
Copy link
Member Author

The desire is to get rid of warm starts, however, I'm not sure that we can do that right now.

I think we should get rid of the aliases.

@MetRonnie
Copy link
Member

Just to note down some further info Oliver sent me:

The hold/release commands do two things, they can hold/release a flow OR tasks within a flow. These seem similar but are actually very different concepts. hold/release a flow is setting the state of the task pool. hold/release a task is adding/removing xtriggers to tasks (currently implemented a little differently).

The idea is to pull the flow and task features apart.

  • play/pause (flow state)
  • hold/release (task state)

@MetRonnie
Copy link
Member

MetRonnie commented Jan 15, 2021

What's the difference between

  • cylc run --warm SUITE [START_CYCLE_POINT] and
  • cylc run SUITE --startcp=START_CYCLE_POINT?

From looking at the code & docs, they seem to be identical, in which case the --warm option can be dropped?

@hjoliver
Copy link
Member

Yep. In fact the command help says --startcp "implies --warm". The term "warm start" comes from the modeling terminology for warm starting a cycling model, of course. But in terms of Cylc and the graph, "start cycle point" probably is a better term - more generic, and descriptive of what Cylc is actually doing. (Note also spawn-on-demand will make warm start obsolete, except perhaps as a convenience where multiple tasks have to be triggered at once to start the flow).

@oliver-sanders
Copy link
Member Author

It's a much nicer interface, should we drop the --warm option and move the warm documentation onto the --startcp argument?

@MetRonnie
Copy link
Member

Ah ok, so the concept of warm start stays but the cli option goes

@MetRonnie MetRonnie mentioned this issue Jan 20, 2021
7 tasks
@MetRonnie
Copy link
Member

Would also suggest rationalising the many aliases for the initial and final cycle point

Does that mean I should get rid of:

  • --ict (icp)
  • --initial-point (icp)
  • --until (fcp)
  • --final-point (fcp)
  • --start-point (startcp)
  • --stop-point (stopcp)

@MetRonnie
Copy link
Member

(Although from the changelog, --until was only just added in 8.0a1!)

@oliver-sanders
Copy link
Member Author

--until goes back a while, the changelog says:

--until=POINT option is now an alias for --final-cycle-point=POINT option.

@MetRonnie
Copy link
Member

MetRonnie commented Jan 25, 2021

Should it be possible to hold tasks after a certain point, like it is possible to hold/pause a workflow after a certain point?

Also, what are some good concrete example(s) of cylc hold to put in the docstring, rather than cylc hold REG TASK_GLOBS ... which is identical to the ARGS?

@oliver-sanders
Copy link
Member Author

oliver-sanders commented Jan 25, 2021

Should it be possible to hold tasks after a certain point

This is the --after option?

Yep we want to keep that although it should be an argument of cylc pause rather than cylc hold because it is fiddling with task_pool state rather than that of individual tasks.

I've just noticed that the documentation for cylc hold and the hold mutation do not match. It is definitely a cycle point not a wallclock time (implemented in cylc.flow.scheduler.Scheduler.hold_suite).

Also, what are some good concrete example(s) of cylc hold to put in the docstring

How about this:

# hold mytask at cycle 1234 in myflow
$ cylc hold myflow mytask.1234

# hold all tasks in cycle 1234 in myflow
$ cylc hold myflow '*.1234'

# hold all active instances of mytask in myflow
$ cylc hold myflow 'mytask.*'

@MetRonnie
Copy link
Member

MetRonnie commented Jan 25, 2021

If a workflow has some held tasks (but not all, i.e. it is not paused), should running cylc play release the held tasks or do nothing?

@oliver-sanders
Copy link
Member Author

Do nothing, cylc play shouldn't go messing with task states.

@MetRonnie
Copy link
Member

MetRonnie commented Jan 26, 2021

For cylc play, what should happen to --hold and --hold-point/--hold-after? Should they be replaced by --pause and --pause-point/--pause-after?

Also, in line with the other aliases, should --hold-point/--pause-point change to --hold-cycle-point/--pause-cycle-point and --holdcp/--pausecp?

@MetRonnie
Copy link
Member

If a workflow has some held tasks (but not all, i.e. it is not paused), should running cylc play release the held tasks or do nothing?

Also, if a workflow is paused, what should cylc release task.n do? Should it go ahead and release the task(s)? If so, wouldn't that imply the task pool is not held anymore, therefore the workflow is no longer paused?

@oliver-sanders
Copy link
Member Author

if a workflow is paused, what should cylc release task.n do? Should it go ahead and release the task(s)?

It should release the task which shouldn't run because the scheduler is "held".

So the behaviour for the logical combinations of play/pause and hold/release would be:

$ cylc install x

# start a Scheduler pre-paused
# (a.1 won't run because Scheduler paused)
$ cylc play --pause x

# hold task a.1
# (a.1 won't run because Scheduler paused and task held)
$ cylc hold x a.1

# un-pause the Scheduler
# (a.1 won't run because task held)
$ cylc play x

# release task a.1
# (a.1 will run)
$ cylc release x a.1

However, as you've noticed, that's not how things work at the moment. For now it's a good start just to separate the interfaces, we can fix the internals later, however, if you fancy taking a deeper dive it should be fairly straight forward to change the behaviour of the task_pool hold mechanism.

The tasks are actually submitted from somewhere in the Scheduler, that's the bit you will need to skip if Scheduler.pool.held is True. You may need to do something to get the workflow state set correctly, not sure how that gets done nowadays.

@MetRonnie
Copy link
Member

MetRonnie commented Jan 26, 2021

Another thing: I suppose flow.cylc[scheduling]hold after cycle point will have to be deprecated in favour of pause after cycle point

@hjoliver
Copy link
Member

However, as you've noticed, that's not how things work at the moment.

Yeah, it's all done by holding tasks at the moment, needs to be fixed.

@MetRonnie
Copy link
Member

MetRonnie commented Feb 2, 2021

Question about final cycle point: say you have the workflow

[scheduling]
    cycling mode = integer
    final cycle point = 5
    [[graph]]
        P1 = foo

Then you run:

  1.  $ cylc run temp3 -n --fcp=2
     INFO - Cold Start 1
     ...
     INFO - Initial point: 1
     INFO - Final point: 2
     ...
     INFO - [foo.1] -triggered off []
     INFO - [foo.2] -triggered off []
     ...
     INFO - Suite shutting down - AUTOMATIC
  2.  $ cylc restart temp3 -n --ignore-final-cycle-point
     INFO - LOADING suite parameters
     ...
     INFO - Initial point: 1
     INFO - Final point: 5
     INFO - Suite shutting down - AUTOMATIC
     INFO - DONE

In (2), nothing runs. Is this meant to happen?

Tested on master, d6a09ba


P.S., in (2), in the snipped bit (...), I get this:

INFO - LOADING task action timers
INFO - + foo.1 poll_timer
WARNING - foo.1: task not found, skip
INFO - + foo.1 ['try_timers', 'submission-retry']
WARNING - foo.1: task not found, skip
INFO - + foo.1 ['try_timers', 'execution-retry']
WARNING - foo.1: task not found, skip
INFO - + foo.2 poll_timer
WARNING - foo.2: task not found, skip
INFO - + foo.2 ['try_timers', 'submission-retry']
WARNING - foo.2: task not found, skip
INFO - + foo.2 ['try_timers', 'execution-retry']
WARNING - foo.2: task not found, skip

Why so many warnings?

@oliver-sanders
Copy link
Member Author

In Cylc7 the workflow would pick up where it left off and continue to the configured FCP (i.e. 5). I think this is an SoD quirk perhaps caused by restarting with an empty task pool?

Think that should go up as a bug.

@MetRonnie
Copy link
Member

MetRonnie commented Feb 2, 2021

Ok, as discussed in our meeting, if that's a bug it can be fixed independently of #4040

@hjoliver
Copy link
Member

hjoliver commented Feb 2, 2021

In Cylc7 the workflow would pick up where it left off and continue to the configured FCP (i.e. 5). I think this is an SoD quirk perhaps caused by restarting with an empty task pool?

Think that should go up as a bug.

I'm not so sure that's a bug! The workflow ran to completion at FCP=2. A restart starts from the previous recorded state, which was already "completed", so nothing will happen.

It makes some sense to restart with an extended FCP if you haven't already run to completion, but not if you have.

Also, what if you'd already run special tasks in the last cycle(s)? The only sensible way to do this kind of a "restart" would be to do warm start part-way through the new (extended) graph.

@oliver-sanders
Copy link
Member Author

Ah sorry, miss read, I thought the flow had been stopped before the fcp. After the cylc play work is finished we would expect the "restart" to be rejected since the flow has already run to completion so I don't think this circumstance should be possible going forward?

@MetRonnie
Copy link
Member

MetRonnie commented Feb 3, 2021

I see, the real test of whether --fcp=ignore works is:

[scheduling]
    runahead limit = P1
    cycling mode = integer
    final cycle point = 5
    stop after cycle point = 2
    [[graph]]
        P1 = foo

Then

  1. $ cylc play temp3 -n --fcp=3 runs to cycle point 2 and shuts down
  2. $ cylc play temp3 -n --fcp=ignore restarts at cycle point 3 and runs to 5

@hjoliver
Copy link
Member

hjoliver commented Feb 4, 2021

  1. $ cylc play temp3 -n --fcp=ignore restarts at cycle point 3 and runs to 5

It wouldn't stop at point 5, because you told it to ignore the fcp.

@MetRonnie
Copy link
Member

But the ignore means ignore the fcp stored in the DB only, reverting to whatever is in flow.cylc?

From cylc restart --help:

  --ignore-final-cycle-point
                        Ignore the final cycle point in the suite run
                        database. If one is specified in the suite definition
                        it will be used, however.

@MetRonnie
Copy link
Member

if a workflow is paused, what should cylc release task.n do? Should it go ahead and release the task(s)?

It should release the task which shouldn't run because the scheduler is "held".

(full comment here)

Just to seek some further clarity on that - the idea of "cylc pause pauses the workflow by holding all tasks" should become "it pauses the workflow"?

@oliver-sanders
Copy link
Member Author

Not sure where the "pauses the workflow by holding all tasks" quote comes from?

We would like cylc pause to pause the workflow but not to pause any of the tasks within it.

@MetRonnie
Copy link
Member

Not sure where the "pauses the workflow by holding all tasks" quote comes from?

That was my initial understanding of "pause" based on the pre-existing behaviour

@MetRonnie
Copy link
Member

MetRonnie commented Feb 8, 2021

Actually, if pause does not hold tasks, how can we pause after a particular cycle point? (E.g. cylc play --pause-after=POINT, cylc pause --after=POINT, flow.cylc[scheduling]pause after cycle point) Depending on the runahead limit, there would be nothing to stop tasks running beyond the pause point if any one task still hasn't reached it

@MetRonnie MetRonnie mentioned this issue Feb 15, 2021
7 tasks
@MetRonnie MetRonnie linked a pull request Feb 17, 2021 that will close this issue
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants