Support for on-sequence cold-starts #149
One scenario to handle the special case where the cold-start cycle graph is a subset of the other normal cycle graph would be to have a configuration option similar to the "exclude on start-up". That option selects tasks to be excluded from the suite execution tree until such time as they're added in along the run. An "exclude on cold-start" would have the same net effect in the initial cycle, but would not remove the tasks from the graph on future cycles. These tasks would not be removed from the graph on warm-starts either. Tasks run as part of the cold-start could have an environment variable passed to them as well so something like A suite running in this configuration, then, might look like
where the execution starts at the 0 hour cycle with a forecast only, then continues at the 6 hour cycle with data assimilation and a forecast. Currently, when warm-starting, the cold-start tasks are set to have a succeeded status. The opposite could be done here. When we're cold-starting, tasks listed in the The actual task script for
that would run only in cold-start mode. Still left unresolved:
|
From the mailing list discussion, Hilary pointed out that "dependence on the start-up task type only applies in a cold-start". Having start-up tasks only execute in a cold start would allow us to specify that a dependency between the
|
The multiple inheritance capability suggested in #134 would also help out here, since the individual tasks that would be skipped on a cold start could inherit from an additional namespace and that family name could be used in the |
Tim, I think your idea would work, and it may be the simplest way to cold-start a suite in which "the cold-start graph is a subset of the normal cycle graph". My only concerns at this stage are:
Another approach I'd like to think about is more like what cylc currently does: cold-start tasks have to be specified as such and they only run once (but if they happen to do exactly the same thing as a normal-cycle counterpart, just give them the exact same runtime config) ... then, when cylc starts up, instantiate cold-start task proxies at the initial cycle time (as now) but cycling tasks in the next cycle (instead of at the initial cycle as now). This is not quite as simple as your approach for the "cold-start graph is a subset" case, but (I think?) it handles cold-start only (non-subset) tasks better? We could support both approaches if they are each simpler and easier to understand for different types of suites. One other thing to think about - what happens if you have several different cycling intervals in the same suite (e.g. some tasks that run every 6 hours and some every 12 hours)? Currently cylc instantiates all tasks at the initial cycle time or at the next subsequent cycle time for the particular task. This would probably still work here, but we need to check. |
We have some other potential requirements that may relate to this.
|
In our case, the basic suite is small enough that either way would work (i.e. specifying tasks to exclude or include in a cold start) - my initial suggestion was based off of something that was already included (i.e. you specify the jobs to exclude on start-up) but I'm not attached to that. One thing that may affect this is the addition of multiple inheritance (#134) - you could just specify a few parent namespaces to exclude on cold-start (or include on cold-start) and then all the relevant tasks could just include that namespace in an For the second point (maintaining the current cold-start task list), I've sketched it out and I don't think it would be a problem to keep that - in my initial example, the We actually do have different cycling intervals in the same suite - data assimilation/short forecasts run every 6 hours, and every 12 hours we kick off a long forecast. However, since the normal course of our experiments is to cold-start the system, let the data assimilation spin up, and then start doing long forecasts after a week or two, we currently use "exclude on start-up" to handle those for our experimental suites. Dave's suggestion of spin-down and shut-down tasks would be useful for us as well - after an experiment runs, we would like to run a detailed comparison against a reference case, and it would be great to be able to include that as part of the official suite. One way to handle Dave's need for multiple spin-up cycles would be to have the suite specify the first full cycle relative to the start time, but this seems like it would get very complicated very quickly. |
Tim & Dave - shutdown tasks will be easy to add to cylc. Multiple-cycle spin-up and spin-down tasks may not be too hard either. Note you can already "cylc insert" a task that continues cycling until a given stop cycle - so the internals are really there already, we just need a way to express this cleanly in the suite definition, and to think about when these temporary-duration tasks should be created. Dave, do you want to put up some new Issues for these? Tim - before we decide on which approach to go ahead with I don't think you've commented on my paragraph above starting "Another approach I'd like to think about is more like what cylc currently does:". It seems to me this might be the easiest one to implement because it could involve nothing more than bumping the first non-coldstart tasks into the next cycle when the task pool is loaded at start-up. What do you think? |
Hilary - I hadn't, since I wanted to read it over and make sure I understand what you're suggesting before I commented :) You're saying that instead of inserting everything into the task pool at the initial time and setting success/failure flags immediately if we're cold-starting, we hold off on the tasks that are marked as being skipped in a cold start - in that example above, that would mean that the That actually sounds like a really nice way to do it, and conceptually it matches what our top-level run script does now (i.e. just skips straight to the later tasks). It would look something like 'a task designated as being excluded on the cold start won't be eligible for execution until the next cycle after the cold start', which would handle the case of multiple cold-start tasks with multiple offsets, i.e. the following case
cold-started at the 00Z cycle would see the first I like it, but I don't know what sort of changes would be required to the dependency graph (if any) if it was implemented that way. |
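As a rough illustration of the rule discussed above ("a task excluded on the cold start is not eligible until the next cycle after the cold start"), here is a minimal Python sketch. It is not cylc code: the function name `first_cycle`, the `valid_hours` list and the `skip_cold_start` flag are all hypothetical, and it assumes cycle times fall on whole hours. It also shows how tasks with different cycling intervals (6-hourly and 12-hourly) would each land on their own next valid cycle.

```python
from datetime import datetime, timedelta


def first_cycle(initial, valid_hours, skip_cold_start=False):
    """Return the first cycle time at which a task would be instantiated.

    initial         -- the suite's initial (cold-start) cycle, on a whole hour
    valid_hours     -- hours of the day (0-23) at which the task cycles
    skip_cold_start -- hypothetical flag: the task may not run at the
                       initial cycle itself, only at a later valid cycle
    """
    if not any(0 <= h <= 23 for h in valid_hours):
        raise ValueError("valid_hours must contain at least one hour in 0-23")
    # Tasks skipped on cold start begin searching strictly after the initial cycle.
    candidate = initial + timedelta(hours=1) if skip_cold_start else initial
    # Step forward hour by hour until we hit one of the task's valid hours.
    while candidate.hour not in valid_hours:
        candidate += timedelta(hours=1)
    return candidate
```

With a cold start at 00Z, a 6-hourly task flagged this way would first appear at 06Z and a 12-hourly one at 12Z, while unflagged tasks would still start at 00Z.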
Tim, I'm not sure this is as simple as you might hope. The current cold-start method fits nicely with the way cylc graphs are interpreted, namely a trigger "X => Y" defines what the task on the right triggers off at a given cycle time T. So, from "ColdFoo | Foo[T-6] => Foo" at the initial cycle T we end up with a trigger on a task ColdFoo with the same cycle T, not on a previous (non-existent) cycle. ... but "cold start as first cycle" methods do not seem to. E.g.

The trouble with pushing the first instances of non-cold-start tasks into the next cycle is how to express it in the graph:

    graph = "ColdFoo[T-6] | Foo[T-6] => Foo"

If this graph is still interpreted in the usual way (above) then we still have a bootstrapping problem, because now Foo[T] always depends on a task in the previous cycle (so what to do at a given start cycle T?). Presumably we'd have to interpret the graph in a special way at start-up, by offsetting T appropriately to get (in effect) this:

    graph = "ColdFoo[T] | Foo[T] => Foo[T+6]"  # and create the first Foo at T_initial + 6

or use a special cold-start graph-string:

    cold-graph = "ColdFoo => Foo[T+6]"  # currently we don't allow [T+/-n] on the right side of a trigger arrow

But might this sort of extra complexity be worse than getting users to understand the current cold-start method? (The fact that Foo does not start cycling until T+6 is not actually apparent from the graph either.)

Your "exclude on cold-start" graphs also seem to require special interpretation at start-up: "Model[T-6] => DA => Model" with "exclude DA at start-up". Here the exclusion of DA at start-up is reasonably clear because there's a special task category to state that, but I guess cylc would also have to ignore Model[T-6] at start-up because T-6 is prior to the initial cycle?
Finally this exclusion of tasks at start-up is not just a matter of omitting the excluded tasks from the task pool until the first full cycle - it also requires changing the prerequisites/triggers on the tasks that normally depend on the excluded ones (here: Model has to know not to wait on DA in the first, cold-start, cycle). (however, this is already done for start-up tasks). (I hope this makes some sense .... it's getting late here!) |
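To make the prerequisite point above concrete, here is a small Python sketch of the special interpretation that "Model[T-6] => DA => Model" with "exclude DA at start-up" would seem to need. This is only an assumption-laden illustration, not how cylc resolves graphs: `resolve_prerequisites` and its arguments are hypothetical, triggers are reduced to a flat list of (task, hour-offset) pairs, and AND/OR logic is ignored.

```python
from datetime import datetime, timedelta


def resolve_prerequisites(triggers, cycle, initial, excluded=frozenset()):
    """Return the (task, cycle_time) pairs a task at `cycle` must wait on.

    triggers -- upstream tasks as (name, offset_hours), e.g. ("Model", -6)
    cycle    -- cycle time T of the downstream task
    initial  -- the suite's initial (cold-start) cycle
    excluded -- hypothetical set of task names excluded at the initial cycle
    """
    resolved = []
    for name, offset in triggers:
        upstream_cycle = cycle + timedelta(hours=offset)
        if upstream_cycle < initial:
            # The upstream task would sit before the initial cycle: it never exists.
            continue
        if upstream_cycle == initial and name in excluded:
            # The upstream task is left out of the cold-start cycle, so don't wait on it.
            continue
        resolved.append((name, upstream_cycle))
    return resolved
```

Under these assumptions, Model at the initial cycle ends up with no prerequisites (DA is excluded there), while from the next cycle on it waits on DA as usual, and DA in turn waits on Model from six hours earlier.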
The new cycling syntax of #119 offers ways to solve this problem. |
cylc-dev discussion: https://groups.google.com/forum/?hl=en&fromgroups=#!topic/cylc-dev/g7dhs14sQUU
For suites with a cold-start forecast that happens to start exactly one normal cycle interval prior to the first full/normal/warm cycle, users are likely to think that the suite's "initial cycle time" should be that of the cold-start, not the first proper cycle. This does not work in general (cold-starts often are not on-sequence) but we could support it as an option for suites where it does make sense.
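The "on-sequence" condition described here can be stated as a one-line check. The sketch below is purely illustrative; `cold_start_is_on_sequence` and its parameters are hypothetical names, not part of cylc.

```python
from datetime import datetime, timedelta


def cold_start_is_on_sequence(cold_start, first_full_cycle, interval_hours):
    """True if the cold-start forecast begins exactly one normal cycle
    interval before the first full (warm) cycle."""
    return first_full_cycle - cold_start == timedelta(hours=interval_hours)
```

For a 6-hourly suite, a cold start at 00Z followed by a first full cycle at 06Z would qualify; a cold start at 03Z would not.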