Making reload safer #5107
We would need to flush through all tasks in the preparing state (otherwise we can't say for sure whether the job file was written with the old or new settings) whilst the workflow is paused (otherwise tasks will continue to enter the preparing state). One downside of this is that a remote operation (e.g. remote-init) running during the reload process would cause the workflow to become unresponsive for that period. Without major changes there would be no way to "cancel" a reload once actioned.
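A minimal sketch of the "flush the preparing state" step being described here, using placeholder names (`pool.get_tasks()`, `itask.state(...)`) rather than the real scheduler internals:

```python
import asyncio

TASK_STATUS_PREPARING = 'preparing'


async def drain_preparing_tasks(pool, interval=1.0):
    """Block until no task proxy remains in the preparing state.

    Sketch only: `pool.get_tasks()` and `itask.state(...)` are
    stand-ins for the real task pool interface. A remote operation
    (e.g. remote-init) running at this point is exactly what would
    leave the workflow unresponsive for its duration.
    """
    while any(
        itask.state(TASK_STATUS_PREPARING)
        for itask in pool.get_tasks()
    ):
        await asyncio.sleep(interval)
```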
It should also not repeat any remote file installation.
With the current approach to reload it's not especially clear what is and isn't supposed to change as a result of a reload. I went through the code and pulled out the attributes which are and aren't reloaded:

Task Proxy Attributes
Reloaded (not explicitly preserved):
Preserved:

Task State Attributes
Reloaded (not explicitly preserved):
Preserved:
The main takeaway is that prerequisites and outputs can't be changed by a reload. Since SoD (spawn on demand) this should probably be possible now. It would make sense to be able to add or remove a prerequisite of a waiting task, or add an output to a failed one.
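As a rough illustration of what that could mean for a single waiting task proxy, something along these lines (every helper named here is hypothetical, not the actual TaskProxy/TaskState API):

```python
def reload_prereqs_and_outputs(itask, new_tdef):
    """Swap in the prerequisites/outputs defined by the new task
    definition, preserving anything already satisfied or completed.

    Purely illustrative: get_prerequisites, set_prerequisites,
    get_completed_outputs, etc. are hypothetical helpers, not the real
    Cylc interfaces.
    """
    # Remember what was already satisfied under the old definition.
    satisfied = {
        trigger
        for trigger, prereq in itask.get_prerequisites().items()
        if prereq.is_satisfied()
    }
    completed = set(itask.get_completed_outputs())

    # Rebuild prerequisites and outputs from the new definition.
    itask.set_prerequisites(new_tdef.get_prerequisites(itask.point))
    itask.set_outputs(new_tdef.get_outputs(itask.point))

    # Re-apply the prior state so the reload never un-satisfies anything.
    for trigger in satisfied:
        itask.satisfy_prerequisite(trigger)
    for output in completed:
        itask.mark_output_completed(output)
```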
Well done, good to have that info laid out in full 👍
Worth pointing out, for anyone reading this, we're only talking about reloading existing task proxies here - everything can be changed by reload if the task hasn't been spawned yet.
Yeah, we should be able to do that.
What should be preserved:
What should be reloaded:
Any exceptions to that?
Just realized I didn't comment on @dpmatthews' suggestions at the top. I like the reload suggestions 👍 And I particularly like the restart idea: always restart with the pre-restart config, so the user always has to reload an updated flow.cylc. (However, this will make it even more important to get reload right, as restart won't be a fallback any more.)
For the record, here's a portion of the diff I was working on:

for itask in tasks:
if itask.tdef.name in self.orphans:
if (
- itask.state(TASK_STATUS_WAITING)
- or itask.state.is_held
- or itask.state.is_queued
+ itask.state(TASK_STATUS_WAITING)
+ or itask.state.is_held
+ or itask.state.is_queued
):
# Remove orphaned task if it hasn't started running yet.
self.remove(itask, 'task definition removed')
@@ -881,21 +881,31 @@ class TaskPool:
"- task definition removed"
)
else:
- new_task = TaskProxy(
- self.config.get_taskdef(itask.tdef.name),
- itask.point, itask.flow_nums, itask.state.status)
- itask.copy_to_reload_successor(new_task)
- self._swap_out(new_task)
- LOG.info(f"[{itask}] reloaded task definition")
- if itask.state(*TASK_STATUSES_ACTIVE):
+ if (
+ itask.state(TASK_STATUS_PREPARING, *TASK_STATUSES_ACTIVE)
+ or itask.waiting_on_job_prep # pre preparation
+ ):
LOG.warning(
f"[{itask}] active with pre-reload settings"
)
- elif itask.state(TASK_STATUS_PREPARING):
- # Job file might have been written at this point?
- LOG.warning(
- f"[{itask}] may be active with pre-reload settings"
- )
+ else:
+ itask.reload(self.config.get_taskdef(itask.tdef.name))
+ LOG.info(f"[{itask}] reloaded task definition")
+ # new_task = TaskProxy(
+ # self.config.get_taskdef(itask.tdef.name),
+ # itask.point, itask.flow_nums, itask.state.status)
+ # itask.copy_to_reload_successor(new_task)
+ # self._swap_out(new_task)
+ # LOG.info(f"[{itask}] reloaded task definition")
+ # if itask.state(*TASK_STATUSES_ACTIVE):
+ # LOG.warning(
+ # f"[{itask}] active with pre-reload settings"
+ # )
+ # elif itask.state(TASK_STATUS_PREPARING):
+ # # Job file might have been written at this point?
+ # LOG.warning(
+ # f"[{itask}] may be active with pre-reload settings"
+ # )
diff --git a/cylc/flow/task_proxy.py b/cylc/flow/task_proxy.py
index b5511ffb1..7216993a4 100644
--- a/cylc/flow/task_proxy.py
+++ b/cylc/flow/task_proxy.py
@@ -266,6 +266,20 @@ class TaskProxy:
f" flows:{','.join(str(i) for i in self.flow_nums) or 'none'}"
)
+ def reload(self, tdef):
+ # swap out the task definition
+ self.tdef = tdef
+
+ # reload the task state (incl. triggers & outputs)
+ # self.state.reload(tdef, self.point)
+
+ # recompute derived values
+ self.get_late_time()
+ self.graph_children = generate_graph_children(tdef, self.point)
+
+ self.state.is_queued = False
+ self.state.is_updated = True
for itask in tasks:
if itask.tdef.name in self.orphans:
if (
- itask.state(TASK_STATUS_WAITING)
- or itask.state.is_held
- or itask.state.is_queued
+ itask.state(TASK_STATUS_WAITING)
+ or itask.state.is_held
+ or itask.state.is_queued
):
# Remove orphaned task if it hasn't started running yet.
self.remove(itask, 'task definition removed')
@@ -881,21 +881,31 @@ class TaskPool:
"- task definition removed"
)
else:
- new_task = TaskProxy(
- self.config.get_taskdef(itask.tdef.name),
- itask.point, itask.flow_nums, itask.state.status)
- itask.copy_to_reload_successor(new_task)
- self._swap_out(new_task)
- LOG.info(f"[{itask}] reloaded task definition")
- if itask.state(*TASK_STATUSES_ACTIVE):
+ if (
+ itask.state(TASK_STATUS_PREPARING, *TASK_STATUSES_ACTIVE)
+ or itask.waiting_on_job_prep # pre preparation
+ ):
LOG.warning(
f"[{itask}] active with pre-reload settings"
)
- elif itask.state(TASK_STATUS_PREPARING):
- # Job file might have been written at this point?
- LOG.warning(
- f"[{itask}] may be active with pre-reload settings"
- )
+ else:
+ itask.reload(self.config.get_taskdef(itask.tdef.name))
+ LOG.info(f"[{itask}] reloaded task definition")
+ # new_task = TaskProxy(
+ # self.config.get_taskdef(itask.tdef.name),
+ # itask.point, itask.flow_nums, itask.state.status)
+ # itask.copy_to_reload_successor(new_task)
+ # self._swap_out(new_task)
+ # LOG.info(f"[{itask}] reloaded task definition")
+ # if itask.state(*TASK_STATUSES_ACTIVE):
+ # LOG.warning(
+ # f"[{itask}] active with pre-reload settings"
+ # )
+ # elif itask.state(TASK_STATUS_PREPARING):
+ # # Job file might have been written at this point?
+ # LOG.warning(
+ # f"[{itask}] may be active with pre-reload settings"
+ # )
diff --git a/cylc/flow/task_proxy.py b/cylc/flow/task_proxy.py
index b5511ffb1..7216993a4 100644
--- a/cylc/flow/task_proxy.py
+++ b/cylc/flow/task_proxy.py
@@ -266,6 +266,20 @@ class TaskProxy:
f" flows:{','.join(str(i) for i in self.flow_nums) or 'none'}"
)
+ def reload(self, tdef):
+ # swap out the task definition
+ self.tdef = tdef
+
+ # recompute derived values
+ self.get_late_time()
+ self.graph_children = generate_graph_children(tdef, self.point)
+
+ self.state.is_queued = False
+ self.state.is_updated = True
Fairly straightforward; this change should simplify things a fair bit. The remaining difficulty is the reloading of outputs / prerequisites, which turned out to be too complicated (and too risky) to hash out quickly for 8.0.x.
See also #5579: we should probably wait for preparing tasks to clear before pausing the workflow.
I'm taking a look into flushing out preparing tasks and pausing the workflow pre-reload as a means of fixing #5579. If successful, I will punt the remainder of this issue (to do with the interaction of restart and reload) to another issue for later consideration.
* Closes cylc#5107
* Reload now waits for pending tasks to submit before attempting to reload the config itself.
* Reload now also puts the workflow into the paused state during the reload process. This doesn't actually achieve anything as the reload command is blocking in the main loop, but it does help to communicate that the workflow will not de-queue or submit any new tasks during this process.
* The workflow status message is now updated to reflect the reload progress.
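Schematically, the sequencing those points describe is something like the following (every attribute and method on `schd` is a placeholder, not the real command implementation):

```python
import asyncio


async def command_reload(schd):
    """Schematic of the reload sequencing described above (sketch only)."""
    # Pausing doesn't change what happens (the reload blocks the main
    # loop anyway) but it communicates that nothing will be de-queued
    # or submitted while the reload is in progress.
    schd.pause()

    # Wait for tasks already past the queue to finish submitting, so
    # their job files were unambiguously written with the old settings
    # (in the real scheduler the main loop keeps submitting meanwhile).
    schd.status_msg = 'reloading: waiting for pending tasks to submit'
    while schd.pending_tasks():
        await asyncio.sleep(1)

    schd.status_msg = 'reloading the workflow configuration'
    schd.load_flow_config()

    schd.status_msg = ''
    schd.resume()
```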
re https://cylc.discourse.group/t/task-wrangling-issues/517/8
We need to make it obvious when reloads fail.
I think reload should:
Also, what about restarts? If a server dies and you restart a workflow it's meant to continue unchanged. However, what happens if you've done a reinstall but not reloaded (and perhaps even have a workflow config which does not validate)?
I think restart should always use the previous config.
If this differs from the installed config this should be logged (if possible) but not loaded.
We could add a --reload option to cylc play to support the old behaviour.
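A sketch of that restart rule, with illustrative file locations (only the `--reload` option idea comes from the comment above; the paths are not Cylc's actual run directory layout):

```python
import logging
from pathlib import Path

LOG = logging.getLogger(__name__)


def choose_restart_config(run_dir: Path, reload_installed: bool = False) -> Path:
    """Pick the config to restart with: the pre-restart one by default.

    Sketch only: 'flow.cylc' for the installed config and
    'log/flow-config/previous.cylc' for the pre-restart copy are
    illustrative names.
    """
    installed = run_dir / 'flow.cylc'
    previous = run_dir / 'log' / 'flow-config' / 'previous.cylc'

    if reload_installed or not previous.exists():
        # e.g. a hypothetical `cylc play --reload`
        return installed

    if installed.read_text() != previous.read_text():
        # Log the difference, but do not load the installed config.
        LOG.warning(
            'installed config differs from the pre-restart config; '
            'restarting with the previous config'
        )
    return previous
```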