-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve stall logic in presence of clock trigger #2160
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One error in the new test - fails when local day differs from UTC day.
run_fail "${TEST_NAME_BASE}-run" timeout 60 cylc run --debug "${SUITE_NAME}" | ||
sed -n 's/^.* WARNING - //p' "${SUITE_RUN_DIR}/log/suite/log" \ | ||
>"${SUITE_RUN_DIR}/log/suite/log.edited" | ||
TODAY="$(date '+%Y%m%d')" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be date -u
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. Now fixed. (It is good to collaborate with colleagues living in the opposite side of the world! This sort of textbook mistakes will be a lot harder to spot otherwise.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Timezones are trouble - I'd be happy to do without them and just have lunch at midnight every day 😬
8d09efc
to
a728932
Compare
(Branch re-based and squashed.) |
a728932
to
d5a512f
Compare
(Branch re-based and squashed.) |
d5a512f
to
832e361
Compare
@matthewrmshin - I haven't forgotten about this. There's something in the back of my head I'm trying to work out how to test the new logic with and reassure myself about. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One code tweak requested for efficiency purposes. Otherwise good. Can't externalise the thing I'm concerned about into a test case so if I work it out later then I'll fix it myself.
return False | ||
if ((itask.state.status == TASK_STATUS_WAITING or | ||
itask.state.hold_swap == TASK_STATUS_WAITING) and | ||
itask.state.prerequisites_are_all_satisfied()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the below comments and the purpose of this check, can you tighten up the logic slightly so it'll only carry out the prereq_are_all_satisfied() part of the check for tasks which are clock triggered? We've seen in the cases where there are large numbers of family triggers going into a particular task that cpu can spike on prereq checking so we should only perform the check when needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the logic of the TaskState.prerequisites_are_all_satisfied
method, it looks like it is supposed to only recalculate if necessary. The TaskPool.is_stalled
method is only called if there has been no change in the main loop (i.e. no state change), so there should be no need for TaskState.prerequisites_are_all_satisfied
to recalculate anything.
Unfortunately, on a second closer look at TaskState.prerequisites_are_all_satisfied
, the logic appears to be flawed, as I cannot see how self._recalc_satisfied
is ever set to False
for any tasks with prerequisites. I'll see if I can improve the logic there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Before this change, if the suite has any clock triggered tasks, the suite will wait for all clock triggered times to reach before considering the suite as stalled, even if the clock triggered tasks can never be triggered due to dependencies issues. With this change, the suite will treat waiting clock triggered tasks with unsatisfied dependencies the same as other waiting tasks with unsatisfied dependencies. However, it will consider a clock triggered task as an *active* task if it has its dependencies satisfied but is only waiting for the clock triggered time. The change allows the stalled event to be raised as soon as the following conditions are satisfied: * There is no activity after one main loop. * The task pool is left with only inactive tasks. * Active tasks are those that are not pending outputs of other tasks. * The task pool has inactive tasks with unsatisfied dependencies. Fix test UTC
*Really* only recalculate if necessary.
832e361
to
3be6aea
Compare
(Branch re-based. Tests all good with extra fixes.) |
lib/cylc/task_state.py
Outdated
@@ -249,7 +249,6 @@ def __init__(self, status, hold_swap, point, identity, tdef, | |||
self.db_update_status = db_update_status | |||
self.log = log | |||
|
|||
self._recalc_satisfied = True | |||
self._is_satisfied = False | |||
self._suicide_is_satisfied = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these two now be being initialised to None instead of False?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Before this change, if the suite has any clock triggered tasks, the
suite will wait for all clock triggered times to reach before considering
the suite as stalled, even if the clock triggered tasks can never be
triggered due to dependencies issues.
With this change, the suite will treat waiting clock triggered tasks
with unsatisfied dependencies the same as other waiting tasks with
unsatisfied dependencies. However, it will consider a clock triggered
task as an active task if it has its dependencies satisfied but is
only waiting for the clock triggered time.
The change allows the stalled event to be raised as soon as the
following conditions are satisfied:
The problem is reported by @dpmatthews and should be well illustrated by the suite of the new test.