Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve stall logic in presence of clock trigger #2160

Merged
merged 3 commits into from
Feb 24, 2017

Conversation

matthewrmshin
Copy link
Contributor

Before this change, if the suite has any clock triggered tasks, the
suite will wait for all clock triggered times to reach before considering
the suite as stalled, even if the clock triggered tasks can never be
triggered due to dependencies issues.

With this change, the suite will treat waiting clock triggered tasks
with unsatisfied dependencies the same as other waiting tasks with
unsatisfied dependencies. However, it will consider a clock triggered
task as an active task if it has its dependencies satisfied but is
only waiting for the clock triggered time.

The change allows the stalled event to be raised as soon as the
following conditions are satisfied:

  • There is no activity after one main loop.
  • The task pool is left with only inactive tasks.
    • Active tasks are those that are not pending outputs of other tasks.
  • The task pool has inactive tasks with unsatisfied dependencies.

The problem is reported by @dpmatthews and should be well illustrated by the suite of the new test.

@matthewrmshin matthewrmshin added this to the next release milestone Feb 15, 2017
Copy link
Member

@hjoliver hjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One error in the new test - fails when local day differs from UTC day.

run_fail "${TEST_NAME_BASE}-run" timeout 60 cylc run --debug "${SUITE_NAME}"
sed -n 's/^.* WARNING - //p' "${SUITE_RUN_DIR}/log/suite/log" \
>"${SUITE_RUN_DIR}/log/suite/log.edited"
TODAY="$(date '+%Y%m%d')"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be date -u.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. Now fixed. (It is good to collaborate with colleagues living in the opposite side of the world! This sort of textbook mistakes will be a lot harder to spot otherwise.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timezones are trouble - I'd be happy to do without them and just have lunch at midnight every day 😬

@matthewrmshin
Copy link
Contributor Author

(Branch re-based and squashed.)

@matthewrmshin
Copy link
Contributor Author

(Branch re-based and squashed.)

@arjclark
Copy link
Contributor

@matthewrmshin - I haven't forgotten about this. There's something in the back of my head I'm trying to work out how to test the new logic with and reassure myself about.

Copy link
Contributor

@arjclark arjclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One code tweak requested for efficiency purposes. Otherwise good. Can't externalise the thing I'm concerned about into a test case so if I work it out later then I'll fix it myself.

return False
if ((itask.state.status == TASK_STATUS_WAITING or
itask.state.hold_swap == TASK_STATUS_WAITING) and
itask.state.prerequisites_are_all_satisfied()):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the below comments and the purpose of this check, can you tighten up the logic slightly so it'll only carry out the prereq_are_all_satisfied() part of the check for tasks which are clock triggered? We've seen in the cases where there are large numbers of family triggers going into a particular task that cpu can spike on prereq checking so we should only perform the check when needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the logic of the TaskState.prerequisites_are_all_satisfied method, it looks like it is supposed to only recalculate if necessary. The TaskPool.is_stalled method is only called if there has been no change in the main loop (i.e. no state change), so there should be no need for TaskState.prerequisites_are_all_satisfied to recalculate anything.

Unfortunately, on a second closer look at TaskState.prerequisites_are_all_satisfied, the logic appears to be flawed, as I cannot see how self._recalc_satisfied is ever set to False for any tasks with prerequisites. I'll see if I can improve the logic there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Before this change, if the suite has any clock triggered tasks, the
suite will wait for all clock triggered times to reach before considering
the suite as stalled, even if the clock triggered tasks can never be
triggered due to dependencies issues.

With this change, the suite will treat waiting clock triggered tasks
with unsatisfied dependencies the same as other waiting tasks with
unsatisfied dependencies. However, it will consider a clock triggered
task as an *active* task if it has its dependencies satisfied but is
only waiting for the clock triggered time.

The change allows the stalled event to be raised as soon as the
following conditions are satisfied:
* There is no activity after one main loop.
* The task pool is left with only inactive tasks.
  * Active tasks are those that are not pending outputs of other tasks.
* The task pool has inactive tasks with unsatisfied dependencies.

Fix test UTC
*Really* only recalculate if necessary.
@matthewrmshin
Copy link
Contributor Author

(Branch re-based. Tests all good with extra fixes.)

@@ -249,7 +249,6 @@ def __init__(self, status, hold_swap, point, identity, tdef,
self.db_update_status = db_update_status
self.log = log

self._recalc_satisfied = True
self._is_satisfied = False
self._suicide_is_satisfied = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these two now be being initialised to None instead of False?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@arjclark arjclark merged commit 287c9c1 into cylc:master Feb 24, 2017
@matthewrmshin matthewrmshin deleted the improve-stall-logic branch February 24, 2017 12:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants