
Post-SoD refactor: remove pflag, runahead pool #4103

Merged · 21 commits · Jun 1, 2021

Conversation

@hjoliver (Member) commented Mar 1, 2021:

This is a small SoD (spawn-on-demand) follow-up change (no specific issue, but it has been discussed in various places).

E.g., partially addresses #3874:

tasks are spawned "on demand" when ready to run according to the graph. This brings runahead limiting into line with xtriggers, task hold, and queues: it's just another way to temporarily hold back tasks that are otherwise ready to run.
... we should get rid of the hidden runahead pool and unify runahead limiting with the other limiting mechanisms:

  • Make runahead limiting consistent with other forms of limiting (see above)

    • Remove the "runahead task pool". Runahead tasks are now held in the main pool, with an is_runahead task indicator (cf. is_held, is_queued); a sketch of this idea follows the list.
    • Runahead-limited tasks can now be exposed in the UI, like queued and held tasks, to show the user "this task is ready to run but is being held back by the runahead limit" (done for cylc tui on this branch).
  • A new "hidden pool" is used to hide partially-satisfied tasks (in order to get spawn-on-demand from spawn-on-outputs)

  • Remove an annoying vestige of the old SOS dependency-matching process

    • We used to set task_events_mgr.pflag (the "processing required flag") whenever a task event occurred that required a new round of dependency matching; if the flag was set, we would then iterate through the task pool and do dependency matching and other things. (A second sketch, of this old pattern, also follows the list.)
    • Generally tidy up and clarify the aforementioned "other things", which are now done directly without relying on the old flag
  • [UPDATE] Restores correct force-triggering behaviour in the presence of queues (broken since the spawn-on-demand implementation): triggering an unqueued task queues it; triggering a queued task causes it to run.
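To make the runahead-indicator idea concrete, here is a minimal sketch; the class shapes and function below are illustrative assumptions, not the actual Cylc API:

```python
from dataclasses import dataclass, field

@dataclass
class TaskState:
    # Runahead limiting becomes just another indicator, alongside the
    # existing ones (cf. is_held, is_queued), not a separate hidden pool.
    is_held: bool = False
    is_queued: bool = False
    is_runahead: bool = True  # tasks start runahead-limited until released

@dataclass
class Task:
    name: str
    point: int  # cycle point, simplified to an int for this sketch
    state: TaskState = field(default_factory=TaskState)

def release_runahead_tasks(pool, limit_point):
    """Clear the is_runahead indicator on tasks at or below the limit."""
    for itask in pool:
        if itask.state.is_runahead and itask.point <= limit_point:
            itask.state.is_runahead = False

pool = [Task("foo", p) for p in range(1, 6)]
release_runahead_tasks(pool, limit_point=3)
print([(t.point, t.state.is_runahead) for t in pool])
# [(1, False), (2, False), (3, False), (4, True), (5, True)]
```

Because such tasks stay in the main pool, a UI can filter on the indicator to show "held back by the runahead limit", exactly as it does for queued and held tasks.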

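And a second sketch, of the old SOS-era pflag pattern being removed; again, the names here are hypothetical stand-ins for the real internals:

```python
class Task:
    def __init__(self, name):
        self.name = name

    def match_dependencies(self):
        print(f"re-matching dependencies for {self.name}")

class OldStyleScheduler:
    """Event handlers set a shared 'processing required' flag; the main
    loop polls it and, if set, runs a whole-pool dependency-matching pass
    (plus assorted 'other things' hidden behind the same flag)."""

    def __init__(self, pool):
        self.pool = pool
        self.pflag = False

    def on_task_event(self, task):
        self.pflag = True  # defer: "something changed, re-match everything"

    def main_loop_iteration(self):
        if self.pflag:
            self.pflag = False
            for task in self.pool:
                task.match_dependencies()

sched = OldStyleScheduler([Task("a"), Task("b")])
sched.on_task_event(sched.pool[0])
sched.main_loop_iteration()  # one event triggers a whole-pool pass
```

Post-refactor, each event does its own follow-up work directly where it occurs (e.g. spawning the children of a completed output), so neither the flag nor the pool-wide pass is needed.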
cylc tui showing queue- and runahead-limited tasks: [screenshot]

Requirements check-list

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Already covered by existing tests.
  • Appropriate change log entry included.
    - (mention visible runahead tasks)
  • (master branch) I have opened a documentation PR at cylc/cylc-doc/pull/XXXX.
    - (mention visible runahead tasks, somewhere...)
  • No dependency changes.

@hjoliver added this to the cylc-8.0b0 milestone Mar 1, 2021
@hjoliver self-assigned this Mar 1, 2021
@MetRonnie mentioned this pull request Mar 1, 2021
@hjoliver force-pushed the pflag-refactor branch 2 times, most recently from d0f978e to e45eacc on March 23, 2021 02:12
@hjoliver changed the title from "Post-SoD refactor of task queuing." to "Post-SoD refactor: remove pflag, runahead pool" Mar 23, 2021
@hjoliver (Member Author):

(Tests passing, after some rebase nightmares; I'll update the description and un-Draft this tomorrow/Wednesday.)

@hjoliver modified the milestones: cylc-8.0b0 → cylc-8.0b1 Mar 23, 2021
@hjoliver marked this pull request as ready for review March 23, 2021 22:00
@hjoliver modified the milestones: cylc-8.0b1 → cylc-8.0b2 Apr 15, 2021
@hjoliver (Member Author):

(rebased)

@hjoliver marked this pull request as draft April 27, 2021 05:33
@hjoliver (Member Author):

Temporarily converted to draft, pending fix of post-rebase test failures.

@hjoliver marked this pull request as ready for review April 28, 2021 04:44
@dwsutherland (Member) left a review:

Looks good 👍

Will test after the rebase, so will review my approval after a play.

@hjoliver (Member Author):

(Feedback addressed, except for one thing to follow up on, and conflicts.)

@hjoliver (Member Author):

(There are no conflicts on this branch right now so long as you don't select "Rebase and merge".)

@MetRonnie (Member) left a review:

9/29 files viewed. There seem to be a couple of unresolved points from earlier.

@oliver-sanders (Member) left a review:

There are some quick changes that will reduce the impact of task iteration; I've dumped them in a PR:

hjoliver#16 (review)

Comment on lines -715 to -719:

```python
if released:
    LOG.debug(
        "Queue released:\n"
        + '\n'.join(f"* {r.identity}" for r in released)
    )
```
Reviewer (Member):

Was logging to "debug" as a bullet-point list, now logging each task individually to "info"?

@hjoliver (Member Author):

This involves less iteration, since we're already iterating over the list in the preceding lines. Do you have a strong preference for either debug or info here? (It's debatable IMO.)
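For context, a minimal sketch of the pattern under discussion (the task identities are invented; the real method does more per task):

```python
import logging

logging.basicConfig(level=logging.INFO)
LOG = logging.getLogger("demo")

def release_and_log(released):
    # One pass: the per-task release work and the log message share the
    # same loop, instead of a second pass building a bulk DEBUG message.
    for identity in released:
        # ... per-task release work would happen here ...
        LOG.info("Queue released: %s", identity)

release_and_log(["1/foo", "1/bar"])
```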

"""
self.release_runahead_tasks()
Reviewer (Member):

Unexpected side-effect?

@hjoliver (Member Author):

Sorry, I don't follow?

This is here in case there are runahead-limited tasks that could be released (because they are now below the runahead limit) at the time of the stall check, in which case we're not stalled. Unfortunately I don't recall why I thought that was necessary at this point ... I'll go back and see if I can work out why.
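A hypothetical sketch of the reasoning in that reply: run the runahead-release step before judging a stall, because releasing even one task means the workflow can still progress (all names below are made up for illustration):

```python
class Pool:
    def __init__(self, runahead_releasable, active):
        self.runahead_releasable = runahead_releasable  # below-limit tasks
        self.active = active

    def release_runahead_tasks(self):
        released, self.runahead_releasable = self.runahead_releasable, 0
        self.active += released
        return released  # number of tasks released

def is_stalled(pool):
    pool.release_runahead_tasks()  # may free tasks now below the limit
    return pool.active == 0

print(is_stalled(Pool(runahead_releasable=1, active=0)))  # False
```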

Comment on lines 1427 to 1428:

```python
if x.state(TASK_STATUS_WAITING)
and not x.state.is_queued
```

@oliver-sanders (Member) May 27, 2021:

These two filters are applied to these three iterations of the task pool, so there is scope to reduce the amount of iteration here without making structural changes to the pool.

Haven't put this in my PR in case there's some reason not to: tasks changing state between each iteration? High memory watermark?

@hjoliver (Member Author):

No reason that I can think of. There shouldn't be state changes between each iteration. I've extensively refactored this bit along the lines you've suggested. See what you think.
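A rough sketch of the kind of refactor meant here (the task objects and downstream checks are invented for illustration): compute the filtered list once and reuse it, instead of re-applying the same two predicates in three separate passes over the pool:

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    status: str = "waiting"
    is_queued: bool = False

def check_a(tasks): ...
def check_b(tasks): ...
def check_c(tasks): ...

def process(pool):
    # One pass over the pool instead of three:
    candidates = [
        t for t in pool
        if t.status == "waiting" and not t.is_queued
    ]
    check_a(candidates)
    check_b(candidates)
    check_c(candidates)

process([Task("foo"), Task("bar", is_queued=True)])
```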

Commit: task iteration improvements
```python
@@ -672,8 +666,7 @@ def _load_pool_from_point(self):
parent_points = tdef.get_parent_points(point)
if not parent_points or all(
        x < self.config.start_point for x in parent_points):
    initial_tasks.append(TaskID.get(tdef.name, point))
```
@hjoliver (Member Author):

(Not used anywhere.)

@hjoliver force-pushed the pflag-refactor branch 4 times, most recently from 0f42c2b to 8281420 on May 28, 2021 04:40
@hjoliver (Member Author) commented May 28, 2021:

Apologies, reviewers! In addition to my several embarrassing crimes against good iterative form (hopefully I would have found the worst of those myself if I had reviewed my own code properly!) ... there were some TODOs indicating unfinished work on this branch, which I had neglected to come back to. I have now done that, plus addressed all review comments, plus this:

  • Restored correct force-triggering behaviour in the presence of queues (broken since the spawn-on-demand implementation): triggering an unqueued task should queue it; triggering a queued task should cause it to run.

Hopefully good to go now 🤞

It seems we must not have tests for force-triggering with queues - punting that to #4234 as I've run out of time today.
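The restored force-trigger semantics, as a tiny hedged sketch (the queue representation and submit stub are stand-ins, not the real implementation):

```python
def submit(task):
    print(f"submitting {task}")

def force_trigger(task, queue):
    if task not in queue:
        queue.append(task)   # unqueued task: triggering queues it
    else:
        queue.remove(task)   # queued task: triggering releases and runs it
        submit(task)

queue = []
force_trigger("1/foo", queue)  # queues 1/foo
force_trigger("1/foo", queue)  # runs 1/foo
```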

Comment on lines 305 to 316:

```python
for itask in (
    itask
    for cycle in self.main_pool.values()
    for itask in cycle.values()
    if itask.state.is_runahead
    if itask.state(
        TASK_STATUS_FAILED,
        TASK_STATUS_SUCCEEDED,
        TASK_STATUS_EXPIRED
    )
    or itask.is_manual_submit
):
```
Reviewer (Member):

This is not very readable, in my opinion. I don't know if you want to address it now or in a later follow-up.

@hjoliver (Member Author) Jun 1, 2021:

I used to agree, although I've pretty much got my eye in for these constructs now. In any case, you can blame @oliver-sanders' side-PR for this one! Oliver, why exactly do you prefer this kind of thing over nested loops with conditional blocks in them? I doubt that comprehensions are any more efficient unless they only involve basic types.

@hjoliver (Member Author):

(However, the variable cycle in this one is badly named; I'll fix that.)

@oliver-sanders (Member) Jun 2, 2021:

> I used to agree

Ditto; however, once I got used to comprehensions I changed my mind (a short stint of Erlang is an effective conversion: there is no alternative!). I think they are more readable, and the filtering is much clearer, especially compared with the if <cond>: continue pattern (negative logic, etc.).
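For readers weighing the two styles, here is the same filter written both ways over a toy pool (the attributes are simplified stand-ins for the real state checks):

```python
from dataclasses import dataclass, field

@dataclass
class State:
    is_runahead: bool = True

@dataclass
class Task:
    name: str
    state: State = field(default_factory=State)
    is_finished: bool = False
    is_manual_submit: bool = False

main_pool = {
    1: {"foo": Task("foo", is_finished=True)},
    2: {"bar": Task("bar", State(is_runahead=False))},
}

# Comprehension style: the filters read as declarative clauses.
hits = [
    itask
    for tasks_at_point in main_pool.values()
    for itask in tasks_at_point.values()
    if itask.state.is_runahead
    if itask.is_finished or itask.is_manual_submit
]

# Equivalent nested-loop style with conditional blocks.
hits2 = []
for tasks_at_point in main_pool.values():
    for itask in tasks_at_point.values():
        if not itask.state.is_runahead:
            continue  # the "negative logic" pattern mentioned above
        if itask.is_finished or itask.is_manual_submit:
            hits2.append(itask)

assert hits == hits2  # both select 1/foo
```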

@hjoliver (Member Author) commented Jun 1, 2021:

Three approvals should do it!

@hjoliver merged commit bdb51e2 into cylc:master Jun 1, 2021
@hjoliver deleted the pflag-refactor branch June 1, 2021 03:45