TaskTrigger Refactor #2303

Merged
merged 7 commits into from
May 25, 2017

Conversation

oliver-sanders
Member

At the moment taskdefs store triggers in an inefficient manner:

Scheduler(...).config.taskdef.triggers = {
    <cylc.cycling.SequenceBase>:  [
        [
            [{label: <cylc.task_trigger.TaskTrigger>, ...}, expression], ...
        ], ...
    ], ...
}

In this data structure the task names and qualifiers are stored three times:

expression = 'task_name_colon_succeeded | task_name_colon_failed'
label = 'task_name_colon_succeeded'
task_trigger = TaskTrigger('task_name', qualifier='succeeded', ...)

This pull removes this duplication of information:

  • The expression is now a nested list of TaskTrigger objects and conditional characters e.g. [<TaskTrigger>, '&', <TaskTrigger>]
  • Labels have been retired.
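As a rough illustration of the new layout, a nested-list expression can be walked directly without any label lookup. This is only a sketch: the `Trigger` namedtuple is a hypothetical stand-in for `TaskTrigger`, `evaluate` is not a function from this pull, and the assumption that `&` binds tighter than `|` inside one list level is made here purely for the example.

```python
from collections import namedtuple

# Hypothetical stand-in for cylc.task_trigger.TaskTrigger (illustration only).
Trigger = namedtuple("Trigger", ["task_name", "qualifier"])


def evaluate(expr, satisfied):
    """Evaluate a nested-list expression against a set of satisfied triggers.

    Sub-lists play the role of parentheses. Within one list level this
    sketch assumes '&' binds tighter than '|'.
    """
    result = False  # OR of the '&'-groups completed so far
    group = True    # AND of the terms in the current group
    for item in expr:
        if item == "|":
            result = result or group
            group = True
        elif item == "&":
            continue
        elif isinstance(item, list):
            group = group and evaluate(item, satisfied)
        else:
            group = group and (item in satisfied)
    return result or group


foo_ok = Trigger("foo", "succeeded")
bar_fail = Trigger("bar", "failed")
baz_ok = Trigger("baz", "succeeded")

# Equivalent to: foo:succeeded & (bar:failed | baz:succeeded)
expr = [foo_ok, "&", [bar_fail, "|", baz_ok]]
print(evaluate(expr, {foo_ok, baz_ok}))
print(evaluate(expr, {bar_fail}))
```

Each task name and qualifier lives in exactly one object here, which is the point of dropping the separate string expression and labels.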

For suites with complex conditional dependencies this has a large effect on memory usage. For the suite mentioned in #2291, the Scheduler object (post configure) shrinks from 511 MB to 146 MB. Validation shows a 22% reduction in maximum memory, at the cost of a roughly 3% rise in CPU time:

Version                Run               Elapsed Time (s)  CPU Time - Total (s)  Max Memory (kb)
master                 u-al307-validate  554.0             554.2                 1584548.0
task-trigger-refactor  u-al307-validate  568.7             569.1                 1240128.0
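The quoted percentages can be recomputed from the table values. The snippet below is just that arithmetic; the dictionary keys and the helper name are made up for the example.

```python
# Values copied from the validation table above.
master = {"cpu_s": 554.2, "max_mem_kb": 1584548.0}
branch = {"cpu_s": 569.1, "max_mem_kb": 1240128.0}


def percent_change(before, after):
    """Signed percentage change going from before to after."""
    return (after - before) / before * 100.0


mem_change = percent_change(master["max_mem_kb"], branch["max_mem_kb"])
cpu_change = percent_change(master["cpu_s"], branch["cpu_s"])
print("memory: %+.1f%%, cpu: %+.1f%%" % (mem_change, cpu_change))
```

This comes out at roughly -22% for maximum memory and +3% for total CPU time.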

For suites with simple dependencies there is a smaller saving. This pull reduces the memory usage of the complex suite by about 4%. The plot below shows the scaling results for the diamond suite:

[Plot: memory usage scaling for the diamond suite]

Changes:

@oliver-sanders oliver-sanders added the efficiency For notable efficiency improvements label May 23, 2017
@oliver-sanders oliver-sanders added this to the next release milestone May 23, 2017
@oliver-sanders oliver-sanders self-assigned this May 23, 2017
@oliver-sanders
Member Author

Just to record this information. The main memory users in the SuiteConfig object for the "extremely complex" suite:

Before

404218752 taskdefs
 18233392 cfg
 14393992 pcfg
  2178320 sequences

After

 74439888 taskdefs
 18233456 cfg
 14393992 pcfg
  2177616 sequences

("\+", "_plus_"),
]
# Message trigger offset regex(es).
BCOMPAT_MSG_RE_C5 = re.compile(r'^(.*)\[\s*T\s*(([+-])\s*(\d+))?\s*\](.*)$')
Member Author

Testing for message offsets is impacting validation, so we should remove this as soon as we are confident that it is no longer needed. In the meantime, can we remove the cylc5 regex?

Member

Yes, we can remove any cylc-5 back-compat code now.
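For reference, this is what the cylc-5 pattern under discussion picks up. The regex is copied from the diff context in this thread; the message strings are hypothetical and only there to exercise it.

```python
import re

# Pattern copied verbatim from the diff context above.
BCOMPAT_MSG_RE_C5 = re.compile(r'^(.*)\[\s*T\s*(([+-])\s*(\d+))?\s*\](.*)$')

# Hypothetical messages, for illustration only.
with_offset = BCOMPAT_MSG_RE_C5.match("products ready for [T+6]")
no_offset = BCOMPAT_MSG_RE_C5.match("products ready for [T]")
plain = BCOMPAT_MSG_RE_C5.match("products ready")

# Groups: text before, signed offset, sign, digits, text after.
print(with_offset.groups())
print(no_offset.groups())
print(plain)
```

Every message has to be tested against the pattern during validation, whether or not it contains an offset, which is where the cost comes from.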

Contributor

@matthewrmshin matthewrmshin left a comment

Some initial comments.



class ConditionalSimplifier(object):
    """A class to simplify logical expressions"""
    RE_CONDITIONALS = "(&|\||\(|\))"
Contributor

Can we compile this regular expression?

for message in outputs.values():
    if regex.match(message):
        raise SuiteConfigError(
            'ERROR: Message trigger offsets are obsolete.')
Contributor

Can we avoid looping over the messages twice?

for message in outputs.values():
    if BCOMPAT_MSG_RE_C5.match(message) or BCOMPAT_MSG_RE_C6.match(message):
        # ...

m = re.match(self.__class__.CYCLE_POINT_RE, message)
if m:
    self.target_point_strings.append(m.groups()[0])
match = re.match(self.__class__.CYCLE_POINT_RE, message)
Contributor

Can just do:

match = CYCLE_POINT_RE.match(message)

since CYCLE_POINT_RE is already compiled.
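A small check that the two spellings behave the same. The pattern and message here are hypothetical stand-ins, since the real `CYCLE_POINT_RE` is not shown in this thread.

```python
import re

# Hypothetical stand-in for CYCLE_POINT_RE (the real pattern is not shown here).
CYCLE_POINT_RE = re.compile(r"\[(\S+)\]")
message = "[20170101T00] data ready"

# re.match() accepts an already-compiled pattern, so both calls are valid;
# calling the pattern's own method just skips the module-level indirection.
via_module = re.match(CYCLE_POINT_RE, message)
via_pattern = CYCLE_POINT_RE.match(message)

print(via_module.groups()[0] == via_pattern.groups()[0])
```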


"""
cpre = Prerequisite(point, tdef.start_point)
for task_trigger in self.task_triggers:
Contributor

This block can probably do with some extra comments.

yield (key, re.sub('\[.*\]', str(new_point), msg))
"""Yield task message outputs for initialisation of TaskOutputs."""
for key, msg in self.outputs:
    yield (key, re.sub('\[.*\]', str(point), msg))
Contributor

Do we still need this substitution?
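For context, this is the effect of the substitution in question, pulled out into a standalone helper. The function name, message text and cycle point are hypothetical; only the `re.sub` pattern comes from the diff.

```python
import re


def substitute_point(msg, point):
    """Replace a bracketed placeholder in msg with the given cycle point."""
    # Same pattern as in the diff; '.*' is greedy, so the match runs from
    # the first '[' to the last ']' in the message.
    return re.sub(r'\[.*\]', str(point), msg)


print(substitute_point("data ready for []", "20170101T00"))
print(substitute_point("data ready", "20170101T00"))
```

A message without brackets passes through unchanged, so the call only matters if bracketed placeholders can still appear in outputs.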

@oliver-sanders
Member Author

Will address the database lock test failures tomorrow.

"""Convert a logical expression in a nested list back to a string"""
flattened = copy.deepcopy(expr)
for i in range(len(flattened)):
    if isinstance(flattened[i], list):
        flattened[i] = self.flatten_nested_expr(flattened[i])
        flattened[i] = cls.flatten_nested_expr(
            flattened[i])
Member

Spurious change?
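For readers following along, the idea behind the method in this diff can be sketched as a standalone function. This is a simplified version written for illustration, not the code from the pull; it assumes sub-lists stand for parenthesised groups and that every other item can be rendered with `str()`.

```python
def flatten_nested_expr(expr):
    """Convert a nested-list logical expression back to a string.

    Sub-lists are treated as parenthesised groups and flattened recursively.
    """
    parts = []
    for item in expr:
        if isinstance(item, list):
            parts.append(flatten_nested_expr(item))
        else:
            parts.append(str(item))
    return "(" + " ".join(parts) + ")"


print(flatten_nested_expr(["a", "&", ["b", "|", "c"]]))
```

Because the function needs no instance state, it can just as well be a classmethod, which would be consistent with the `self` to `cls` switch seen in the diff.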

Contributor

@matthewrmshin matthewrmshin left a comment

Some more minor style comments. Change tested as working in my environment.

if lnode.output:
    qualifier = TaskTrigger.get_trigger_name(lnode.output)
else:
    qualifier = TASK_OUTPUT_SUCCEEDED
Contributor

            if outputs and lnode.output in outputs:
                # Task message.
                qualifier = outputs[lnode.output]
            elif lnode.output:
                # Built-in qualifier.
                qualifier = TaskTrigger.get_trigger_name(lnode.output)
            else:
                qualifier = TASK_OUTPUT_SUCCEEDED

A slightly better style? (This lines up all the assignment statements of qualifier.)



class ConditionalSimplifier(object):
    """A class to simplify logical expressions"""
    RE_CONDITIONALS = re.compile("(&|\||\(|\))")
Contributor

The combination of backslash escapes, pipes (or-logic) and brackets (capture) makes the regular expression very difficult to read. Perhaps it would be better to capture a character set in square brackets, like this: r'([&|()])'?
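A quick way to confirm the two spellings are interchangeable is to split a few expressions with both. The sample strings are made up for the check; the two patterns are the ones quoted in this thread.

```python
import re

# The pattern as written in the diff, and the character-class alternative.
ORIGINAL = re.compile(r"(&|\||\(|\))")
SUGGESTED = re.compile(r"([&|()])")

# Made-up expressions, for illustration only.
samples = ["foo & (bar | baz)", "a|b&c", "no_operators"]

for text in samples:
    # With a capturing group, split() keeps the delimiters in the result.
    assert ORIGINAL.split(text) == SUGGESTED.split(text)
    print(SUGGESTED.split(text))
```

Inside a character class none of `&`, `|`, `(` or `)` needs escaping, which is what makes the second form easier to read.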

Member

@hjoliver hjoliver left a comment

Nice.

@hjoliver hjoliver merged commit ca7a953 into cylc:master May 25, 2017
@oliver-sanders oliver-sanders deleted the task-trigger-refactor branch December 14, 2017 12:06