Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Job execution time limit enhancement #1929

Merged
merged 3 commits into from
Jul 25, 2016

Conversation

matthewrmshin
Copy link
Contributor

@matthewrmshin matthewrmshin commented Jul 7, 2016

Modify suite.rc spec

  • New [job] section for task run time
    • execution time limit
    • execution|submission retry delays
    • execution|submission polling intervals
    • batch system name and submit command template
    • shell
  • [event hooks] => [events]

From the new execution time limit setting,

  • Generate time limit directives where actual directive not specified.
  • Generate logic to run background or at jobs with the timeout command.
  • On reaching the time limit, poll job with configurable delays.
    The default delays are 1, 2 and 7 minutes intervals,
    roughly 1, 3 and 10 minutes after reaching time limit.

Close #1718.

@matthewrmshin matthewrmshin added this to the soon milestone Jul 7, 2016
@matthewrmshin matthewrmshin self-assigned this Jul 7, 2016
@matthewrmshin matthewrmshin force-pushed the execution-time-limit branch 4 times, most recently from c7a3e38 to 6c19328 Compare July 8, 2016 12:03
@matthewrmshin matthewrmshin modified the milestones: next release, soon Jul 8, 2016
@matthewrmshin
Copy link
Contributor Author

@hjoliver @arjclark please review.

@@ -11,8 +11,8 @@ title = "User Guide [runtime] example."
OBS:succeed-all => bar"""
[runtime]
[[root]] # base namespace for all tasks (defines suite-wide defaults)
[[[job submission]]]
method = at_now
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was at_now not different from at?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a very old example before at_now was renamed to at (so in theory we can run at with different arguments, now just now).

@arjclark
Copy link
Contributor

arjclark commented Jul 8, 2016

Preliminary review and testing looks ok to me. @cylc/core - please be aware of this change (though old syntax still supported).

@matthewrmshin matthewrmshin force-pushed the execution-time-limit branch 2 times, most recently from 9dc5516 to 855daa4 Compare July 11, 2016 15:02
'execution retry delays': vdr(
vtype='interval_minutes_list', default=[]),
'execution time limit': vdr(
vtype='interval_seconds', default=[]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default shouldn't be a list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change this to have no default, i.e. None.

@matthewrmshin
Copy link
Contributor Author

Fixed typos.

@hjoliver
Copy link
Member

Two tests failing on my laptop VM (no proper batch systems):

./tests/execution-time-limit/01-at.t
./tests/execution-time-limit/00-background.t

with

ok 1 - 00-background-validate
ok 2 - 00-background-run
ok 3 - job-grep-ok
not ok 4 - job.status-grep-ok
ok 5 - job.status-grep-ok

It appears to be because 01/job.status has CYLC_JOB_EXIT=XCPU instead of CYLC_JOB_EXIT=ERR.

@matthewrmshin
Copy link
Contributor Author

The test failure is good. It means your environment is killing the job with the desired signal (XCPU). For some reason, I am unable to get XCPU in my environment. I'll modify the tests to expect either XCPU or ERR.

@matthewrmshin
Copy link
Contributor Author

Test(s) modified.

@dpmatthews dpmatthews self-assigned this Jul 12, 2016
@dpmatthews
Copy link
Contributor

Please give me a chance to comment before merging - thanks

@matthewrmshin matthewrmshin force-pushed the execution-time-limit branch from 81bcb4a to bc610c8 Compare July 12, 2016 13:55
@matthewrmshin
Copy link
Contributor Author

Branch squashed and re-based.

matthewrmshin added a commit to matthewrmshin/rose that referenced this pull request Jul 12, 2016
@hjoliver
Copy link
Member

Review 1: this is good as far as I'm concerned, a nice improvement.

@matthewrmshin matthewrmshin force-pushed the execution-time-limit branch from 3ef01a2 to a1c5be6 Compare July 20, 2016 13:14
@matthewrmshin
Copy link
Contributor Author

Branch re-based. More docs added.

@matthewrmshin
Copy link
Contributor Author

@dpmatthews docs updated.

@dpmatthews
Copy link
Contributor

I haven't reviewed all the documentation changes but the bits I wanted adding are now there - thanks

Modify suite.rc spec
* New [job] section for task run time
  * execution time limit
  * execution|submission retry delays
  * execution|submission polling intervals
  * batch system name and submit command template
  * shell
* [event hooks] => [events]

From the new execution time limit setting,
* Generate time limit directives where actual directive not specified.
* Generate logic to run background or at jobs with the timeout command.
* On reaching the time limit, poll job configurable delays.
  The default is 1, 2 and 7 minutes intervals,
  which is roughly 1, 3 and 10 minutes after reaching time limit.
@matthewrmshin matthewrmshin force-pushed the execution-time-limit branch from a1c5be6 to 99d05ef Compare July 25, 2016 09:00
@matthewrmshin
Copy link
Contributor Author

Branch re-based. Conflicts resolved. Tests OK (with and without global configuration) in my environment.

\lstinline=wall_clock_limit= directive. The setting is assumed to be the soft
limit. The hard limit will be set by adding an extra minute to the soft limit.
Do not specify the \lstinline=wall_clock_limit= directive explicitly if
\lstinline=execution time limit= is specified, or it will cause confusion.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionally, what happens here? (parallel suite hat on here)

@arjclark
Copy link
Contributor

Generally looks good to me, couple of typos and a question but otherwise all good. Test battery passing in my environment.

@matthewrmshin
Copy link
Contributor Author

All doc comments fixed.

@arjclark
Copy link
Contributor

Doc changes look good to me.

@arjclark arjclark merged commit 50cd49e into cylc:master Jul 25, 2016
@matthewrmshin matthewrmshin deleted the execution-time-limit branch July 25, 2016 10:58
matthewrmshin added a commit to matthewrmshin/rose that referenced this pull request Jul 26, 2016
matthewrmshin added a commit to matthewrmshin/cylc-flow that referenced this pull request Jul 27, 2016
Old configuration sections should just be deprecated and removed.
Old items were not upgraded due to new settings already having default
values.
matthewrmshin added a commit to matthewrmshin/cylc-flow that referenced this pull request Jul 27, 2016
Old configuration sections need to be deprecated with no new settings
then removed.
matthewrmshin added a commit to matthewrmshin/cylc-flow that referenced this pull request Jul 27, 2016
Old configuration sections need to be deprecated with no new settings
then removed.
benfitzpatrick added a commit that referenced this pull request Aug 8, 2016
matthewrmshin added a commit to matthewrmshin/rose that referenced this pull request Aug 16, 2016
matthewrmshin added a commit to matthewrmshin/rose that referenced this pull request Aug 22, 2016
oliver-sanders added a commit to metomi/rose that referenced this pull request Aug 23, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants