Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abort job script with custom message. #2298

Merged
merged 3 commits into from
May 31, 2017

Conversation

hjoliver
Copy link
Member

@hjoliver hjoliver commented May 22, 2017

Close #2274

  • general CRITICAL event handler (c.f. our current WARNING handler) [now in Add CRITICAL event handler. #2304]
  • custom abort-with-message function with message passed to the 'failed' event handler.

Test case to show the abort function works properly with execution retries (abort message is logged each try, but only passed to the failed handler at last try) and compare with trapped exit:

[scheduling]
    [[dependencies]]
        graph = t-abort & t-fail
[runtime]
    [[root]]
        [[[job]]]
            execution retry delays = PT5S
        [[[events]]]
            failed handler = echo "!!!FAILED!!! "
    [[t-abort]]
        script = """
echo ONE; sleep 5
cylc__job_abort "ERROR: rust never sleeps"
sleep 5; echo TWO"""
    [[t-fail]]
        script = """
echo ONE; sleep 5
/bin/false
sleep 5; echo TWO"""

@matthewrmshin - see if you agree with the approach before I and document and add tests. It's more or less in line with our discussion on #2274.

Note slightly unorthodox implementation by means of a shell function that the user has to call in task scripting. Probably should hook this into cylc message or a new command cylc abort? (It would be nice to avoid yet another command, but cylc message doesn't seem very appropriate in this case... cylc message --abort maybe ...)

@hjoliver hjoliver added this to the soon milestone May 22, 2017
@hjoliver hjoliver self-assigned this May 22, 2017
@matthewrmshin
Copy link
Contributor

Some quick answers:

  • I think the approach is OK.
  • The main issue with the abort function is that it also needs to turn off the error traps, etc of the job script, so it has to be done in the same shell.

Some quick questions:

  • If we have configured a failed handler and a critical handler, I guess both will run?
  • I wonder if we can implement the error trap, the vacation trap and this new functionality as one?

@hjoliver hjoliver force-pushed the 2274.custom-failed-msg branch 2 times, most recently from a7b2747 to f6c3405 Compare May 24, 2017 01:29
@hjoliver hjoliver changed the title 2274.custom failed msg Abort job script with custom message. May 24, 2017
@hjoliver hjoliver force-pushed the 2274.custom-failed-msg branch 2 times, most recently from 60bb2bb to 9146b42 Compare May 24, 2017 04:52
@hjoliver
Copy link
Member Author

[critical handler functionality moved to #2304]

@hjoliver
Copy link
Member Author

I wonder if we can implement the error trap, the vacation trap and this new functionality as one?

Done.

@hjoliver hjoliver force-pushed the 2274.custom-failed-msg branch 3 times, most recently from 334878e to 09a09e7 Compare May 24, 2017 05:18
self.CYLC_JOB_EXIT, "EXIT"))
job_status_file.write("%s=%s\n" % (
self.CYLC_JOB_EXIT_MESSAGE,
message.replace(self.ABORT_MESSAGE_PREFIX, "")))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is the right thing to do here? (new CYLC_JOB_EXIT_MESSAGE in status files, just for aborted jobs)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could just set CYLC_JOB_EXIT to the abort message?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can just set CYLC_JOB_EXIT to the abort message. If I remember correctly, it can be anything other than SUCCEEDED, ERR and EXIT. cylc jobs-poll uses it as the value of the signal which can be any single-line strings.

Otherwise, if we want to introduce CYLC_JOB_EXIT_MESSAGE, I suppose CYLC_JOB_EXIT should be set to ABORT or ABRT (or something like that), and you may want to ensure that cylc jobs-poll can return it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually used CYLC_JOB_EXIT originally, then changed it. I"ll revert to that.

@hjoliver
Copy link
Member Author

The main issue with the abort function is that it also needs to turn off the error traps, etc of the job script, so it has to be done in the same shell.

Good point - so I left the abort call as a shell function.

@hjoliver hjoliver requested a review from matthewrmshin May 24, 2017 08:44
@hjoliver hjoliver force-pushed the 2274.custom-failed-msg branch from 092888b to e4da521 Compare May 24, 2017 10:06
@hjoliver
Copy link
Member Author

(rebased to remove the CYLC_JOB_EXIT_MESSAGE change).

@hjoliver
Copy link
Member Author

@matthewrmshin - please assign a second reviewer.

Copy link
Contributor

@matthewrmshin matthewrmshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments for the tests.

suite_run_ok $TEST_NAME cylc run --no-detach --debug $SUITE_NAME
#-------------------------------------------------------------------------------
LOG="${SUITE_RUN_DIR}/log/job/1/foo/NN/job-activity.log"
cat "${LOG}" | grep "event-handler" > 'edited-job-activity.log'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can just do:

grep 'event-handler' "${LOG}" > 'edited-job-activity.log'

Copy link
Member Author

@hjoliver hjoliver May 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course (I cut-n-pasted then cut down another test 😬)...

script = """
echo ONE
cylc__job_abort "ERROR: rust never sleeps"
echo TWO"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add a test to ensure job.out only has ONE and not TWO?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spotting - that was my original intention (ran out of time earlier)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@hjoliver
Copy link
Member Author

Feedback addressed. Over 'n out.

@oliver-sanders oliver-sanders merged commit 656faea into cylc:master May 31, 2017
@matthewrmshin matthewrmshin added this to the next release milestone May 31, 2017
@matthewrmshin matthewrmshin removed this from the soon milestone May 31, 2017
@hjoliver hjoliver deleted the 2274.custom-failed-msg branch June 5, 2017 20:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants