Improve error handling if exception occurs during shutdown #4168

Merged · 6 commits into cylc:master · Apr 16, 2021

Conversation

@MetRonnie (Member) commented Apr 12, 2021

These changes close #4147

Ensure that if an exception occurs during shutdown, it gets logged before exiting.

This is difficult to test other than manually; see the steps to reproduce in the issue. I've added an integration test, but it's probably worth manually running the example in #4147 too.
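As a rough sketch of what such a test might assert (all names here are hypothetical stand-ins, not the real cylc-flow scheduler fixtures): an exception raised during shutdown must reach the log before the process exits.

```python
import logging
import pytest


class FakeScheduler:
    """Hypothetical stand-in for a scheduler whose shutdown step fails."""
    def suite_shutdown(self):
        raise ValueError("boom")


def shutdown_with_logging(schd, log):
    """Hypothetical wrapper: log the failure, then re-raise so the exit reflects it."""
    try:
        schd.suite_shutdown()
    except Exception:
        log.exception("Error during shutdown")
        raise


def test_shutdown_error_is_logged(caplog):
    log = logging.getLogger('scheduler')
    with caplog.at_level(logging.ERROR), pytest.raises(ValueError):
        shutdown_with_logging(FakeScheduler(), log)
    # The error must have been logged before the exception propagated.
    assert any(
        "Error during shutdown" in rec.getMessage() for rec in caplog.records
    )
```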

Requirements check-list

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Tests included
  • Appropriate change log entry included.
  • No documentation update required.
  • No dependency changes.

@MetRonnie MetRonnie added bug Something is wrong :( small labels Apr 12, 2021
@MetRonnie MetRonnie added this to the cylc-8.0b1 milestone Apr 12, 2021
@MetRonnie MetRonnie self-assigned this Apr 12, 2021
Comment on lines -59 to +60:

-   schd.suite_shutdown = ctrl_c
+   setattr(schd, 'suite_shutdown', ctrl_c)

@MetRonnie (Member, Author)

This plays better with mypy
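For context, mypy generally rejects a plain assignment that replaces a bound method on an instance, whereas setattr is opaque to the type checker. A minimal sketch of the difference (the Scheduler/suite_shutdown names below simply mirror the diff above, not the real cylc-flow classes):

```python
class Scheduler:
    def suite_shutdown(self) -> None:
        """Stand-in for the real shutdown method."""


def ctrl_c() -> None:
    raise KeyboardInterrupt


schd = Scheduler()

# Direct assignment is flagged by mypy ("Cannot assign to a method"):
# schd.suite_shutdown = ctrl_c

# setattr is not statically checked, so the monkeypatch passes mypy:
setattr(schd, 'suite_shutdown', ctrl_c)
```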

@MetRonnie (Member, Author) commented Apr 13, 2021

Ignore test failures (due to DNS issue). Recommend running pytest locally

@hjoliver (Member) left a comment:

Looks good, tested as working 👍

@MetRonnie MetRonnie removed the request for review from oliver-sanders April 15, 2021 09:33
@datamel (Contributor) left a comment:

This is an example of a platform lookup error put in remote_init, higher up the program….

[screenshot]

This is exactly the same error put in remote_tidy….

[screenshot]

Is there a reason for leaving traceback in? Surely the same error should produce the same result on screen?

@MetRonnie (Member, Author)

My reasoning is that an error during the running of the workflow can still end with a complete shutdown, but an error during shutdown likely means the shutdown did not complete.

Also, there's already a fair bit of error handling in the Scheduler._shutdown() method, so if it's not caught by that it's probably something serious enough to fail louder?

@datamel (Contributor) commented Apr 15, 2021

> My reasoning is that an error during the running of the workflow can still end with a complete shutdown, but an error during shutdown likely means the shutdown did not complete.

Ah OK, my reasoning would be: if you are raising an error in, for example, remote_tidy, that is intentional and, provided the error message is written correctly, it provides sufficient information to the user about the problem.

The reason I became aware of this bug and raised the issue was that at one stage (before a change implemented in remote_init (#4138)) the error raised in get_random_platform_for_install_target() was not bubbling up correctly in remote_tidy. In that case there is no need for a traceback; the error is sufficient (otherwise, presumably, we would put a traceback on all our error raises).

> Also, there's already a fair bit of error handling in the Scheduler._shutdown() method, so if it's not caught by that it's probably something serious enough to fail louder?

From my tests, this error handling will not help here; if you look at the call stack, Scheduler._shutdown() happens before remote_tidy is executed.

@MetRonnie (Member, Author)

I've made it so that if a CylcError occurs during shutdown, it just gets a one line log entry, but if a general exception occurs, it logs traceback (this is how uncaught exceptions during the running of the workflow are logged).
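A minimal sketch of that logging split, assuming a local CylcError stand-in and a hypothetical wrapper around the real shutdown call (the actual cylc-flow shutdown path is more involved):

```python
import logging

LOG = logging.getLogger('scheduler')


class CylcError(Exception):
    """Stand-in for cylc's own exception hierarchy."""


def shutdown_with_logging(do_shutdown) -> None:
    """Hypothetical wrapper: log any shutdown failure before exiting."""
    try:
        do_shutdown()
    except CylcError as exc:
        # "Known" Cylc errors get a single, readable log line, no traceback.
        LOG.error(f"{type(exc).__name__}: {exc}")
    except Exception:
        # Unexpected errors get the full traceback, the same way uncaught
        # exceptions are logged while the workflow is running.
        LOG.exception("Error during shutdown")
        raise
```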

@MetRonnie MetRonnie requested a review from hjoliver April 15, 2021 14:08
@MetRonnie (Member, Author)

I have locally run:

  • ✔️ flake8 & mypy
  • ✔️ unit + integration + docstring tests
  • ✔️ functional + flaky tests

@MetRonnie MetRonnie removed the small label Apr 15, 2021
@datamel (Contributor) left a comment:

Thanks @MetRonnie!

@datamel datamel merged commit de8dea3 into cylc:master Apr 16, 2021
@MetRonnie MetRonnie deleted the error-handling branch April 16, 2021 10:59
Labels: bug
Successfully merging this pull request may close: Catching and Logging of Errors on shutdown