Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shutdown: remove contact file after closing DB connection #4046

Merged
merged 1 commit into from
Jan 27, 2021

Conversation

MetRonnie
Copy link
Member

@MetRonnie MetRonnie commented Jan 25, 2021

This is a small change with no associated Issue.

Detecting whether a workflow is still running or has shut down relies on checking for the existence of a contact file. However, before this PR, when shutting down a workflow, the contact file would get removed before shutting the database, which led to the small possiblity of problems (e.g. if the database is deleted in this short time, it would get regenerated).

This PR reverse the order, so any DB connections are definitely closed when the contact file is removed and the workflow is considered to have shut down.

Requirements check-list

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Appropriate tests are included (functional).
  • Appropriate change log entry included.
  • No documentation update required.
  • No dependency changes.

@MetRonnie MetRonnie added bug? Not sure if this is a bug or not small labels Jan 25, 2021
@MetRonnie MetRonnie added this to the cylc-8.0b0 milestone Jan 25, 2021
@MetRonnie MetRonnie self-assigned this Jan 25, 2021
@MetRonnie MetRonnie marked this pull request as ready for review January 25, 2021 12:49
Copy link
Member

@hjoliver hjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MetRonnie this fix sounds sensible but I can't get the new functional test to fail on master, which makes me wonder if there's any point in having the test. I wonder if there is any way to ensure there is still a connection to the DB at the time...

@MetRonnie
Copy link
Member Author

It fails pretty consistently for me on master locally... might be because our filesystem is so slow 😛

Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with the change.

Would be a good idea to add a comment to the code that removes the contact file highlighting that this should be the last thing the flow should do before shutting down (logging excluded).

If the test is platform dependent, either comment it to explain that or omit it. I think this is just fundamentally a very difficult test to write as the DB closes and reopens with normal usage (horrible) so tracking file handles isn't enough and sampling the filesystem could fail to detect a fleeting re-appearance anyway.

(likely easier to spot on a slow filesystem as unlinking a file is much faster than writing a database).

@MetRonnie
Copy link
Member Author

Perhaps the removal of the contact file should come after this bit too?

# The getattr() calls and if tests below are used in case the
# suite is not fully configured before the shutdown is called.
if getattr(self, "config", None) is not None:
# run shutdown handlers
if isinstance(reason, CylcError):
self.run_event_handlers(self.EVENT_SHUTDOWN, reason.args[0])
else:
self.run_event_handlers(self.EVENT_ABORTED, str(reason))

That would make it the very last part of Scheduler.shutdown()

@MetRonnie
Copy link
Member Author

MetRonnie commented Jan 27, 2021

Hopefully if the test passes on master on your machine then the bug too is very unlikely to affect your machine in the first place.

@oliver-sanders
Copy link
Member

I think it makes sense for the shutfown/aborted event handlers to run after the contact file removal as that is the thing that marks the death of the flow as far as the user is concerned.

E.G. if your shutdown handler ran cylc scan you wouldn't expect the flow that fired the event to appear in the cylc scan results.

@MetRonnie
Copy link
Member Author

Happy with the change.

Would be a good idea to add a comment to the code that removes the contact file highlighting that this should be the last thing the flow should do before shutting down (logging excluded).

If the test is platform dependent, either comment it to explain that or omit it. I think this is just fundamentally a very difficult test to write as the DB closes and reopens with normal usage (horrible) so tracking file handles isn't enough and sampling the filesystem could fail to detect a fleeting re-appearance anyway.

(likely easier to spot on a slow filesystem as unlinking a file is much faster than writing a database).

Addressed and force-pushed

@hjoliver hjoliver merged commit c3ecb20 into cylc:master Jan 27, 2021
@MetRonnie MetRonnie deleted the shutdown branch January 28, 2021 10:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug? Not sure if this is a bug or not small
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants