Shutdown: remove contact file after closing DB connection #4046
@MetRonnie this fix sounds sensible but I can't get the new functional test to fail on master, which makes me wonder if there's any point in having the test. I wonder if there is any way to ensure there is still a connection to the DB at the time...
It fails pretty consistently for me on master locally... might be because our filesystem is so slow 😛
Happy with the change.
Would be a good idea to add a comment to the code that removes the contact file highlighting that this should be the last thing the flow should do before shutting down (logging excluded).
If the test is platform dependent, either comment it to explain that or omit it. I think this is just fundamentally a very difficult test to write as the DB closes and reopens with normal usage (horrible) so tracking file handles isn't enough and sampling the filesystem could fail to detect a fleeting re-appearance anyway.
(likely easier to spot on a slow filesystem as unlinking a file is much faster than writing a database).
Perhaps the removal of the contact file should come after this bit too? cylc-flow/cylc/flow/scheduler.py Lines 1677 to 1684 in 9d92abb
That would make it the very last part of |
Hopefully if the test passes on master on your machine then the bug too is very unlikely to affect your machine in the first place.
I think it makes sense for the shutdown/aborted event handlers to run after the contact file removal, as that is the thing that marks the death of the flow as far as the user is concerned. E.g. if your shutdown handler ran |
Instead of vice versa
Addressed and force-pushed
This is a small change with no associated Issue.
Detecting whether a workflow is still running or has shut down relies on checking for the existence of a contact file. However, before this PR, when shutting down a workflow, the contact file was removed before the database connection was closed, which left a small possibility of problems (e.g. if the database is deleted in this short window, it would get regenerated).
This PR reverses the order, so any DB connections are definitely closed when the contact file is removed and the workflow is considered to have shut down.
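The intended ordering can be illustrated with a minimal, standalone sketch. This is not the actual `scheduler.py` code: the `shutdown` function, the file names and the temporary run directory are all hypothetical, and plain `sqlite3` stands in for the workflow database layer.

```python
import sqlite3
import tempfile
from pathlib import Path


def shutdown(db_conn: sqlite3.Connection, contact_file: Path) -> list:
    """Hypothetical shutdown routine: close the DB, then remove the contact file."""
    steps = []
    # Close the database first so nothing can write to (or recreate) it later.
    db_conn.close()
    steps.append("db_closed")
    # Removing the contact file marks the workflow as stopped, so it is the
    # last thing done here (logging excluded).
    contact_file.unlink(missing_ok=True)
    steps.append("contact_removed")
    return steps


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    contact = run_dir / "contact"  # assumed file name, for illustration only
    contact.write_text("hypothetical contact info\n")
    conn = sqlite3.connect(run_dir / "db")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.commit()
    order = shutdown(conn, contact)
    contact_exists_after = contact.exists()
```

With this ordering, anything that sees the contact file gone can assume no DB connection from the stopped workflow remains open.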
Requirements check-list
CONTRIBUTING.md
and added my name as a Code Contributor.