[PR] Prevent thread leakage on exiting #326
Labels
archive
bug
enhancement
refactoring
What do these changes do?
Prevent leakage of OS resources (threads) on operator exit, which can leave the operator unresponsive after the main function has finished and no logging is happening (if a sync-handler hangs).
Description
All sync-handlers (declared with `def`, not `async def`) are executed in thread pools. The operator itself is fully asynchronous. Whenever the operator is exiting or being cancelled, it also cancels all its coroutines. However, if a coroutine is cancelled at `loop.run_in_executor()`, the coroutine indeed exits, but the thread it used continues running until the function finishes. As a result, some threads can continue running even after the operator has seemingly exited normally.
This becomes critical once long-running handlers come into play (i.e. daemons). But the change has value of its own even now, so it is extracted into this PR for easier reviewing.
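A minimal sketch of the leak described above, using plain `asyncio` rather than the operator's own code. The names `slow_sync_handler` and `run_handler` are hypothetical; the `stop` event exists only so the demo can end the thread afterwards, since a running Python thread cannot be killed from outside.

```python
import asyncio
import threading
import time


def slow_sync_handler(stop: threading.Event, finished: threading.Event) -> None:
    """A stand-in for a blocking `def` handler (hypothetical)."""
    while not stop.is_set():
        time.sleep(0.01)  # Blocking work; asyncio cannot interrupt it.
    finished.set()


async def run_handler(stop: threading.Event, finished: threading.Event) -> None:
    loop = asyncio.get_running_loop()
    # Offload the sync handler to the loop's default thread pool.
    await loop.run_in_executor(None, slow_sync_handler, stop, finished)


async def demo() -> tuple[bool, bool]:
    stop, finished = threading.Event(), threading.Event()
    task = asyncio.create_task(run_handler(stop, finished))
    await asyncio.sleep(0.05)  # Let the handler start in its thread.

    task.cancel()  # Cancels the coroutine...
    try:
        await task
    except asyncio.CancelledError:
        pass
    coroutine_cancelled = task.cancelled()
    handler_still_running = not finished.is_set()  # ...but not the thread.

    stop.set()  # Only the handler itself can end its own work.
    await asyncio.to_thread(finished.wait, 1.0)
    return coroutine_cancelled, handler_still_running


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (True, True)
```

Both flags are `True`: the task reports itself as cancelled while the handler's loop is still executing in its worker thread.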
Instead, with this PR, the operator waits for such threads to finish (potentially forever, or until it is SIGKILL'ed, as Kubernetes does after 30s by default). While waiting, it logs the status of the operator's tasks and what exactly it is waiting for, so the logs show which handler hangs or prevents the operator from exiting properly. If everything exits instantly, there is nothing to wait for and nothing to log, so the operator exits as usual.
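The wait-and-report idea can be sketched with `asyncio.wait()`, which returns early on timeout without cancelling anything. This is an illustration under assumptions, not the PR's implementation: `wait_for_handlers`, `run_sync`, the task names, and the reporting `interval` are all hypothetical.

```python
import asyncio
import logging
import time

logger = logging.getLogger("shutdown-sketch")


async def wait_for_handlers(tasks: set[asyncio.Task], interval: float) -> list[list[str]]:
    """Wait for all tasks with no deadline, logging the pending ones every `interval` seconds.

    Returns the task names reported in each logging round (handy for testing).
    """
    reports: list[list[str]] = []
    pending = set(tasks)
    while pending:
        # asyncio.wait() with a timeout does not cancel anything; it only returns early.
        _, pending = await asyncio.wait(pending, timeout=interval)
        if pending:
            names = sorted(task.get_name() for task in pending)
            reports.append(names)
            logger.info("Still waiting for %d task(s): %s", len(names), ", ".join(names))
    return reports


async def run_sync(func, *args):
    # Hypothetical helper: run a blocking callable in the default thread pool.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, func, *args)


async def demo() -> list[list[str]]:
    # Two stand-in sync handlers: one finishes almost instantly, one blocks for ~0.35s.
    fast = asyncio.create_task(run_sync(time.sleep, 0.01), name="fast-handler")
    slow = asyncio.create_task(run_sync(time.sleep, 0.35), name="slow-handler")
    return await wait_for_handlers({fast, slow}, interval=0.1)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    print(asyncio.run(demo()))
```

Only the slow handler shows up in the log lines; if every task finishes before the first timeout, the loop logs nothing and returns immediately, matching the "nothing to wait for, nothing to log" case.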
In addition, the lifecycle functions (which select the handlers to be processed) are now called directly to avoid using thread pools. We know in advance that the lifecycle functions are safe and fast (instant), so there is no need to overcomplicate them.
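To illustrate the trade-off, here is a hedged sketch contrasting an executor-based call with a direct call. The `one_by_one` lifecycle, the `Handler`/`Lifecycle` types, and both `select_*` helpers are hypothetical stand-ins, not the operator's actual API.

```python
import asyncio
from typing import Callable, Sequence

# Hypothetical types for illustration: handlers are identified by name here.
Handler = str
Lifecycle = Callable[[Sequence[Handler]], list[Handler]]


def one_by_one(handlers: Sequence[Handler]) -> list[Handler]:
    """A hypothetical lifecycle: select only the first pending handler."""
    return list(handlers[:1])


async def select_via_executor(lifecycle: Lifecycle, handlers: Sequence[Handler]) -> list[Handler]:
    # Offloading to a thread pool: a thread hop for an instant, non-blocking call.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, lifecycle, handlers)


async def select_directly(lifecycle: Lifecycle, handlers: Sequence[Handler]) -> list[Handler]:
    # Calling inline: acceptable because the lifecycle is known to be fast and non-blocking.
    return lifecycle(handlers)


if __name__ == "__main__":
    pending = ["create_fn", "update_fn"]
    print(asyncio.run(select_directly(one_by_one, pending)))  # ['create_fn']
```

Both variants return the same selection; the direct call simply avoids occupying a worker thread that would otherwise need to be awaited on exit.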
Issues/PRs
Type of changes
Checklist
CONTRIBUTORS.txt