The lifecycle functions should not be called in threads. They are safe enough to be called directly in the asyncio event loop, thus saving the OS resources (threads in asyncio's thread pools).
🤖 zincr found 0 problems, 0 warnings
🤖 zincr found 1 problem, 0 warnings
Details on how to resolve are provided below.
Approvals: All proposed changes must be reviewed by project maintainers before they can be merged. Not enough people have approved this pull request. Please ensure that 1 additional user who has not contributed to this pull request approves the changes.
What do these changes do?
Prevent leakage of OS resources (threads) on operator exit. If a sync-handler hangs, such leaked threads can leave the operator unresponsive after the main function has finished, with no logging to show why.
Description
All sync-handlers (declared with `def`, not `async def`) are executed in thread pools. The operator itself is fully asynchronous. Whenever the operator is exiting or is being cancelled, it also cancels all its coroutines. However, if a coroutine is cancelled at `loop.run_in_executor()`, the coroutine does exit, but the thread it used continues running until it finishes. As a result, some threads can continue running even after the operator has seemingly exited normally.
This becomes critical as long-running handlers come into play (i.e. daemons). But the change has value of its own even now; hence, it is extracted for easier reviewing.
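The cancellation behaviour described above can be reproduced outside the operator with plain asyncio. The following is a minimal sketch, not the operator's code: `slow_sync_handler` and `demo` are hypothetical names, and the sleep durations are arbitrary. It cancels the awaitable returned by `loop.run_in_executor()` and then checks that the worker thread is still busy.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor

finished = threading.Event()


def slow_sync_handler() -> None:
    # Hypothetical stand-in for a sync handler (declared with `def`).
    time.sleep(0.3)
    finished.set()


async def demo():
    loop = asyncio.get_running_loop()
    executor = ThreadPoolExecutor(max_workers=1)
    future = loop.run_in_executor(executor, slow_sync_handler)

    await asyncio.sleep(0.1)  # give the thread time to start working
    future.cancel()           # what happens when the awaiting coroutine is cancelled
    try:
        await future
    except asyncio.CancelledError:
        pass

    cancelled_in_loop = future.cancelled()  # the asyncio side is cancelled...
    still_running = not finished.is_set()   # ...but the thread has not finished yet

    executor.shutdown(wait=True)            # explicitly wait for the thread to end
    return cancelled_in_loop, still_running


cancelled_in_loop, still_running = asyncio.run(demo())
print(cancelled_in_loop, still_running, finished.is_set())
```

The asyncio future reports itself as cancelled while the thread keeps running in the background; only the explicit `executor.shutdown(wait=True)` guarantees it has ended.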
Instead, with this PR, the operator will wait for such threads to finish (potentially forever, or until it is SIGKILL'ed, as Kubernetes does after 30s). While waiting, it will log the status of the operator's tasks and what exactly it is waiting for. So, it becomes visible in the logs which handler hangs or prevents the operator from exiting properly. If everything exits instantly, there is nothing to wait for and nothing to log, so the operator exits as usual.
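One possible shape of such a wait-and-report loop is sketched below. It is an illustration under assumptions, not the PR's implementation: `wait_for_threads`, `hanging_handler`, the logger name, and the reporting interval are all hypothetical. The sketch tracks `concurrent.futures.Future` objects from `executor.submit()`, because those keep reflecting the thread's state even when the awaiting coroutine has been cancelled.

```python
import asyncio
import logging
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List

logger = logging.getLogger("shutdown-sketch")  # hypothetical logger name


def hanging_handler() -> None:
    # Hypothetical sync handler that takes a while to finish.
    time.sleep(0.25)


async def wait_for_threads(pending: Dict[str, Future], interval: float) -> List[List[str]]:
    """Poll until every thread-backed future is done; log the stragglers each round."""
    reports: List[List[str]] = []
    while True:
        running = sorted(name for name, fut in pending.items() if not fut.done())
        if not running:
            return reports  # nothing left to wait for, nothing to log
        logger.warning("Still waiting for handlers: %s", running)
        reports.append(running)
        await asyncio.sleep(interval)  # keep the event loop responsive while waiting


async def main() -> List[List[str]]:
    executor = ThreadPoolExecutor(max_workers=1)
    pending = {"hanging_handler": executor.submit(hanging_handler)}
    reports = await wait_for_threads(pending, interval=0.1)
    executor.shutdown(wait=True)
    return reports


reports = asyncio.run(main())
```

When all handlers have already finished, the loop returns immediately with no log lines, which matches the "exits as usual" case.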
In addition, the lifecycle functions (which select the handlers to be processed) are now called directly to avoid using thread pools. We know in advance that the lifecycle functions are safe and fast (instant), so there is no need to overcomplicate them.
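The difference between the two calling styles can be illustrated as follows. This is a hedged sketch: `all_at_once` here is a hypothetical pure selection function written for the example, and `select_directly` / `select_in_thread` are made-up helper names, not the operator's API.

```python
import asyncio
import threading
from typing import Callable, List, Sequence

Handler = Callable[[], None]
seen_threads: List[threading.Thread] = []


def all_at_once(handlers: Sequence[Handler]) -> List[Handler]:
    # Hypothetical lifecycle function: pure and instant, no I/O, no blocking.
    seen_threads.append(threading.current_thread())
    return list(handlers)


async def select_directly(handlers: Sequence[Handler]) -> List[Handler]:
    # Direct call inside the event loop: no thread is borrowed from a pool.
    return all_at_once(handlers)


async def select_in_thread(handlers: Sequence[Handler]) -> List[Handler]:
    # The style being avoided: a pool thread is used for a trivial call.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, all_at_once, handlers)


async def main() -> None:
    handlers: List[Handler] = [lambda: None, lambda: None]
    await select_directly(handlers)
    await select_in_thread(handlers)


asyncio.run(main())
direct_thread, pooled_thread = seen_threads
```

The direct call runs in the event loop's own thread, while the executor variant occupies a worker thread for work that completes instantly.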