Skip to content
This repository has been archived by the owner on Sep 14, 2020. It is now read-only.

Prevent thread leakage on exiting #326

Merged
merged 4 commits into from
Mar 10, 2020

Conversation

nolar
Copy link
Contributor

@nolar nolar commented Mar 9, 2020

What do these changes do?

Prevent leakage of OS resources (threads) on operator existing, which can make it irresponsive after the main function is finished and no logging is happening (if a sync-handler hangs).

Description

All sync-handlers (declared with def, not async def) are executed in thread pools. The operator itself is fully asynchronous. Whenever operator is exiting or is being cancelled, it also cancels all its coroutines. However, if a coroutine is cancelled at loop.run_in_executor(), the coroutine indeed exits, but the thread it used continues running — until finished. As a result, some threads can continue running even after the operator is kind-of-exited normally.

This becomes critical as long-running handlers are coming into play (i.e. daemons). But it is a change of its own value even now — hence, it is extracted for easier reviewing.

Instead, with this PR, the operator will wait for such threads to finish (potentially forever or until SIGKILL'ed, as Kubernetes does after 30s). And while waiting, it will log the status of the operator's tasks and what exactly it waits for. So, it becomes visible in the logs which handler hangs or prevents operator from exiting properly. In case everything exits instantly, there is nothing to wait, nothing to log, so it exits as usually.

In addition, the lifecycle functions (which select the handlers to be processed) are now called directly to avoid using thread pools. We know in advance that the lifecycle functions are safe and fast (instant), so there is no need to overcomplicate them.

Issues/PRs

Issues: #19

Type of changes

  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Refactoring (non-breaking change which does not alter the behaviour)

Checklist

  • The code addresses only the mentioned problem, and this problem only
  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes
  • If you provide code modification, please add yourself to CONTRIBUTORS.txt

nolar added 4 commits March 9, 2020 12:39
The lifecycle functions should not be called in threads. They are
safe enough to be called directly in the asyncio event loop, thus
saving the OS resources (threads in asyncio's thread pools).
@nolar nolar added bug Something isn't working enhancement New feature or request refactoring Code cleanup without new features added labels Mar 9, 2020
@zincr
Copy link

zincr bot commented Mar 9, 2020

🤖 zincr found 0 problems , 0 warnings

✅ Large Commits
✅ Approvals
✅ Specification
✅ Dependency Licensing

@zincr
Copy link

zincr bot commented Mar 9, 2020

🤖 zincr found 1 problem , 0 warnings

❌ Approvals
✅ Large Commits
✅ Specification
✅ Dependency Licensing

Details on how to resolve are provided below


Approvals

All proposed changes must be reviewed by project maintainers before they can be merged

Not enough people have approved this pull request - please ensure that 1 additional user, who have not contributed to this pull request approve the changes.

  • ✅ Approved by PR author @nolar
  • ❌ 1 additional approval needed
     

2 similar comments
@zincr
Copy link

zincr bot commented Mar 9, 2020

🤖 zincr found 1 problem , 0 warnings

❌ Approvals
✅ Large Commits
✅ Specification
✅ Dependency Licensing

Details on how to resolve are provided below


Approvals

All proposed changes must be reviewed by project maintainers before they can be merged

Not enough people have approved this pull request - please ensure that 1 additional user, who have not contributed to this pull request approve the changes.

  • ✅ Approved by PR author @nolar
  • ❌ 1 additional approval needed
     

@zincr
Copy link

zincr bot commented Mar 9, 2020

🤖 zincr found 1 problem , 0 warnings

❌ Approvals
✅ Large Commits
✅ Specification
✅ Dependency Licensing

Details on how to resolve are provided below


Approvals

All proposed changes must be reviewed by project maintainers before they can be merged

Not enough people have approved this pull request - please ensure that 1 additional user, who have not contributed to this pull request approve the changes.

  • ✅ Approved by PR author @nolar
  • ❌ 1 additional approval needed
     

@nolar nolar merged commit 651260b into zalando-incubator:master Mar 10, 2020
@nolar nolar deleted the no-thread-leaks branch March 10, 2020 14:59
@nolar nolar added this to the 0.27 milestone May 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working enhancement New feature or request refactoring Code cleanup without new features added
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants