Prevent thread leakage on exiting #326

nolar · 2020-03-09T18:39:52Z

What do these changes do?

Prevent leakage of OS resources (threads) when the operator exits. If a sync-handler hangs, such leaked threads can leave the operator unresponsive after its main function has finished, with nothing being logged.

Description

All sync-handlers (declared with def, not async def) are executed in thread pools. The operator itself is fully asynchronous. Whenever the operator is exiting or being cancelled, it cancels all its coroutines. However, if a coroutine is cancelled while awaiting loop.run_in_executor(), the coroutine does exit, but the thread it used keeps running until the function in it finishes. As a result, some threads can continue running even after the operator has seemingly exited normally.
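A minimal, self-contained sketch of the leak described above, using plain asyncio rather than Kopf itself; slow_sync_handler, run_handler and demo_leak are hypothetical names chosen for illustration. It cancels a coroutine that awaits loop.run_in_executor() and checks that the worker thread is still running after the cancellation has been delivered.

```python
import asyncio
import threading
import time
from typing import Tuple


def slow_sync_handler(started: threading.Event, done: threading.Event) -> None:
    # A plain `def` handler: asyncio runs it in a worker thread of the executor.
    started.set()
    time.sleep(0.5)  # blocking work; task cancellation cannot interrupt it
    done.set()


async def run_handler(started: threading.Event, done: threading.Event) -> None:
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, slow_sync_handler, started, done)


async def demo_leak() -> Tuple[bool, bool]:
    started, done = threading.Event(), threading.Event()
    task = asyncio.ensure_future(run_handler(started, done))

    # Wait until the worker thread has really entered the handler.
    while not started.is_set():
        await asyncio.sleep(0.01)

    # Cancel the coroutine, as the operator does when it is stopping.
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass

    # The coroutine is gone, but its thread is still sleeping inside the handler.
    leaked = not done.is_set()

    # The orphaned thread finishes on its own later (wait up to 2s for it).
    loop = asyncio.get_running_loop()
    finished_later = await loop.run_in_executor(None, done.wait, 2.0)
    return leaked, finished_later


if __name__ == "__main__":
    print(asyncio.run(demo_leak()))  # expected: (True, True)
```

The cancellation only cancels the asyncio-side future; the underlying concurrent.futures.Future cannot be cancelled once its thread has started, which is exactly why the thread outlives the coroutine.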

This becomes critical once long-running handlers (e.g., daemons) come into play. But the change is valuable on its own even now, so it is extracted into a separate PR for easier reviewing.

Instead, with this PR, the operator waits for such threads to finish (potentially forever, or until it is SIGKILL'ed, as Kubernetes does after 30s by default). While waiting, it logs the status of the operator's tasks and what exactly it is waiting for, so the logs show which handler hangs or prevents the operator from exiting properly. If everything exits instantly, there is nothing to wait for and nothing to log, so the operator exits as usual.
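A minimal sketch of the waiting-and-logging idea, assuming the thread-pool futures of the still-running sync-handlers are available by name; wait_for_threads, its log_interval parameter, demo_wait, and the handler names are hypothetical and not Kopf's actual implementation. There is deliberately no overall timeout: as described above, the process waits until the threads finish or it is killed externally, and only the log messages are periodic.

```python
import asyncio
import concurrent.futures
import logging
import time
from typing import Dict

logger = logging.getLogger("shutdown-sketch")


async def wait_for_threads(
        pending: Dict[str, concurrent.futures.Future],
        log_interval: float = 1.0,
) -> int:
    """Wait for all sync-handler threads to finish, logging stragglers periodically.

    Hypothetical helper. Returns how many "still waiting" messages were logged.
    """
    # Wrap the thread-pool futures so that the event loop can await them.
    names = {asyncio.wrap_future(fut): name for name, fut in pending.items()}
    remaining = set(names)
    reports = 0
    while remaining:
        # The timeout only paces the logging; the overall wait is unbounded.
        _, remaining = await asyncio.wait(remaining, timeout=log_interval)
        if remaining:
            reports += 1
            logger.warning("Still waiting for %d thread(s) to finish: %s",
                           len(remaining), ", ".join(sorted(names[f] for f in remaining)))
    return reports


async def demo_wait() -> int:
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=2)
    try:
        pending = {
            "fast_handler": executor.submit(time.sleep, 0.05),
            "slow_handler": executor.submit(time.sleep, 0.35),
        }
        return await wait_for_threads(pending, log_interval=0.1)
    finally:
        executor.shutdown(wait=True)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print("reports logged:", asyncio.run(demo_wait()))
```

If every thread finishes before the first interval elapses, the loop exits without logging anything, matching the "nothing to wait, nothing to log" case.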

In addition, the lifecycle functions (which select the handlers to be processed) are now called directly in the event loop instead of in thread pools. The lifecycle functions are known in advance to be safe and fast (instant), so there is no need to overcomplicate their invocation.
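A minimal sketch of the dispatching rule, assuming a hypothetical invoke() helper with an in_thread flag; it is not Kopf's real invocation code, and current_thread_name and demo_dispatch exist only for the demonstration. Coroutine functions are awaited, trusted instant callables such as the lifecycle functions run inline in the event loop, and all other sync functions go to the loop's default executor.

```python
import asyncio
import functools
import inspect
import threading
from typing import Any, Callable, Tuple


async def invoke(fn: Callable[..., Any], *args: Any,
                 in_thread: bool = True, **kwargs: Any) -> Any:
    """Hypothetical dispatcher for handler-like callables."""
    if inspect.iscoroutinefunction(fn):
        # `async def` functions run natively in the event loop.
        return await fn(*args, **kwargs)
    if not in_thread:
        # Trusted instant callables (e.g. lifecycle functions): call inline,
        # with no executor thread involved.
        return fn(*args, **kwargs)
    # Regular sync-handlers may block, so they go to the default thread pool.
    # run_in_executor() takes positional args only, hence functools.partial().
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, functools.partial(fn, *args, **kwargs))


def current_thread_name() -> str:
    return threading.current_thread().name


async def demo_dispatch() -> Tuple[str, str, str]:
    loop_thread = current_thread_name()
    inline = await invoke(current_thread_name, in_thread=False)
    pooled = await invoke(current_thread_name)
    return loop_thread, inline, pooled


if __name__ == "__main__":
    print(asyncio.run(demo_dispatch()))
```

The inline call reports the event loop's own thread, while the pooled call reports an executor thread; skipping the executor for instant functions saves both a thread and a round-trip through the pool.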

Issues/PRs

Issues: #19

Type of changes

New feature (non-breaking change which adds functionality)
Bug fix (non-breaking change which fixes an issue)
Refactoring (non-breaking change which does not alter the behaviour)

Checklist

The code addresses only the mentioned problem, and this problem only
I think the code is well written
Unit tests for the changes exist
Documentation reflects the changes
If you provide code modification, please add yourself to CONTRIBUTORS.txt


zincr · 2020-03-09T18:40:03Z

🤖 zincr found 0 problems, 0 warnings

✅ Large Commits
✅ Approvals
✅ Specification
✅ Dependency Licensing

zincr · 2020-03-09T18:40:03Z

🤖 zincr found 1 problem, 0 warnings

❌ Approvals
✅ Large Commits
✅ Specification
✅ Dependency Licensing

Details on how to resolve are provided below

Approvals

All proposed changes must be reviewed by project maintainers before they can be merged

Not enough people have approved this pull request: please ensure that 1 additional user, who has not contributed to this pull request, approves the changes.

✅ Approved by PR author @nolar
❌ 1 additional approval needed


nolar added 4 commits March 9, 2020 12:39

Call handler-selecting callbacks directly without threads

a3ffcb9

The lifecycle functions should not be called in threads. They are safe enough to be called directly in the asyncio event loop, thus saving the OS resources (threads in asyncio's thread pools).

Prevent OS thread leakage when cancelling/stopping the operator

b0aff30

Log regular attempts to stop the hung tasks or threads when exiting

4c489a6

Log exceptions in the root tasks -- with stacktraces

7f48b5e
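A minimal sketch for the last commit above (logging exceptions in the root tasks with stacktraces), assuming a hypothetical create_root_task() helper; Kopf's actual code may differ, and failing_watcher and demo_root_task are illustrative only. A done-callback retrieves the task's exception and passes it as exc_info, so the logged record carries the full traceback instead of a bare message.

```python
import asyncio
import logging
from typing import Any, Coroutine

logger = logging.getLogger("root-tasks-sketch")


def _log_root_task_outcome(task: "asyncio.Task[Any]", name: str) -> None:
    if task.cancelled():
        return  # a normal stop; nothing to report
    exc = task.exception()  # also marks the exception as retrieved
    if exc is not None:
        # Passing the exception as exc_info attaches its full traceback.
        logger.error("Root task %r has failed: %s", name, exc, exc_info=exc)


def create_root_task(coro: Coroutine[Any, Any, Any], name: str) -> "asyncio.Task[Any]":
    """Hypothetical helper: start a top-level task that logs its own failure."""
    task = asyncio.ensure_future(coro)
    task.add_done_callback(lambda t: _log_root_task_outcome(t, name))
    return task


async def failing_watcher() -> None:
    raise RuntimeError("watch-stream broke")


async def demo_root_task() -> None:
    task = create_root_task(failing_watcher(), name="watcher")
    await asyncio.wait([task])


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    asyncio.run(demo_root_task())
```

Because the callback calls task.exception(), asyncio does not additionally emit its "Task exception was never retrieved" warning for the same failure.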

nolar added the bug (Something isn't working), enhancement (New feature or request), and refactoring (Code cleanup without new features added) labels Mar 9, 2020

nolar requested review from 3abdelazim, aweller and haikoschol March 9, 2020 18:40

haikoschol approved these changes Mar 10, 2020


nolar merged commit 651260b into zalando-incubator:master Mar 10, 2020

nolar deleted the no-thread-leaks branch March 10, 2020 14:59

nolar added this to the 0.27 milestone May 11, 2020

kopf-archiver bot mentioned this pull request Aug 19, 2020

[PR] Prevent thread leakage on exiting nolar/kopf#326 (Closed, 5 tasks)

