[PR] Prevent thread leakage on exiting #326

kopf-archiver · 2020-08-18T20:03:47Z

A pull request by nolar at 2020-03-09 18:39:52+00:00
Original URL: zalando-incubator/kopf#326
Merged by nolar at 2020-03-10 14:58:59+00:00

What do these changes do?

Prevent leakage of OS resources (threads) on operator existing, which can make it irresponsive after the main function is finished and no logging is happening (if a sync-handler hangs).

Description

All sync-handlers (declared with def, not async def) are executed in thread pools. The operator itself is fully asynchronous. Whenever operator is exiting or is being cancelled, it also cancels all its coroutines. However, if a coroutine is cancelled at loop.run_in_executor(), the coroutine indeed exits, but the thread it used continues running — until finished. As a result, some threads can continue running even after the operator is kind-of-exited normally.

This becomes critical as long-running handlers are coming into play (i.e. daemons). But it is a change of its own value even now — hence, it is extracted for easier reviewing.

Instead, with this PR, the operator will wait for such threads to finish (potentially forever or until SIGKILL'ed, as Kubernetes does after 30s). And while waiting, it will log the status of the operator's tasks and what exactly it waits for. So, it becomes visible in the logs which handler hangs or prevents operator from exiting properly. In case everything exits instantly, there is nothing to wait, nothing to log, so it exits as usually.

In addition, the lifecycle functions (which select the handlers to be processed) are now called directly to avoid using thread pools. We know in advance that the lifecycle functions are safe and fast (instant), so there is no need to overcomplicate them.

Issues/PRs

Issues: #19

Type of changes

New feature (non-breaking change which adds functionality)
Bug fix (non-breaking change which fixes an issue)
Refactoring (non-breaking change which does not alter the behaviour)

Checklist

The code addresses only the mentioned problem, and this problem only
I think the code is well written
Unit tests for the changes exist
Documentation reflects the changes
If you provide code modification, please add yourself to CONTRIBUTORS.txt

The text was updated successfully, but these errors were encountered:

kopf-archiver bot added the archive label Aug 18, 2020

kopf-archiver bot closed this as completed Aug 18, 2020

kopf-archiver bot changed the title ~~[archival placeholder]~~ [PR] Prevent thread leakage on exiting Aug 19, 2020

kopf-archiver bot added bug Something isn't working enhancement New feature or request refactoring Code cleanup without new features added labels Aug 19, 2020

kopf-archiver bot added this to the 0.27: daemons, timers, configs, storages milestone Aug 19, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[PR] Prevent thread leakage on exiting #326

[PR] Prevent thread leakage on exiting #326

kopf-archiver bot commented Aug 18, 2020 •

edited

Loading

[PR] Prevent thread leakage on exiting #326

[PR] Prevent thread leakage on exiting #326

Comments

kopf-archiver bot commented Aug 18, 2020 • edited Loading

What do these changes do?

Description

Issues/PRs

Type of changes

Checklist

kopf-archiver bot commented Aug 18, 2020 •

edited

Loading