Python freeze/hang on exit #411
Comments
This seems to be related to the This can cause a deadlock when the api clients are garbage collected as Python exits. I can reproduce with the following:
# deadlock.py
from multiprocessing.pool import ThreadPool

class Deadlocker:
    def __init__(self):
        self.pool = ThreadPool()

    def __del__(self):
        self.pool.close()
        self.pool.join()

d1 = Deadlocker()
d2 = Deadlocker()
print('exiting...')
and running:
On macOS 10.12.6 with Python 3.6.3, after 1-50 executions, it will print "exiting..." and stall. The python process won't terminate until you hit Ctrl-C. I can also reproduce on Linux, but it seems to be less frequent than on macOS. This means a simple script like this:
from kubernetes import client
coreapi = client.CoreV1Api()
batchapi = client.BatchV1Api()
may never terminate because I wonder whether the
|
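For comparison with the `Deadlocker` reproduction above, here is a minimal sketch of shutting a pool down explicitly rather than from `__del__` at interpreter exit. `PoolOwner` is a hypothetical stand-in, not a class from the kubernetes client, and this only illustrates the general idea of deterministic cleanup; it is not the fix that was applied to the library.

```python
from multiprocessing.pool import ThreadPool


class PoolOwner:
    """Hypothetical stand-in for an object that owns a ThreadPool."""

    def __init__(self, processes=2):
        self.pool = ThreadPool(processes)

    def close(self):
        # Shut the pool down explicitly, while the interpreter is still
        # fully alive, instead of relying on __del__ during exit.
        if self.pool is not None:
            self.pool.close()
            self.pool.join()
            self.pool = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


# The pool is closed and joined when the with-block ends.
with PoolOwner() as owner:
    squares = owner.pool.map(lambda x: x * x, range(5))

print(squares)  # [0, 1, 4, 9, 16]
```

Because the pool is joined before the script reaches its end, nothing is left for the garbage collector to tear down while Python is exiting.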
I'm seeing what I suspect is another symptom of this problem. When I use the
It doesn't seem to cause any harm, since as the message says, the exception is ignored since it's from the I didn't see this symptom when using version |
I think the hang happens with the I introduced the @RobbieClarken You are correct about the ThreadPool's poor documentation. I googled a lot to figure out how to "correctly" deal with a ThreadPool instance. @jphx a check can be done to see whether the object is not None. Ugly, but it should work. |
…#813) * Add option to use kubectl to work around kubernetes-client/python#411 * Use image built from previous commit in cronjob yaml * Optimize by reducing api calls and other minor tweaks * Use image built from previous commit in block-aws-cron.yaml
Is there a recommended workaround for this issue? Currently I'm pursuing the approach of deploying a modified api_client.py with the pool code disabled. Perhaps lazily creating the thread pool only when some call occurs with async=True would be a better approach than always creating the pool? While doing so doesn't necessarily resolve the bug, it would at least confine the bug to those using async=True, and prevent unnecessary churn of thread pool objects for those not using async=True. |
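The lazy-creation idea suggested above could look roughly like the following. This is a sketch under assumptions: `LazyPoolClient`, `call`, and the `run_async` flag are hypothetical names chosen for illustration (the flag is not called `async` here because `async` is a reserved word from Python 3.7 onward), and none of this is the real `ApiClient` code.

```python
from multiprocessing.pool import ThreadPool


class LazyPoolClient:
    """Hypothetical client that only creates its ThreadPool on first async use."""

    def __init__(self, pool_threads=1):
        self.pool_threads = pool_threads
        self._pool = None  # no threads are started here

    @property
    def pool(self):
        # Create the pool the first time it is actually needed.
        if self._pool is None:
            self._pool = ThreadPool(self.pool_threads)
        return self._pool

    def call(self, func, *args, run_async=False):
        if not run_async:
            # Synchronous path: the pool is never touched.
            return func(*args)
        # Asynchronous path: returns an AsyncResult.
        return self.pool.apply_async(func, args)

    def close(self):
        if self._pool is not None:
            self._pool.close()
            self._pool.join()
            self._pool = None


client = LazyPoolClient()
print(client.call(len, "abc"))   # 3, and no pool has been created
result = client.call(len, "abcd", run_async=True)
print(result.get(timeout=5))     # 4, computed on the pool thread
client.close()
```

With this shape, code that never makes an asynchronous call never owns a pool, so it has nothing to join at exit.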
@johnmarcou @RobbieClarken I'm in exactly the same situation as @sbconsulting and I'm trying to find a workaround for this issue. Would you mind sharing the code you are using to temporarily patch it? I can see that John reported that they override the ApiClient class and disabled the thread pool feature, but it would be good to know how you did it in case other people hit the same issue as all of us. Anyway, thanks a lot for the discussion in this issue; I've spent a week hunting this ghost bug! |
Hi @Sturgelose, The client.CoreV1Api() object uses dependency injection to get its ApiClient. To create our own K8sApiClient, we inherit from ApiClient and override some functions. We are talking about this class: Here is the patched K8sApiClient:
Then how to use it:
It's just some copy/paste from a project, it's not tested code. But you should have everything to disable this ThreadPool issue. |
This is broken for me as well. Seems like swagger-api/swagger-codegen#8061 would fix it, since it only instantiates the problematic |
The kubernetes python client has a bug [1] which results in frequent deadlocks while being cleaned up, which causes armada to hang at the end of execution. This patchset works around that issue by mocking out the associated thread pools, since they are only needed for async kubernetes api calls, which armada does not use. [1]: kubernetes-client/python#411 Change-Id: I71fbfbe355347ae2ddd02ffd26d881368320246b
I'm wondering what the state of this is. Given that there's a series of MRs, some open and some merged, and that this issue is itself open, would anyone be willing to provide a general summary of the path forward, and what versions of the client will have a fix? |
@spacez320 The workaround was added and it's available in the latest version 9.0.0a1. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale |
@spacez320 could you please verify if the workaround works? |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/remove-lifecycle rotten |
/reopen |
@furkanmustafa: You can't reopen an issue/PR unless you authored it or you are a collaborator. In response to this:
related #422 |
Context
Hello,
In a batch manager project, we are using this python client to submit jobs to the Kubernetes API.
In short, the Python application loads the library, submits the job, watches the related events, cleans up/deletes the job, then returns the succeeded or failed status. But sometimes the application hangs at exit.
After investigation, it seems the ThreadPool used in the ApiClient class is not properly cleaned up when the Python process exits.
Reproduce
The easiest way to reproduce is to run this snippet:
This will run Python in a Docker container, install the Kubernetes Python module, then run the test indefinitely. The test starts a simple application 50 times in order to increase the probability of hitting the hang. This application loads the Kubernetes Python module and creates a CoreV1Api, which creates its ApiClient (with async enabled using ThreadPool), then prints 0, showing that the freeze occurred during the Python exit sequence.
To stop the test:
Expected:
This code should run indefinitely.
Result:
The loop hangs on a list of 0 after some time.
Workaround:
To avoid this issue, we override the ApiClient class to disable the async / ThreadPool feature. It seems to work without any issues so far. The downside is that we lose the async mode.
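The shape of such an override can be illustrated with toy classes. This is only a sketch of the inheritance idea: `ToyApiClient` and `NoPoolApiClient` are hypothetical stand-ins, not the real kubernetes `ApiClient` nor the actual patched class used in the project, whose code is not shown here.

```python
from multiprocessing.pool import ThreadPool


class ToyApiClient:
    """Toy stand-in (hypothetical) for a client that always creates a pool."""

    def __init__(self):
        self.pool = ThreadPool(1)

    def __del__(self):
        self.pool.close()
        self.pool.join()

    def request(self, func, *args):
        # Synchronous call path; does not need the pool.
        return func(*args)


class NoPoolApiClient(ToyApiClient):
    """Subclass that never creates the pool, so nothing is joined at exit."""

    def __init__(self):
        # Deliberately skip ToyApiClient.__init__ so no ThreadPool is started.
        self.pool = None

    def __del__(self):
        # Nothing to clean up; overriding avoids calling close() on None.
        pass


client = NoPoolApiClient()
print(client.request(sum, [1, 2, 3]))  # 6
```

The trade-off is the one stated above: with no pool, any asynchronous code path that expects `self.pool` is unavailable.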
Thank you.