Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Branched discussion from kubernetes_asyncio switch - is ThreadPool workaround needed? #567

Closed
consideRatio opened this issue Feb 17, 2022 · 1 comment

Comments

@consideRatio
Copy link
Member

consideRatio commented Feb 17, 2022

TL;DR

I considered if a workaround is still needed, also after kubernetes_asyncio is used to replace kubernetes in #563, and concluded yes it probably is - but it isn't if a property is never read - but that's hard to be confident about.

Background

I'm reviewing #563 that replaces kubernetes with kubernetes_asyncio. Doing that, I'm considering if we should remove a workaround or not.

Regarding this workaround

This is a workaround introduced in #169.

# FIXME: remove when instantiating a kubernetes client
# doesn't create N-CPUs threads unconditionally.
# monkeypatch threadpool in kubernetes api_client
# to avoid instantiating ThreadPools.
# This is known to work for kubernetes-4.0
# and may need updating with later kubernetes clients
_dummy_pool = Mock()
api_client.ThreadPool = lambda *args, **kwargs: _dummy_pool

Do we still need it?

A ThreadPool object was in the past initialized by the kubernetes client without a specified number of threads to be used, which led to a number of threads matching the number of CPUs. Since kubernetes>=9 they initialize the ThreadPool object with 1 thread by default.

image

Does this make us no longer need the patch, or is it problematic that a ThreadPool object is instantiated at all? I think so, but does this change with switching to kubernetes_asyncio?

How does the hack work?

_dummy_pool = Mock()
api_client.ThreadPool = lambda *args, **kwargs: _dummy_pool

This modifies the following code:

    @property
    def pool(self):
        """Create thread pool on first request
         avoids instantiating unused threadpool for blocking clients.
        """
        if self._pool is None:
            atexit.register(self.close)
            self._pool = ThreadPool(self.pool_threads)
        return self._pool

From this, I presume that when self.pool is referenced, a Mock() instance is returned, which just happily accepts everything but does nothing.

Will kubernetes_asyncio use the pool?

Depends, is async_req passed when ApiClient instances calls call_api?

    def call_api(self, resource_path, ...
        if not async_req:
            return self.__call_api(resource_path, ...
        return self.pool.apply_async(self.__call_api, (resource_path, ...

Initial conclusion

I think neither kubernetes or kubernetes_asyncion makes api requests by passing async_req parameters. But I note there is a difference between a __call_api function.

Overall, I conclude we shouldn't have a threadpool, and Mocking it to never initialize a ThreadPool with one thread is fine - because it will never be referenced no matter what.

At the same time, I wonder if a ThreadPool would ever be initialized? What if the pool property isn't accessed at any time? If it isn't, then it won't try create a ThreadPool at all, and it doesn't look to be read in api_client.py for example - if it were with our mock - I think it would error anyhow.

@athornton
Copy link
Contributor

I came to basically the same conclusion. I don't think we need it anymore, but it's super-lightweight and it didn't feel like there was any harm in leaving it around. We should not need a threadpool since we're no longer using Threading at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants