
Rely on the event loop: use kubernetes_asyncio instead of kubernetes and dedicated threads #563

Merged · 63 commits · Feb 22, 2022

Conversation

athornton
Contributor

@athornton athornton commented Feb 4, 2022

Background

Followup PRs

Change description

No dedicated threads -> async/await with the event loop
KubeSpawner has relied on kubernetes-client/python (aka. kubernetes on PyPI) to speak with the k8s api-server to watch, create, and delete pods etc. For KubeSpawner to function, it has what we call reflectors watching what is going on in the k8s cluster. These reflectors have previously run in dedicated threads, but with this PR they are scheduled to run on the event loop with async/await.
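
As a conceptual illustration of the difference (the names below are hypothetical stand-ins, not the actual reflector code):

import asyncio
import threading

def start_reflector_threaded(reflector):
    # previous approach: each reflector ran its blocking watch loop in a dedicated daemon thread
    thread = threading.Thread(target=reflector.watch_and_update_blocking, daemon=True)
    thread.start()
    return thread

def start_reflector_on_event_loop(reflector):
    # with this PR: the watch loop is a coroutine scheduled as a task on the running event loop
    return asyncio.ensure_future(reflector.watch_and_update())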

Replaced dependency: kubernetes -> kubernetes_asyncio
For this to be possible, we have switched from kubernetes-client/python (aka. kubernetes on PyPI) to tomplus/kubernetes_asyncio (aka. kubernetes_asyncio on PyPI).

Benefits of the change

Reduction of transient k8s errors
@athornton had observed transient errors when spawning user pods. The errors showed up as k8s Events emitted by the cluster, visible via kubectl describe pod <name>, saying for example "failed to sync configmap cache" or describing failures to mount a volume.

With this change, @athornton observed those errors reduced significantly.

[...] I think that's because coroutines do a much better job of getting fast-to-create K8s objects (like secrets and configmaps) actually built before something tries to operate on them than a bunch of threads hammering willy-nilly on the control plane did. That, however, is closer to a guess than a well-instrumented theory.

Reducing codebase complexity
While KubeSpawner's codebase will remain quite complicated due to the complexity of its task, it can be simplified notably by not using multiple threads.

Indications of slightly improved performance
Load tests indicate similar or slightly improved performance.

Technical notes

A strategy to await required async object initialization
An object constructor (__init__) can't be async, but some initialization logic needed to be awaited before JupyterHub, which creates the KubeSpawner object, starts using its public methods. To handle this, we schedule the async init logic (_async_init()) from the constructor, and the public methods JupyterHub may call, which require the async init to have finished, await it via a decorator (_await_async_init).
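
A minimal sketch of that strategy (the names _async_init and _await_async_init come from this PR; the rest is simplified and assumed):

import asyncio
import functools

def _await_async_init(method):
    """Decorator: make sure the scheduled _async_init has finished before `method` runs."""
    @functools.wraps(method)
    async def async_method(self, *args, **kwargs):
        if self._async_init_future is not None:
            await self._async_init_future
            self._async_init_future = None
        return await method(self, *args, **kwargs)
    return async_method

class SpawnerLike:
    """Hypothetical stand-in for KubeSpawner to illustrate the pattern."""

    def __init__(self):
        # __init__ can't be async, so schedule the async init on the event loop
        # and remember the future so decorated public methods can await it later
        self._async_init_future = asyncio.ensure_future(self._async_init())

    async def _async_init(self):
        ...  # e.g. load k8s config and create API clients

    @_await_async_init
    async def start(self):
        ...  # safe to use the API clients here: _async_init has completed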

Breaking changes

  • Dropped support for Python 3.6
  • The configuration k8s_api_threadpool_workers is deprecated as we don't create any threads any more; instead, everything is scheduled to run on the event loop.
  • Changed dependencies: kubernetes dropped, kubernetes_asyncio introduced.
  • This project includes KubeSpawner and KubeIngressProxy, both affected by this PR, but only KubeSpawner has been properly tested, so KubeIngressProxy may have been broken unintentionally.

This PR summary is written by @consideRatio

@athornton
Contributor Author

This time, I meant to PR it upstream.

@consideRatio
Member

@athornton can you put some work into a thorough description for this very significant change? You can give an overview of this proposed change far better than anyone else, and it saves so much effort on the reviewers' part if some of that can be conveyed in text ahead of technical review.

  • What motivated this change and what related discussions are there?
  • What are the major things done
  • What do you expect to be breaking
  • Do you expect issues about something for end users?
  • What to focus extra review attention on
  • etc.

These kinds of things would be relevant to write about for example. If you want to focus on technical aspects for a while longer, mark this as a draft PR!

@athornton
Contributor Author

athornton commented Feb 4, 2022

Absolutely! I'd be delighted to talk about what, conceptually, this patch is doing.

The major motivation for us, the Kubespawner users at Rubin Observatory, using it in our Science Platform's Notebook Aspect, is that we are seeing a lot of (usually transient) errors when spawning with Kubespawner. Usually they are of the form "failed to sync configmap cache", or failed mountpoints (usually secrets or configmaps, sometimes real NFS volumes). These mostly seem to resolve (so users do eventually get their pods). Nevertheless, we've been trying to eliminate spurious errors in our logs so that the remaining logs are meaningful. While attempting that, we realized that understanding what was happening in a threaded Kubespawner was really, really difficult.

So I undertook to make Kubespawner asyncio/coroutine-based, using tomplus/kubernetes_asyncio, because it would be a lot easier for me to reason about. Coroutines with fairly explicit points where control is yielded are something I find a lot easier to keep in my head than threads, where I have no idea of what's executing where at any given time.

The major change is conceptually quite simple: replace the official (and synchronous) Kubernetes Python client with kubernetes_asyncio. That in turn would ripple throughout the codebase removing the need for the asynchronize method, and since I was using asyncio anyway, I replaced the Tornado gen.timeouts with asyncio.wait_for.

That, by itself, is the vast majority of the technical changes. My first cut at this was a lot more radical in terms of reflector redesign (even at one point abandoning the reflector entirely). This one aims to be much more minimal.

The major structural changes are that reflectors and proxy objects are now created with classmethods, simply because you cannot call async methods from __init__. Unfortunately, I cannot impose that change on how the Spawner object itself is instantiated, so each of its start/stop/poll entrypoints (and indeed the hook path when a pre_spawn_hook is called) needed to do reflector and client config initialization.
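
Roughly, the classmethod pattern looks like this (a sketch with a hypothetical class, not the actual reflector code):

from kubernetes_asyncio import client, config

class ReflectorLike:
    """Hypothetical stand-in illustrating the async-classmethod construction pattern."""

    def __init__(self, api):
        # __init__ stays synchronous and trivial; no awaits are allowed here
        self.api = api

    @classmethod
    async def create(cls):
        # async setup that a plain constructor cannot do
        await config.load_kube_config()
        self = cls(client.CoreV1Api())
        await self.start_watching()  # e.g. kick off the watch loop
        return self

    async def start_watching(self):
        ...

# usage: reflector = await ReflectorLike.create()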

I do not expect anything to break. After some work, the test suite passes all the cases that pass for me in the standard Kubespawner. I'm using it on my development cluster now with no issues. That said: there are no tests for the Proxy object, and it's gotten exercised a lot less than the Spawner and Pod and Event reflectors. I am confident that in the namespace-per-user case (which is how we use it at Rubin Observatory), this version is functionally equivalent to the threaded version.

Load tests seem to indicate that performance is equivalent to, and maybe slightly better than, the threaded version, but I don't have firm data on that yet. My reported error rate is down significantly, and I think that's because coroutines do a much better job of getting fast-to-create K8s objects (like secrets and configmaps) actually built before something tries to operate on them than a bunch of threads hammering willy-nilly on the control plane did. That, however, is closer to a guess than a well-instrumented theory.

The one change that may be controversial is that it requires a much, much newer Kubernetes client: specifically 19.15.1 (rather than 10.1). The reason for this is simple: #423 / #424 use a little partial-application hack to get the K8s client to NOT deserialize the JSON it receives, and this is necessary to keep CPU consumption reasonable in busy clusters. kubernetes_asyncio didn't let us do this until I alerted @tomplus to it (tomplus/kubernetes_asyncio#177); he committed support for this very quickly, but I'm not going to ask him to backport it to the distant past. The Python 3.7 failure looks likely to be related to an older Kubernetes rather than Python itself.
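
For context, the kind of partial-application trick being referred to looks roughly like this (a sketch only; the exact keyword and call site in kubespawner may differ):

import functools
from kubernetes_asyncio import client

async def list_pods_without_deserializing(namespace):
    v1 = client.CoreV1Api()
    # bind _preload_content=False so the client returns the raw response
    # instead of deserializing every object into Python model instances
    list_raw = functools.partial(v1.list_namespaced_pod, _preload_content=False)
    return await list_raw(namespace)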

The major remaining issue is related to tomplus/kubernetes_asyncio#12 , which remains open. In order to retain the ability to execute Python in a pod, because there is no multichannel ws client in kubernetes_asyncio presently, I had to make the synchronous K8s client a test dependency, and patch it into the test suite, while using the async client everywhere I could.

The Kubespawner, and particularly its reflectors, is still a complex and fairly difficult-to-understand piece of code, but I think this change, to make it coroutine-based rather than thread-based, does manage to make it easier (still far from easy) to think about what it's doing.

Refactor away the `create` function
@athornton
Contributor Author

Squashing is fine by me. So is not squashing. What would you like? I feel like there were a few big phases, anyway, but I don't really care much about a clean commit history. I'm happy to go with whatever you like.

@consideRatio
Member

consideRatio commented Feb 19, 2022

Thank you Adam!

Thank you for this thorough work @athornton!! I love the improved code comprehension this PR has given me through the reduction in complexity I think it brings.

To squash or not?

@athornton I'm not sure I understand. Will squashing the back-and-forth commits reduce complexity for a person rebasing another PR on this merged PR? That's the key value I see, but I'm not sure it helps. Do you know?

@minrk/@yuvipanda do you have input about squashing this PR or not?

Final review request

@minrk/@yuvipanda could you do a final review pass? I would like another person to hit approve besides me. Note that the PR description is updated to reflect the current state.

@athornton
Contributor Author

athornton commented Feb 19, 2022

@consideRatio as you may have noticed, I'm not very good at git, and my background is...not that of a traditional developer. So please don't take my words as representative of what real developers from the modern world want.

I generally would prefer the whole messy commit history. Almost all of the time I don't use it: I just work with the tip of the branch. But when there is something where I want to see how it was changed, then having that history can be helpful. A case in point would be whether we squash around the introduction/renaming/removal of the instantiation classmethod.

In most cases, I don't think it's going to matter. But I can imagine some scenarios in which a developer is confused because things aren't getting initialized. Probably they'll do fine by reading the comments and realizing they need to use the decorator, but I can also see there being value in watching the back-and-forth from sync K8s client to classmethod to renamed classmethod to return-future-and-await-it-on-method-use. Knowing the blind alleys someone else went down sometimes helps me avoid having to explore them myself. But then, as someone who failed to get his Ph.D. in the History of Computing, I can tell you that if I had to summarize the field in one sentence, it's that we've made all the same mistakes in each of the mainframe, mini, micro, and mobile eras because everyone would (understandably!) rather start coding than read the literature.

So I guess my tl;dr is, I don't think squashing would help me, but it may very well help someone who has internalized better development practice. I am happy to do whatever the three of you think is right.

@consideRatio
Member

I think what is most important is that you feel good about what is done, not whether it's merged with or without a squash - so I'm 💯 for not squashing!

@manics
Member

manics commented Feb 20, 2022

@droctothorpe @remche You've both contributed to KubeIngressProxy in the past year. Would you mind checking this PR hasn't broken anything as we don't have any tests for it?

@minrk (Member) left a comment

This looks great. Thanks for all the effort and review, folks!

@yuvipanda (Collaborator) left a comment

Sorry to provide feedback late, @athornton - I was on vacation.

I am extremely happy with this change! I do think we can simplify config initialization a bit to not rely on internal details of what child methods JupyterHub calls - and I've left a comment with a suggested approach. What do you think?

methods, if implemented, could be what we need to decorate with
_await_async_init:

- get_all_routes (implemented and decorated)
Collaborator

IMO, this feels slightly fragile - it relies on understanding of what JupyterHub's internal code does. Instead, can we make shared_client (https://github.com/jupyterhub/kubespawner/pull/563/files#diff-8c47491cb4dc79ba9e65ffda871633edf9b33db51b66f949844064034ffbb422R30) async, and use a flag to load kubernetes config the very first time any client is requested?
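
A rough sketch of what that suggestion could look like (names and structure are assumptions, not the actual kubespawner clients module):

from kubernetes_asyncio import client, config

_config_loaded = False
_shared_clients = {}

async def shared_client(api_class_name):
    """Return a cached API client, loading the k8s config the first time any client is requested."""
    global _config_loaded
    if not _config_loaded:
        try:
            config.load_incluster_config()
        except config.ConfigException:
            await config.load_kube_config()
        _config_loaded = True
    if api_class_name not in _shared_clients:
        _shared_clients[api_class_name] = getattr(client, api_class_name)()
    return _shared_clients[api_class_name]

# usage: api = await shared_client("CoreV1Api")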

Member

Thanks Yuvi for reviewing this!

This is a sound concern, but I'm failing to address it with that suggestion. A blob of complexity makes it hard for me to even describe why, but I'll try below.

shared_client is now used from _async_init in KubeSpawner (where self.api is set) and KubeIngressProxy (where self.core_api and self.extension_api are set). load_config is awaited just before calling shared_client, so from that perspective there is nothing stopping us from making shared_client an async getter, but doing so would not resolve the situation. We still need _async_init to finish for various other methods, because they reference self.api / self.core_api - so we need to ensure those won't run before self.api has been initialized.

Ideas considered

  • Idea 1
    To ensure that when we need to use an api client, we don't rely on something set on the instance like self.api, but request it fresh from shared_client.

    • Ugly detail: the k8s_api_host traitlet's value, for example, is needed by load_config, which would now be part of shared_client's logic. This traitlet's value is not available to the reflector classes even though they are part of KubeSpawner/KubeIngressProxy. This is likely addressable, but passing the relevant information onwards to the reflector is a bit tricky and I'm not sure how to do it in a relatively clean way.
    • Ugly detail: requesting the api client on demand, now also async, means we need to await the client first and then await some function on it. This is not a technical problem, it just bloats the code a bit.
  • Idea 2
    We ensure that whenever we want to use self.api etc. from a function we have defined, we do it from a function decorated with _await_async_init. That way, we have control within this project over when it may be needed. This idea benefits from the overview below about where things are used.

  • Idea 3
    We double down on the current solution, where we rely on understanding JupyterHub's Spawner/Proxy base classes, which are the interface JupyterHub uses to work with our code in this project. While this means relying on JupyterHub's internal workings, we only rely on the public interface we expose to JupyterHub by being a Spawner class (and a Proxy class).

    I think this could lead to some documentation improvements in JupyterHub. I believe the overview of all interactions JupyterHub has with the Spawner / Proxy classes can be improved, for example to cover all hooks and functions on a single page for situations like this.

Where are things used?

shared_client is used from...

spawner.py:

  • _async_init

proxy.py:

  • _async_init

reflector.py:

  • __init__ (load_config is assumed to have run here)

The api clients are used from...

self.api in spawner.py is used from these functions:

  • _async_init (initialized)
  • _make_create_pod_request
  • _make_create_pvc_request
  • _ensure_not_exists
  • _make_create_resource_request
  • _make_delete_pod_request
  • _make_delete_pvc_request
  • _ensure_namespace

self.core_api and self.extension_api in proxy.py are used from these functions:

  • _async_init (initialized)
  • add_route
  • delete_route

self.api in reflector.py is used from these functions:

  • __init__ (initialized)
  • _list_and_update
  • _watch_and_update

Contributor Author

Yes, as @consideRatio said, there are two loosely coupled problems here.

  1. shared_client needs to know the traitlet values in order to mess with the TLS settings
  2. The client has to be configured before use, but we don't know which method will be used first, and we don't know that it will even be one of the documented methods rather than something happening in a hook.

It feels like there's something we can do with making a SharedClient mixin class that the Spawner and Proxy both inherit...

...but I think maybe the thing to do is go with what we have now and put that into the whole bundle of investigation about how much TLS setup really costs. Conceptually it would be a lot cleaner to NOT share the client and treat it as an async context manager, which is how the underlying library wants you to use it. I did not go with this approach because while I'm sure it's going to look fine in the test cases, I really want to try that out at n=500 or so in order to see whether it scales.
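
For reference, "treat it as an async context manager" would look something like this with kubernetes_asyncio (a sketch of the non-shared-client alternative, not what this PR does):

from kubernetes_asyncio import client

async def list_pods(namespace):
    # each call opens and closes its own client session instead of sharing one
    async with client.ApiClient() as api_client:
        v1 = client.CoreV1Api(api_client)
        return await v1.list_namespaced_pod(namespace)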

Contributor Author

...and even if it's a mixin there's still the problem that sooner or later you're going to have to await() the configuration one way or another, and thus that can't happen from __init__(). I think I am in favor of changing the calling conventions from JupyterHub to do something like call a create() classmethod, if such exists, instead of just instantiating a new instance of the class, but I don't know how feasible asking for that upstream would be.

Member

@minrk would it be reasonable to make JupyterHub check whether the Spawner / Proxy class provides an async constructor, and if so, make JupyterHub await it?

Contributor Author

Specifically, one with a well-known name (I like create()). The logic would look like:

for a thing with a specified class (e.g. Spawner, Proxy, dunno if there's anything else):

Does that class have an async classmethod named create()? If so, the instance of the class is created with await kls.create(); if not it's just kls() (the current behavior)
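
The JupyterHub-side logic could be as small as this sketch (create() is a proposed convention, not an existing JupyterHub API):

import inspect

async def instantiate(kls, *args, **kwargs):
    create = getattr(kls, "create", None)
    if create is not None and inspect.iscoroutinefunction(create):
        # the class opted in to async construction
        return await create(*args, **kwargs)
    # current behavior: plain synchronous instantiation
    return kls(*args, **kwargs)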

@consideRatio (Member) commented Feb 22, 2022

I inspected how JupyterHub interacts with an initialized Spawner object, and it seems quite hard to get an overview of. I think the right call is to go with Idea 2 - to decorate all the methods in KubeSpawner that we know will need the _async_init call, which initializes the shared_client, to have completed before they run.

JupyterHub interactions with Spawner class instances

The _new_spawner function is used as a (sync) factory function.

It is used only once, in the (sync) __init__ function of the User class, which has a _SpawnerDict instance create a Spawner whenever the _SpawnerDict instance, available via the user instance's spawners property, is referenced. For example, via my_user.spawners["any_key_here"] - a Spawner instance is then returned in a sync manner.

Since JupyterHub inits Spawner instances from sync functions, and indirectly from sync properties, it seems like a relatively major change to rely on an async factory function.

Observed functions used by User object

Member

@minrk @yuvipanda @athornton I'm really struggling to get somewhere. I'm running low on steam =/ I've run into test failures in https://github.com/lsst-sqre/kubespawner/pull/13.

The clean way of ensuring we await the async init is simply to do it from JupyterHub with a custom async constructor I think... I looked into the JupyterHub code base about that recently, and it seems quite tricky as well. See jupyterhub/jupyterhub#3801 PR for comments about it.

Comment on lines -141 to -152
try:
    kubernetes.config.load_incluster_config()
except kubernetes.config.ConfigException:
    kubernetes.config.load_kube_config()
if self.k8s_api_ssl_ca_cert:
    global_conf = client.Configuration.get_default_copy()
    global_conf.ssl_ca_cert = self.k8s_api_ssl_ca_cert
    client.Configuration.set_default(global_conf)
if self.k8s_api_host:
    global_conf = client.Configuration.get_default_copy()
    global_conf.host = self.k8s_api_host
    client.Configuration.set_default(global_conf)
@consideRatio (Member) commented Feb 21, 2022

I'm a bit confused about this section in general, and I'm looking to learn more in case it opens up a possibility to refactor this smarter.

  1. Q: What is the difference between load_incluster_config and load_kube_config?
    A: load_incluster_config is to use the k8s service account available to a pod where KubeSpawner may be running, while the other relies on a typical computer's configuration, also used for kubectl.

    load_incluster_config: Use the service account kubernetes gives to pods to connect to kubernetes cluster. It's intended for clients that expect to be running inside a pod running on kubernetes. It will raise an exception if called from a process not running in a kubernetes environment.
    load_kube_config: Loads authentication and cluster information from kube-config file and stores them in kubernetes.client.configuration.

  2. Q: Do we really need load_kube_config at all?
    A: I would guess yes. This is about KubeSpawner's communication with the k8s api-server. KubeSpawner may be running in a k8s pod with a mounted k8s ServiceAccount, and then load_incluster_config is helpful. But if that doesn't exist, loading the local kubernetes client configuration is helpful - just as the kubectl tool would do (a rough sketch of that fallback follows after this list).
  3. Q: Why is load_kube_config async in kubernetes_asyncio?
    A: Because it may reference user credentials that are used but not declared explicitly there. See the _load_authentication function.
  4. Q: Do we need to have the config loading happen before we do the latter part, or can we switch the ordering?
    A: Probably, both Adam and I think.
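
For reference, the fallback described in item 2 looks roughly like this with kubernetes_asyncio (a sketch, not the exact kubespawner code):

from kubernetes_asyncio import config

async def load_config_with_fallback():
    try:
        # running inside a cluster: use the pod's mounted ServiceAccount
        config.load_incluster_config()
    except config.ConfigException:
        # otherwise fall back to the local kube-config file, like kubectl does;
        # this one is async because it may need to run credential helpers
        await config.load_kube_config()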

@athornton (Contributor Author) commented Feb 21, 2022

load_incluster_config is what JupyterHub will use. It relies on the pod's serviceaccount credentials, automatically mounted. load_kube_config is what is used if you're using a .kubeconfig file, and is how running the tests via pytest works. EDIT: our normal use case is that these are classes instantiated and used within the JupyterHub process, which is why we prefer load_incluster_config. However, it should be possible, if we really care, to mock out load_incluster_config for the testing environment, try the real load_incluster_config, and only fall back on load_kube_config if that fails. This feels like a fair amount of effort for dubious gain to me.

I think we have to set the TLS stuff after loading the configuration, because I believe that loading the config overwrites all configuration settings, but I have not verified that. EDIT: and now I have; it does, so I don't think at present we can do much better than this.

@athornton (Contributor Author) commented Feb 21, 2022

This isn't encouraging from upstream (granted, the sync client, but even so...):

kubernetes-client/python#1005

like, everyone agrees it'd be nice to have a method that does the fallback, but no one has actually put it into the base client yet.

EDIT: huh, looks like load_config was merged, but no one has updated the docs. Let's see if it is in kubernetes_asyncio...it is...and we do indeed have to reset the cert information afterwards. https://github.com/tomplus/kubernetes_asyncio/blob/c88b3b9c78a9388ef3af00858b8c516b84b0bf30/kubernetes_asyncio/config/kube_config.py#L347 ... MORE EDIT, nope, load_config only exists inside the KubeConfig loader; we still need the try/except.

@athornton
Contributor Author

jupyterhub/jupyterhub#3801 is a draft of what I mean (untested as of yet, fresh off the neurons).

@athornton
Contributor Author

Given that this is turning out to be really gross, should we just go with what we have and address a cleaner solution in 3.1 or something?

@yuvipanda (Collaborator) left a comment

Oh wow, this ended up being far more complicated than I had imagined. THANK YOU for explaining and digging into it, @athornton and @consideRatio. Given the complexity I agree we don't need to make this PR wait on that to be resolved - the current solution is a pretty cool hack I deeply respect too (although we should remove it).

@yuvipanda yuvipanda merged commit 4743a22 into jupyterhub:main Feb 22, 2022
@minrk
Member

minrk commented Feb 22, 2022

🎉 🎆 👏 yay!

If it's really just the async-config-loading that's causing the biggest problem, we can make it sync by running it in a thread. It's a bit silly for that to be async, anyway.

with ThreadPoolExecutor(1) as pool:
    return pool.submit(lambda: asyncio.run(async_load_config())).result()

@athornton
Contributor Author

I mean that does have the effect of condensing much of the complexity into a single terrifying line....

@athornton
Contributor Author

Huge thanks to all of you for your encouragement, persistence, and patience!

@consideRatio
Member

consideRatio commented Feb 22, 2022

Thanks @athornton @minrk and @yuvipanda! I'm very relieved this is merged!


If it's really just the async-config-loading that's causing the biggest problem, we can make it sync by running it in a thread. It's a bit silly for that to be async, anyway.

with ThreadPoolExecutor(1) as pool:
    return pool.submit(lambda: asyncio.run(async_load_config())).result()

I don't understand this @minrk - are you saying that it would give us a strategy to ensure some async init logic completes before anything else is done on the KubeSpawner instance after its __init__ is called?

  1. __init__ is called, which is supposed to set self.api etc. referenced by other functions
  2. Other functions are called that reference self.api

Our current workaround is to schedule an _async_init function as __init__ runs, and then ensure it's awaited whenever other functions are called. Would that have been possible to avoid with your example?

@yuvipanda
Collaborator

@minrk I 100% agree it's silly for it to be async! I like your solution :D

@consideRatio
Member

consideRatio commented Feb 22, 2022

@minrk I 100% agree it's silly for it to be async! I like your solution :D

I was thinking it was silly as well, which made me track down the reason load_kube_config has to be async - because _load_authentication is async - which it seems to need to be because credential helpers may need to do slow work. Now I'm just troubled that we have a load of work related to it - and it won't even matter for anyone using kubespawner from a k8s Pod with a mounted ServiceAccount, as we do in the JupyterHub Helm chart.

From #563 (comment).

  • Q: Why is load_kube_config async in kubernetes_asyncio?
    A: Because it may reference user credentials that are used but not declared explicitly there. See the _load_authentication function.

I've added a comment about this in #573 with some followup changes to this PR. I'd love it if we could avoid decorating functions, given that it isn't easy to decide what needs to be decorated.

@minrk
Member

minrk commented Feb 23, 2022

are you saying that it would give us a strategy to ensure some async init logic completes before anything else is done on the KubeSpawner instance after its init is called?

I'm only thinking of the load-config part, not in general for all possible async init. Just noting that just because there is an async call doesn't mean we need to make everything that calls it also async - it's possible to make async calls blocking if that simplifies things.

Assuming a coroutine completes (i.e. doesn't start anything long-running like the reflectors), you can run it in a blocking way with asyncio.run(coroutine()). asyncio.run does:

  1. create new event loop
  2. run coroutine until it's done
  3. completely clean up event loop
  4. return coroutine's result

Since there can only be one loop running at a time per thread, you can't call asyncio.run() from within a running loop (nest_asyncio patches asyncio to allow this). But you can always make an async call sync, regardless of the current thread's loop, by using a one-off thread to make the blocking call in another thread, and then cleaning up the thread immediately.

To break down the steps in the above snippet:

with ThreadPoolExecutor(1) as pool: # create a single thread
    # function to call the coroutine in a blocking way, return the result
    sync_load_config = lambda: asyncio.run(async_load_config()) 
    # schedule blocking call on the thread, returning a concurrent.futures.Future
    future = pool.submit(sync_load_config)
    # block the _current_ thread waiting for the result and return it
    return future.result()
# exiting the context terminates the thread

This is basically the inverse of how we used threads to make blocking calls awaitable. The difference:

  1. the background thread doesn't stay running, it's destroyed immediately
  2. we block on the result with future.result() instead of making it awaitable with asyncio.wrap_future()

This approach is only valid if the coroutine being called doesn't store anything thread-local (extremely rare) or associate state with the running event loop. It wouldn't be appropriate for setting up reflectors, but it does mean we can make shared_client blocking and always return configured clients without needing any awaits.

@consideRatio
Member

❤️ 🎉 ahhhhhhh!

So what about the solution of...

  1. Make load_kube_config blocking for out-of-cluster kubespawners, using the trick described above
  2. Ensure we only have to run load_kube_config once, using a boolean flag

If we do that, I think we are good to go and remove all _async_init logic across the board!
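
A minimal sketch of that combination (hypothetical helper name, combining the thread trick above with a module-level flag so config is only loaded once):

import asyncio
from concurrent.futures import ThreadPoolExecutor
from kubernetes_asyncio import config

_config_loaded = False

def load_config_once():
    """Blocking and idempotent config loading, callable from sync code paths like __init__."""
    global _config_loaded
    if _config_loaded:
        return
    try:
        # in-cluster: reads the mounted ServiceAccount, no awaiting needed
        config.load_incluster_config()
    except config.ConfigException:
        # out-of-cluster: load_kube_config is async, so run it to completion
        # on a throwaway thread with its own short-lived event loop
        with ThreadPoolExecutor(1) as pool:
            pool.submit(lambda: asyncio.run(config.load_kube_config())).result()
    _config_loaded = True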
