-
Notifications
You must be signed in to change notification settings - Fork 303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Breaking change / performance: don't make kubernetes-client deserialize k8s events into objects #424
Conversation
Turning API responses into Python objects take a significant amount of CPU time when dealing with a large number of events. Fixes: jupyterhub#423
Thanks for submitting your first pull request! You are awesome! 🤗 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK mostly some smaller nits and docs related things to update since the values are changing from objects to dicts.
The bigger concern is the impact to out of tree reflectors but I don't have a good grasp on the contract there, or what kind of communication / signalling is appropriate for people, i.e. release notes, forum thread, etc? I think the performance gain is definitely worth the change and (minimal) impact to out of tree reflectors, but it's something that needs to be considered. If we can wrap the dicts in simple objects as a facade maybe that would alleviate some of that concern? It would also actually cut down on the size of this change since you might not need to make any changes to the PodReflector or EventReflector code.
self.namespace, | ||
label_selector=self.label_selector, | ||
field_selector=self.field_selector, | ||
_request_timeout=self.request_timeout, | ||
) | ||
# This is an atomic operation on the dictionary! | ||
self.resources = {p.metadata.name: p for p in initial_resources.items} | ||
initial_resources = json.loads(initial_resources.read()) | ||
self.resources = {p["metadata"]["name"]: p for p in initial_resources["items"]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically this changes the value on the resources
dict from an object to a dict so the help here should probably be updated as well:
One thing I'm not sure about is how much this might impact out-of-tree reflectors (assuming those exist, which I don't know if that's supported) that are expecting self.resources
to return objects rather than dicts, i.e. this could be a breaking change for them. One could maybe mitigate that by wrapping the dicts in simple objects that pass through __getattr__
to __getitem__
but as we can see below with resource_version
we'd also have to camel-case-ify some of the attributes, like this.
I'm not sure what the ABI contract on something like this is within the KubeSpawner. Would a release note be sufficient?
@@ -106,7 +106,7 @@ def events(self): | |||
# timestamp without and the other is a higher resolution timestamp. | |||
return sorted( | |||
self.resources.values(), | |||
key=lambda event: event.last_timestamp or event.event_time, | |||
key=lambda event: event["lastTimestamp"] or event["eventTime"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good example of the kind of issue I'm talking about above. Handling this for the known event and pod reflectors within this repo is sufficient but I worry about breaking out of tree reflectors, but like I said I also don't know what contract is there.
all([cs.ready for cs in pod.status.container_statuses]) | ||
pod["status"]["phase"] == 'Running' and | ||
pod["status"]["podIP"] is not None and | ||
"deletionTimestamp" not in pod["metadata"] and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, maybe this should be:
not pod["metadata"].get("deletionTimestamp")
Because deletionTimestamp
not being in the dict vs being in the dict with a None value are different things, unless that's not possible with how the kube API works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deletionTimestamp
is only set by the server when a deletion is requested. It shouldn't exist at any other time.
By the way, I'm assuming you can have out of tree reflectors but I'm not entirely sure if you can, especially looking at the hard-coding here. The KubeSpawner docs don't mention anything about entry points to load your own reflectors. I'm just used to lots of things in jupyterhub being extensions so I made an assumption which is looking false. |
Also fixed issue with how container statuses were being accessed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me now assuming there is no support for out of tree reflectors which it's looking like there isn't. My other comments were addressed. We're already running with this in our pre-production environment because of the performance benefit at high load. Nice work!
Thank you @rmoe for your work on this PR and @mriedem for your review work! ❤️ 🎉! Notification / Question@clkao this PR will impact the users of My conclusionOverall, I think this PR makes a lot of sense given the excellent report in #423. I think we should accept it, verify the breaking changes and report them in a release which needs to be minor rather than patch. Breaking changesThis is what I pick up to be breaking changes.
|
FWIW I agree.
Yup, the docstring for the I mentioned that we could try to still return an object with some massaging of the field names but that could get complicated. We'd probably know if that worked if we didn't have to make some of the changes in this patch, but it's hard to say for sure since there are no tests (that I could find) for the EventReflector code. |
Co-authored-by: Erik Sundell <erik.i.sundell@gmail.com>
That's correct. They were snake case because of how those OpenAPI objects got converted to dictionaries. Now we just have the response from the Kubernetes API as a dictionary and everything the API returns is camel cased. |
@rmoe thank you soo much for your thorough investigation and this PR to resolve the found performance issue! Thank you @mriedem for your review work! ❤️ 💯 🎉 @jupyterhub/jupyterhubteam this LGTM but it would be nice to get another LGTM before I press merge. It solves a real problem when scaling to a very large amount of users so it motivates some breaking changes I think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent work! ❤️
\o/ Thanks a lot for this, @rmoe and @mriedem! And thanks for your review, @consideRatio :) |
I was just dealing with an outage caused by slow response times cascading because of high CPU usage... :) |
Primarily to bring in jupyterhub/zero-to-jupyterhub-k8s#1768, which brings in jupyterhub/kubespawner#424 for performance Ref #1746
Turning API responses into Python objects take a significant
amount of CPU time when dealing with a large number of events.
Fixes: #423
Breaking changes
V1Event
objects.V1Event
is defined in the kubernetes-client/python library as a representation of a Kubernetes Event..progress
method implementation (Progress on spawn-pending page jupyterhub#1771) which is generating a formattedmessage
as well as a KubeSpawner specificraw_event
entry now returns theraw_event
as a Python dictionary with entries formatted incamelCase
where the keys were formatted insnake_case
.