
Breaking change / performance: don't make kubernetes-client deserialize k8s events into objects #424

Merged · 3 commits · Sep 2, 2020

Conversation

@rmoe (Contributor) commented Aug 14, 2020

Turning API responses into Python objects takes a significant
amount of CPU time when dealing with a large number of events.

Fixes: #423
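For context, a minimal sketch of the approach (not the exact PR diff; the API call shown is an assumption based on the usual kubernetes-client interface): API methods accept _preload_content=False, which makes them return the raw HTTP response instead of deserializing it into generated model objects, so the caller can parse the JSON into plain dicts itself.

    import json
    from kubernetes import client, config

    config.load_kube_config()  # or load_incluster_config() when running in a cluster
    v1 = client.CoreV1Api()

    # Default behaviour: the client deserializes the response into a V1EventList
    # of V1Event objects, which is the CPU-heavy part when there are many events.
    events = v1.list_namespaced_event("default")
    print(type(events.items[0]))  # a V1Event with snake_case attributes

    # With _preload_content=False the raw response is returned and we parse the
    # JSON ourselves into plain dicts that keep the API's camelCase keys.
    resp = v1.list_namespaced_event("default", _preload_content=False)
    raw = json.loads(resp.read())
    print(raw["items"][0]["metadata"]["name"])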


Breaking changes

  • KubeSpawner's events reflector, which provides KubeSpawner instances with the Kubernetes Events describing other resources, now exposes events as Python dictionaries rather than V1Event objects (V1Event is the kubernetes-client/python representation of a Kubernetes Event).
  • KubeSpawner's .progress method implementation (Progress on spawn-pending page jupyterhub#1771), which yields a formatted message as well as a KubeSpawner-specific raw_event entry, now returns raw_event as a Python dictionary with camelCase keys where the keys were previously snake_case. A sketch of what this means for consumers follows below.
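For illustration, a hypothetical consumer snippet showing the before/after access pattern (the field names follow the Kubernetes Event schema; the snippet is illustrative and not taken verbatim from the PR):

    # Before: events were V1Event objects with snake_case attributes.
    message = event.message
    ts = event.last_timestamp or event.event_time

    # After: events are plain dicts with the API's camelCase keys; absent fields
    # are simply missing from the dict, hence .get() here.
    message = event["message"]
    ts = event.get("lastTimestamp") or event.get("eventTime")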

Turning API responses into Python objects takes a significant
amount of CPU time when dealing with a large number of events.

Fixes: jupyterhub#423
welcome bot commented Aug 14, 2020

Thanks for submitting your first pull request! You are awesome! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@mriedem (Contributor) left a comment:

OK, mostly some smaller nits and docs-related things to update since the values are changing from objects to dicts.

The bigger concern is the impact to out-of-tree reflectors, but I don't have a good grasp on the contract there, or what kind of communication / signalling is appropriate for people, i.e. release notes, a forum thread, etc. I think the performance gain is definitely worth the change and the (minimal) impact to out-of-tree reflectors, but it's something that needs to be considered. If we can wrap the dicts in simple objects as a facade, maybe that would alleviate some of that concern? It would also cut down on the size of this change, since you might not need to make any changes to the PodReflector or EventReflector code.

 self.namespace,
 label_selector=self.label_selector,
 field_selector=self.field_selector,
 _request_timeout=self.request_timeout,
 )
 # This is an atomic operation on the dictionary!
-self.resources = {p.metadata.name: p for p in initial_resources.items}
+initial_resources = json.loads(initial_resources.read())
+self.resources = {p["metadata"]["name"]: p for p in initial_resources["items"]}
@mriedem (Contributor) commented Aug 14, 2020:

Technically this changes the value on the resources dict from an object to a dict so the help here should probably be updated as well:

https://github.com/jupyterhub/kubespawner/pull/424/files#diff-2f4d4efedf466e7bf4f8f083d0c722adR52

One thing I'm not sure about is how much this might impact out-of-tree reflectors (assuming those exist, which I don't know is supported) that expect self.resources to return objects rather than dicts, i.e. this could be a breaking change for them. One could maybe mitigate that by wrapping the dicts in simple objects that pass __getattr__ through to __getitem__, but as we can see below with resource_version we'd also have to camel-case-ify some of the attribute names; a rough sketch of that idea follows below.

I'm not sure what the ABI contract on something like this is within KubeSpawner. Would a release note be sufficient?
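A rough sketch of that facade idea, which was not implemented in this PR (the class and helper names are made up), assuming the snake_case attribute names can be converted mechanically to the camelCase dict keys:

    class DictFacade:
        """Wrap a camelCase-keyed dict so legacy snake_case attribute access keeps working."""

        def __init__(self, data):
            self._data = data

        def __getattr__(self, name):
            # resource_version -> resourceVersion, last_timestamp -> lastTimestamp
            # NOTE: this naive conversion misses acronym keys such as podIP,
            # which is one reason the facade idea gets complicated.
            head, *rest = name.split("_")
            key = head + "".join(part.title() for part in rest)
            try:
                value = self._data[key]
            except KeyError:
                raise AttributeError(name) from None
            # Wrap nested dicts so chained access like event.metadata.name keeps working.
            return DictFacade(value) if isinstance(value, dict) else value

    # Usage sketch:
    # event = DictFacade(raw_event_dict)
    # event.last_timestamp  # reads raw_event_dict["lastTimestamp"]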

@@ -106,7 +106,7 @@ def events(self):
 # timestamp without and the other is a higher resolution timestamp.
 return sorted(
     self.resources.values(),
-    key=lambda event: event.last_timestamp or event.event_time,
+    key=lambda event: event["lastTimestamp"] or event["eventTime"],
Contributor commented:

This is a good example of the kind of issue I'm talking about above. Handling this for the known event and pod reflectors within this repo is sufficient, but I worry about breaking out-of-tree reflectors; like I said, though, I also don't know what contract is there.

 all([cs.ready for cs in pod.status.container_statuses])
+pod["status"]["phase"] == 'Running' and
+pod["status"]["podIP"] is not None and
+"deletionTimestamp" not in pod["metadata"] and
Contributor commented:

Hmm, maybe this should be:

not pod["metadata"].get("deletionTimestamp")

Because deletionTimestamp not being in the dict vs being in the dict with a None value are different things, unless that's not possible with how the kube API works?

Contributor (PR author) replied:

deletionTimestamp is only set by the server when a deletion is requested. It shouldn't exist at any other time.
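For reference, a tiny illustration of the distinction being discussed (the metadata dicts are made up):

    metadata_absent = {"name": "my-pod"}
    metadata_none = {"name": "my-pod", "deletionTimestamp": None}

    # Membership test: only True when the key is missing entirely.
    "deletionTimestamp" not in metadata_absent   # True
    "deletionTimestamp" not in metadata_none     # False

    # .get() treats a missing key and an explicit None value the same way.
    not metadata_absent.get("deletionTimestamp")  # True
    not metadata_none.get("deletionTimestamp")    # True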

(Three further review threads on kubespawner/spawner.py were resolved; two are marked outdated.)
@mriedem (Contributor) commented Aug 14, 2020

> The bigger concern is the impact to out of tree reflectors

By the way, I'm assuming you can have out-of-tree reflectors, but I'm not entirely sure you can, especially looking at the hard-coding here. The KubeSpawner docs don't mention anything about entry points for loading your own reflectors.

I'm just used to lots of things in JupyterHub being extensions, so I made an assumption which is looking to be false.

Also fixed issue with how container statuses were being
accessed.
@betatim betatim requested review from consideRatio and minrk August 15, 2020 06:19
@mriedem (Contributor) left a comment:

Looks good to me now, assuming there is no support for out-of-tree reflectors, which it looks like there isn't. My other comments were addressed. We're already running with this in our pre-production environment because of the performance benefit at high load. Nice work!

@consideRatio consideRatio changed the title Don't deserialize Kubernetes API responses Performance: don't make kubernetes-client deserialize k8s events into objects Sep 2, 2020
@consideRatio (Member) commented Sep 2, 2020

Thank you @rmoe for your work on this PR and @mriedem for your review work! ❤️ 🎉!

Notification / Question

@clkao this PR will impact the users of raw_event returned by progress(). I think the difference will be that you get camelCase instead of snake_case, but still get a python dictionary. @rmoe do you think I described this change correctly?

My conclusion

Overall, I think this PR makes a lot of sense given the excellent report in #423. I think we should accept it, verify the breaking changes, and report them in a release that is a minor rather than a patch release.

Breaking changes

This is what I pick up as the breaking changes:

  • The raw_event returned by progress() changed from a Python dictionary with snake_case keys to a Python dictionary with camelCase keys (I think).
  • A spawner's events property, populated by the singleton events reflector, now returns Python dictionaries instead of V1Event objects from kubernetes-client/python.

@mriedem (Contributor) commented Sep 2, 2020

FWIW I agree.

> The raw_event returned by progress() changed from a Python dictionary with snake_case keys to a Python dictionary with camelCase keys (I think).

raw_event was a dict because the V1Event object was converted to a dict here. I'm not sure what event.to_dict() does regarding the field names, but I assume they were snake_case because that's what they were on the event object, and now they are camelCase.

> A spawner's events property, populated by the singleton events reflector, now returns Python dictionaries instead of V1Event objects from kubernetes-client/python.

Yup, the docstring for the events property function was updated to reflect that here.

I mentioned that we could still try to return an object with some massaging of the field names, but that could get complicated. We'd probably know whether that worked if we didn't have to make some of the changes in this patch, but it's hard to say for sure since there are no tests (that I could find) for the EventReflector code.

Co-authored-by: Erik Sundell <erik.i.sundell@gmail.com>
@rmoe (Contributor, PR author) commented Sep 2, 2020

> @clkao this PR will impact the users of raw_event returned by progress(). I think the difference will be that you get camelCase instead of snake_case, but still get a python dictionary. @rmoe do you think I described this change correctly?

That's correct. They were snake_case because of how those OpenAPI objects got converted to dictionaries. Now we just have the response from the Kubernetes API as a dictionary, and everything the API returns is camelCase.
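To make that concrete, a hypothetical abbreviated event as it looked before (produced by V1Event.to_dict()) and as it looks now (raw API JSON); the fields and values shown are made up for illustration:

    # Before: V1Event.to_dict() produced snake_case keys.
    raw_event = {
        "reason": "Scheduled",
        "last_timestamp": "2020-08-14T00:00:00Z",
        "involved_object": {"kind": "Pod", "name": "jupyter-someuser"},
    }

    # After: the raw Kubernetes API response keeps its camelCase keys.
    raw_event = {
        "reason": "Scheduled",
        "lastTimestamp": "2020-08-14T00:00:00Z",
        "involvedObject": {"kind": "Pod", "name": "jupyter-someuser"},
    }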

@consideRatio consideRatio changed the title Performance: don't make kubernetes-client deserialize k8s events into objects Breaking change / performance: don't make kubernetes-client deserialize k8s events into objects Sep 2, 2020
@consideRatio (Member) commented:

@rmoe thank you so much for your thorough investigation and this PR resolving the performance issue you found! Thank you @mriedem for your review work! ❤️ 💯 🎉

@jupyterhub/jupyterhubteam this LGTM, but it would be nice to get another LGTM before I press merge. It solves a real problem when scaling to a very large number of users, so I think that motivates some breaking changes.

@consideRatio (Member) left a comment:

Excellent work! ❤️

@yuvipanda yuvipanda merged commit b209ce1 into jupyterhub:master Sep 2, 2020
welcome bot commented Sep 2, 2020

Congrats on your first merged pull request in this project! 🎉
Thank you for contributing, we are very proud of you! ❤️

@yuvipanda (Collaborator) commented:

\o/ Thanks a lot for this, @rmoe and @mriedem!

And thanks for your review, @consideRatio :)

@yuvipanda (Collaborator) commented:

I was just dealing with an outage caused by slow response times cascading because of high CPU usage... :)

yuvipanda added a commit to yuvipanda/datahub-old-fork that referenced this pull request Sep 2, 2020
yuvipanda added a commit to yuvipanda/datahub-old-fork that referenced this pull request Sep 2, 2020
yuvipanda added a commit to yuvipanda/datahub-old-fork that referenced this pull request Sep 4, 2020
Primarily to bring in
jupyterhub/zero-to-jupyterhub-k8s#1768,
which brings in jupyterhub/kubespawner#424
for performance

Ref #1746
Development

Successfully merging this pull request may close these issues:

  • High CPU usage in the reflector
4 participants