High CPU Usage #2597

Closed
ozanmakes opened this issue Nov 25, 2019 · 14 comments
Labels
bug Something isn't working

Comments

@ozanmakes

ozanmakes commented Nov 25, 2019

I'm getting started with Tilt and bumped into a high CPU usage scenario that I can reproduce using the Gloo Helm chart. This consistently gets Tilt to use at least 40% CPU on my 8-core laptop. I'm using Tilt v0.10.21 with k3d on Manjaro Linux.

I'm new to k8s so please let me know if you need more details, but I'm hoping these steps will help you reproduce the issue:

Tiltfile:

k8s_yaml(helm('gloo', name='gloo-gateway', namespace='gloo-system'))
k8s_yaml(helm('gloo', name='gloo-knative', namespace='gloo-system', values=['./gloo/values-knative.yaml']))

$ helm repo add gloo https://storage.googleapis.com/solo-public-helm
$ helm fetch gloo/gloo
$ k3d create cluster --image rancher/k3s:v0.10.2-amd64
$ sleep 10
$ export KUBECONFIG="$(k3d get-kubeconfig --name='k3s-default')"
$ tilt up

On the first run, this chart seems to log a lot of diagnostic errors. But even if I stop Tilt and run tilt up again, it quickly goes back to using a lot of resources, even though I don't see much output on the console.

@nicks
Member

nicks commented Nov 25, 2019

Thanks for the report! This sounds pretty bad, @maiamcc are you able to repro?

@nicks added the "bug" (Something isn't working) label Nov 25, 2019
@maiamcc
Contributor

maiamcc commented Nov 25, 2019

Oh eek, yep!
[Screenshot, 2019-11-25 5:21 PM]

I had trouble getting k3d running but your Tiltfile+chart gives me plenty of Tilt-eating-my-CPU on K8s for Docker for Mac, I'll start looking into it over here.

Just to verify I'm not debugging the wrong problem, can you profile your Tilt run for me? See profiling instructions here, and then share the tilt.profile file you create. Thanks!

@ozanmakes
Author

That seems to be along the lines of what I saw. I’ll gather some profiling data tomorrow when I’m back on my laptop, thanks for the quick response!

@maiamcc
Contributor

maiamcc commented Nov 25, 2019

After some digging, looks like the biggest culprit here is our terminal UI. Try running with tilt up --hud=false -- when I tried, it reduced my CPU usage by almost half. (If your CPU usage is still too high, let us know and we'll dig deeper.)

@ozanmakes
Author

[image]

With --hud=false I'm still seeing high CPU usage (not as high as with plain tilt up, which shows up as between 300% and 400% in htop), and the Tilt browser tab seems quite resource hungry as well. It doesn't get much better when I kill Tilt and re-run the same command against the existing cluster; it quickly works its way back to the same CPU usage.

I created a quick profile while Tilt was seemingly idle, which confirms your observation: tilt.profile.zip. I'd also like to profile it with the HUD disabled. Do you have instructions for that?

@ozanmakes
Author

ozanmakes commented Nov 26, 2019

I observed something new. When I run tilt up --hud=false with this setup, press enter to open the web view, and then close it, Tilt panics with this output:

panic: runtime error: invalid memory address or nil pointer dereference
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1ab1c76]

goroutine 254646 [running]:
github.com/windmilleng/tilt/internal/hud/server.WebsocketSubscriber.OnChange.func1(0x0, 0x0, 0x267e1e0, 0xc0003a36b0)
	/Users/matt/go/src/github.com/windmilleng/tilt/internal/hud/server/websocket.go:91 +0x26
panic(0x1f47d40, 0x3a9e8c0)
	/usr/local/Cellar/go/1.13.3/libexec/src/runtime/panic.go:679 +0x1b2
github.com/windmilleng/tilt/vendor/github.com/golang/protobuf/jsonpb.(*errWriter).write(...)
	/Users/matt/go/src/github.com/windmilleng/tilt/vendor/github.com/golang/protobuf/jsonpb/jsonpb.go:1113
github.com/windmilleng/tilt/vendor/github.com/golang/protobuf/jsonpb.(*Marshaler).marshalObject(0xc00098d050, 0xc0016dd600, 0x265b1a0, 0xc0013c80c0, 0x0, 0x0, 0x0, 0x0, 0x203000, 0xc000b31680)
	/Users/matt/go/src/github.com/windmilleng/tilt/vendor/github.com/golang/protobuf/jsonpb/jsonpb.go:269 +0x142d
github.com/windmilleng/tilt/vendor/github.com/golang/protobuf/jsonpb.(*Marshaler).Marshal(0xc00098d050, 0x0, 0x0, 0x265b1a0, 0xc0013c80c0, 0xc000939201, 0x5e)
	/Users/matt/go/src/github.com/windmilleng/tilt/vendor/github.com/golang/protobuf/jsonpb/jsonpb.go:138 +0x1fa
github.com/windmilleng/tilt/vendor/github.com/grpc-ecosystem/grpc-gateway/runtime.(*JSONPb).marshalTo(0xc00098d050, 0x0, 0x0, 0x21fcf60, 0xc0013c80c0, 0xc0016dd6d8, 0x40cc48)
	/Users/matt/go/src/github.com/windmilleng/tilt/vendor/github.com/grpc-ecosystem/grpc-gateway/runtime/marshal_jsonpb.go:50 +0x10d
github.com/windmilleng/tilt/vendor/github.com/grpc-ecosystem/grpc-gateway/runtime.(*JSONPb).NewEncoder.func1(0x21fcf60, 0xc0013c80c0, 0x2041220, 0xc000939220)
	/Users/matt/go/src/github.com/windmilleng/tilt/vendor/github.com/grpc-ecosystem/grpc-gateway/runtime/marshal_jsonpb.go:155 +0x5e
github.com/windmilleng/tilt/vendor/github.com/grpc-ecosystem/grpc-gateway/runtime.EncoderFunc.Encode(0xc000939220, 0x21fcf60, 0xc0013c80c0, 0x260ade0, 0xc000939220)
	/Users/matt/go/src/github.com/windmilleng/tilt/vendor/github.com/grpc-ecosystem/grpc-gateway/runtime/marshaler.go:42 +0x3a
github.com/windmilleng/tilt/internal/hud/server.WebsocketSubscriber.OnChange(0x267e8e0, 0xc000e47e40, 0xc0007e6480, 0x267e1e0, 0xc0003a36b0, 0x265ac20, 0xc0002a4050)
	/Users/matt/go/src/github.com/windmilleng/tilt/internal/hud/server/websocket.go:97 +0x236
github.com/windmilleng/tilt/internal/store.(*subscriberEntry).notify(0xc000938f80, 0x267e1e0, 0xc0003a36b0, 0xc0002a4050)
	/Users/matt/go/src/github.com/windmilleng/tilt/internal/store/subscriber.go:122 +0x10b
created by github.com/windmilleng/tilt/internal/store.(*subscriberList).NotifyAll
	/Users/matt/go/src/github.com/windmilleng/tilt/internal/store/subscriber.go:103 +0x126

edit: happens with HUD enabled as well

edit: the latest version fixes the segfault described above, but the CPU usage stays high. Disabling the HUD and closing the browser tab both help reduce the resource consumption, but the baseline stays quite high (around 170% in htop with not much action in the terminal).

@maiamcc
Contributor

maiamcc commented Nov 26, 2019

Thanks for your profile, I'll take a look!

> I'd also like to profile it with HUD disabled. Do you have instructions for that?

Unfortunately we're not well set up for profiling when the HUD is disabled. For now, you can use this branch, which waits 15s, then kicks off a 10s profile. Are you comfortable building from source? If not, I can whip you up a binary.

@ozanmakes
Author

Here are some runs in a single session: tilt.profile.nohud.zip. I hope this is helpful!

@maiamcc
Contributor

maiamcc commented Nov 26, 2019

Thanks, these are great!

I think I was mostly barking up the wrong tree -- I found a small perf gain and made a PR for it just in case, but it probably isn't enough to actually fix your CPU woes. We're all headed for Thanksgiving but I'll take another swing at this on Monday. Thanks for your patience!

@ozanmakes
Author

Exploring other avenues could still be worthwhile. Both HUD and websockets seem to affect resource usage, as well as the frontend js.

I’d also be happy to try and work around this issue if you have any insight regarding what makes this particular chart expensive to use with Tilt.

Happy holidays!

@maiamcc
Contributor

maiamcc commented Dec 6, 2019

Hi @osener, sorry for the radio silence, but I think the PR I just merged will help. Whenever you get a chance, install from master and let us know how it goes.

There's more context in the PR, but tldr, CPU usage is particularly high for this chart b/c of the number of logs it spews -- roughly speaking, each log event triggers a refresh of the UI (both terminal and web). Terminal UI refresh is expensive because of terminal rendering junk (which is why --hud=false may still be useful even after the latest PR), and web UI refresh is expensive because of lots of json marshaling (both in terms of CPU and memory alloc).

@maiamcc
Contributor

maiamcc commented Jan 3, 2020

Hey @osener and @drubin, wanted to follow up -- is your cpu usage any better? (The terminal UI continues to be a CPU hog, but with tilt up --hud=false you ought to see much more reasonable CPU usage.)

@drubin

drubin commented Jan 5, 2020

@maiamcc thanks for following up on this. After some seriously annoying debugging I have come to the conclusion that it's not purely Tilt's fault; it's due to Docker-for-Mac's implementation and how it handles k8s. There are countless reports about the incredibly poor performance of k8s with docker-for-mac.

While I don't think this is purely Tilt's responsibility to fix, or even that the process using the CPU is Tilt, I suspect most of my ranting and my team's complaints come down to "when they use Tilt their CPU goes through the roof", regardless of which process is causing the high CPU.

Suggestions

To help with profiling and fixing this perceived CPU-intensive load:

  1. Focus more on the way Tilt uses k8s under the hood, and on things that are CPU intensive for k8s (not purely for Tilt)
  2. Look at the suggestions below to improve performance on docker-for-desktop (it's still the easiest way to get k8s running on mac/windows)
  3. Look into alternatives to docker-for-mac: a blessed workflow for k3s, kind, minikube, or possibly others (k3s CPU link)
  4. Support injecting source code into pre-built containers. This would cut a bunch of extra CPU spent building/compiling Docker images (this doesn't directly fix the CPU issues of Tilt, but it goes a long way toward reducing the CPU cost of the overall developer flow)
  5. Support remote cluster building/running (longer-term goal)

Extra links for high CPU

  1. High idle CPU usage when enabling Kubernetes docker/for-mac#3065: docker-for-mac high CPU bug report
  2. High CPU usage on host docker/for-mac#3539 (comment): this hints that it's IO related and caused by the bad handling of mapping between OSX and the parent VM, which is similar to past experience with fixing volume mounts from OSX to Docker.
  3. High CPU Utilization of Hyperkit in Mac docker/for-mac#1759 (comment)

Some ideas I am going to play around with

  1. Reduce the volume mounts to zero, try to reduce IO (I assume we don't need to mount any source into the running containers as this is handled by tilt)
  2. Remove the docker directories from OSX's Spotlight indexing (indexing them would also make things slower)
  3. See about other ways to improve docker-for-mac's CPU usage independently of tilt or k8s
  4. Investigate lowering the polling/check intervals of the controllers running locally which should help with etcd usage.
  5. Profile from inside the hyperkit VM: screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty

@maiamcc
Contributor

maiamcc commented Jan 7, 2020

Thanks for this super detailed breakdown @drubin! In the medium term, we're definitely looking into ways to both improve Tilt's CPU usage, and have Tilt nudge users towards more CPU-efficient workflows even if it's not Tilt itself eating their CPU.

I suspect the first thing we'll do in this area is have some way to warn users what's eating their memory/CPU (so, if it's not Tilt, they don't blame us 😅 but also so they can actually debug the problem!) and go from there.

Anyway, I'm going to close this ticket on the assumption that @osener can now run this particular chart without prohibitive CPU use, cuz I can--but @osener definitely let me know if it's still giving you trouble as of the latest Tilt release and we'll reopen this investigation.

Thank you both!

@maiamcc closed this as completed Jan 7, 2020