High proxy memory consumption in high concurrency when h2-upgrade disabled #5146
Comments
In order to test whether this leak is triggered by something in the profile discovery, I put together a simple TCP echo test. I set it up with a similar load profile to the wrk2 test (128 client connections to 30 unique services):
Memory remains stable <20MB
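For illustration, here is a minimal sketch of this kind of load profile -- 128 long-lived connections spread across 30 echo targets -- using tokio. The target names, port, payload, and pacing are hypothetical; this is not the harness actually used for the test above.

```rust
use std::time::Duration;

use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpStream;

#[tokio::main]
async fn main() {
    // Hypothetical targets: 30 distinct echo services (placeholder DNS names).
    let targets: Vec<String> = (0..30)
        .map(|i| format!("echo-{}.test.svc.cluster.local:7777", i))
        .collect();

    // 128 long-lived client connections, round-robined across the targets.
    let mut tasks = Vec::new();
    for i in 0..128 {
        let addr = targets[i % targets.len()].clone();
        tasks.push(tokio::spawn(async move {
            let mut stream = TcpStream::connect(addr.as_str()).await.expect("connect");
            let mut buf = [0u8; 64];
            loop {
                // Write a small payload and read the echo back.
                stream.write_all(b"ping").await.expect("write");
                let n = stream.read(&mut buf).await.expect("read");
                assert!(n > 0, "connection closed");
                // Pace each connection to a few requests per second.
                tokio::time::sleep(Duration::from_millis(200)).await;
            }
        }));
    }

    for t in tasks {
        let _ = t.await;
    }
}
```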
This gives us a few signals: the leak is probably not related to service discovery, protocol detection, or load balancing, since the core implementations of these features are shared by both the TCP and HTTP stacks. The access patterns do differ--we're only transporting ~600 connections per second, as opposed to ~4000 HTTP requests per second in the wrk2 case--but this suggests the issue is caused by something specific to the HTTP stack. To test these assumptions, we could reduce the wrk2 RPS to ~600 to match the TCP case's load: if the problem can still be observed at this lower load, it would confirm this hunch. If the problem can only be observed at higher loads, however, we might want to look more closely at our behavior when buffers fill up. Anecdotally, while I was testing yesterday, I did observe that the buffer in the logical stack reached capacity at times...
Testing wrk2 at lower volumes, a leak is still apparent:
In both cases, the memory growth is steady throughout the run. This points strongly to an issue triggered by HTTP-specific logic.
@olix0r is that with the proxy on master, or a different build?
@hawkw Using the last edge release.
I've written a load tester in Rust, ort:

```
:; helm install ort ./ort --namespace ort --create-namespace \
    --set load.flags.concurrencyLimit=$c \
    --set load.flags.requestsPerClient=10000 \
    --set load.flags.requestLimit=4000 \
    --set load.linkerd.inject=enabled \
    --set load.threads=$t \
    --set server.linkerd.inject=$s \
    --set server.services=30
```

In this case, I'm running k3d with the latest linkerd edge-20.11.1. This configuration approximates the setup described in the original issue.
The results of these tests don't exactly match the original issue description (probably due to …).
In all cases, the proxy's memory climbs fairly slowly and eventually ~stabilizes. It is possible …

It's also worth noting that H2 isn't a pure win over HTTP/1 -- we appear to use 10%+ more CPU for …

When the server is meshed, its proxies each consume <20MB. The gRPC client proxies are stable at <30MB in all cases, while the HTTP proxies grow according to …

While this doesn't exactly replicate the originally described behavior, it gives us some strong …
Turning the concurrency up even more, the problems become more pronounced:
The http proxy's memory shoots up in excess of 750MB, while the h2 client stays stable:
This, again, points to something in the HTTP client stack, or potentially the outbound connection stack. Comparing the proxies' number of open fds, we immediately notice that the http proxy has many more connections than we'd expect:
The outbound client is maintaining 700+ connections to most services
This probably points to the hyper connection pool and connection reuse. I'm not sure this is necessarily wrong behavior: in order to serve concurrent requests, the proxy needs to spin up new connections. It would probably be awkward to enforce a maximum concurrency, but we might consider decreasing the idle timeout; this is basically a trade-off between latency and memory cost. The current caching behavior is probably especially costly for applications like Prometheus that connect to many endpoints, and in Prometheus's case the latency hit of establishing new connections is acceptable. In fact, it seems preferable to be more conservative about caching connections to unmeshed endpoints in general. We assume the common case is that communication is fully meshed and multiplexed, where the cost of maintaining a cached connection is fairly low and we can cache connections more aggressively (though, still, for meshed Prometheus scrapes, this could be costly). Hyper offers two pool configuration options that we should set:
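A minimal sketch of what this could look like with hyper's client builder (0.13/0.14-era API). The assumption here is that the two options referred to above are `pool_idle_timeout` and `pool_max_idle_per_host`; the values are illustrative, not the proxy's actual settings.

```rust
use std::time::Duration;

use hyper::Client;

// Build an HTTP/1.1 client whose connection pool evicts idle connections
// quickly and retains only a few idle connections per destination.
fn build_client() -> Client<hyper::client::HttpConnector> {
    Client::builder()
        // Drop pooled connections that have been idle for this long.
        .pool_idle_timeout(Duration::from_secs(3))
        // Cap the number of idle connections retained per host.
        .pool_max_idle_per_host(2)
        .build_http()
}
```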
With regard to Prometheus instances, we may want to expose the proxy's cache duration as an externally configurable flag, but this warrants some further thought.
linkerd/linkerd2-proxy#734 makes the proposed changes to the http/1.1 client
Running with the modified connection pool, the workload above that previously ran at ~700MB now runs at ~155MB, though the proxy appears to use substantially more CPU... With concurrency at 128, where the proxy on main runs at ~150MB, the new configuration runs at ~35MB (again, with elevated CPU). I'm inclined to go forward with these changes...
After further testing, it seems the best approach is to leave the number of idle connections unbounded, but to limit the cache idle age. The current default is actually 60s, not 10s. linkerd/linkerd2-proxy@c470643 changes these defaults so that the inbound timeout is 10s and the outbound timeout is 3s. This will cause additional discovery lookups when traffic is sporadic, but it should dramatically reduce the memory overhead for idle & sporadic connections. The change warrants a bit more testing and thought, so I don't think we should rush it into stable-2.9.0, but it seems like a good candidate for a follow-up patch release.
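To make the trade-off concrete, here is a rough sketch of an idle-age-bounded cache (not the proxy's actual implementation): each entry records when it was last used, and a sweep evicts anything idle longer than the configured age (e.g. 3s outbound, 10s inbound). A shorter age frees memory sooner but forces a fresh discovery lookup the next time sporadic traffic arrives.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

// A cache whose entries are dropped once they've been idle for `max_idle_age`.
struct IdleCache<T> {
    entries: Mutex<HashMap<String, (T, Instant)>>,
    max_idle_age: Duration,
}

impl<T> IdleCache<T> {
    fn new(max_idle_age: Duration) -> Self {
        Self {
            entries: Mutex::new(HashMap::new()),
            max_idle_age,
        }
    }

    fn insert(&self, key: String, value: T) {
        self.entries
            .lock()
            .unwrap()
            .insert(key, (value, Instant::now()));
    }

    // Using an entry refreshes its idle timer; returns whether it was present.
    fn touch(&self, key: &str) -> bool {
        match self.entries.lock().unwrap().get_mut(key) {
            Some((_, last_used)) => {
                *last_used = Instant::now();
                true
            }
            None => false,
        }
    }

    // Drop every entry that hasn't been used within the max idle age.
    fn evict_idle(&self) {
        let max = self.max_idle_age;
        self.entries
            .lock()
            .unwrap()
            .retain(|_, entry| entry.1.elapsed() < max);
    }
}

fn main() {
    // Hypothetical key; the 3s age mirrors the proposed outbound default.
    let cache = IdleCache::new(Duration::from_secs(3));
    cache.insert("outbound:web-svc.emojivoto-1".to_string(), ());
    assert!(cache.touch("outbound:web-svc.emojivoto-1"));

    // After more than 3s of inactivity, a sweep removes the entry.
    std::thread::sleep(Duration::from_secs(4));
    cache.evict_idle();
    assert!(!cache.touch("outbound:web-svc.emojivoto-1"));
}
```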
@olix0r do you think we should consider having different idle ages for HTTP/1 and HTTP/2 services, in order to limit the increased discovery load in cases where keeping idle services cached is less expensive? I suppose that if we wanted to do that, it would be a pretty big change, since there's currently One Big Cache that everything lives in... it may not be worth it.
@hawkw Thinking about the Prometheus case, I think we actually really do want to limit the amount of time idle connections stay around for h2 as well. We could potentially have separate idle timeouts for the services and the connections if we see problems, but I think we should be more eager about evicting idle outbound things...
Hmm, yeah, that makes sense; I suppose for pretty much any outbound that's hot enough that we do want it in the cache, 10s is plenty!
Addressed by #5200 |
Bug Report
What is the issue?
When H2 upgrading is disabled in the proxy, high-concurrency scenarios connecting to many different endpoints consume a lot of memory.
How can it be reproduced?
- … `vote-bot` … `web-svc` … is injected (… `emojivoto-%num`)
- … `wrk2` to hit the 30 `web-svc` instances
- See the memory of the `linkerd-proxy` container in the `wrk2-prometheus` pod grow:

In this particular example using KinD, the memory consumption stayed right under the 250Mi limit, but on the Equinix benchmarking boxes (using k8s Lokomotive) the pod gets OOMKilled.
When running with h2-upgrade enabled, the consumption only goes up to 35Mi.
Logs, error output, etc
In the Equinix box where the pod gets OOMKilled:
Environment
Local environment (high mem consumption but no OOMKill)
Equinix box (OOMKilled)