Increase reload performance #547
The controller currently reads and parses every single ingress, adding them to a clean model. The first time this happens, haproxy is started using that model and the current model is copied to the old one. Once an update happens, a new clean model is created, all the current ingress objects are added, and the new model is compared to the old one in order to identify whether a dynamic update can be done. This works very well on clusters that have up to about 5k ingress and 5k services but apparently doesn't scale much further. The proposal here is to use a single model (or use the old one as a starting point) where ingress objects aren't just added, but also updated and deleted. I think removing a whole domain and re-including only the changed ingress is doable, and should run magnitudes faster than the current implementation on clusters where the number of domains is high and the number of paths per domain is not. I'll start with this approach and let's see how it scales. New ideas usually arise after starting to code.
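A minimal sketch of that idea, assuming hypothetical types (none of these names come from the actual codebase): instead of rebuilding the whole model, removed or changed hostnames are dropped from the previous model and only the affected ingress objects are parsed again.

```go
// Sketch only: hypothetical types, not the real haproxy-ingress model.
package model

// Host is a simplified stand-in for a parsed hostname configuration.
type Host struct {
	Hostname string
	Paths    map[string]string // path -> backend ID
}

// Model holds the parsed state reused between reconciliations.
type Model struct {
	Hosts map[string]*Host
}

// Update reuses the previous model: hosts touched by changed or deleted
// ingress objects are removed and only those are parsed again, instead of
// rebuilding every host from scratch.
func (m *Model) Update(changedHosts []string, parse func(hostname string) *Host) {
	for _, hostname := range changedHosts {
		delete(m.Hosts, hostname)
		if h := parse(hostname); h != nil {
			m.Hosts[hostname] = h
		}
	}
}
```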
No I don't, be my guest! Btw I do follow some of valyala's components out there; they generally have pretty good performance.
Template time. My guess is that we can split the haproxy conf into a couple of files and render+save only the files whose configuration has changed. More than 95% of the config on big clusters is backend declarations, and they can be split into a few tens of files depending on the number of services.
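As a rough illustration of that idea (hypothetical names and file layout, not the real template code), backends could be grouped into a fixed number of config files and only the files containing a changed backend would be rendered and written again:

```go
// Sketch only: hypothetical sharding of backend config into several files.
package render

import (
	"fmt"
	"hash/fnv"
	"os"
)

const shards = 20 // e.g. a few tens of files, as suggested above

// shardOf maps a backend name to one of the config file shards.
func shardOf(backend string) int {
	h := fnv.New32a()
	h.Write([]byte(backend))
	return int(h.Sum32()) % shards
}

// writeChanged renders and saves only the shards that contain a changed
// backend; untouched files are left as they are.
func writeChanged(changedBackends []string, renderShard func(shard int) []byte) error {
	dirty := map[int]bool{}
	for _, b := range changedBackends {
		dirty[shardOf(b)] = true
	}
	for shard := range dirty {
		name := fmt.Sprintf("haproxy_backends_%02d.cfg", shard)
		if err := os.WriteFile(name, renderShard(shard), 0o644); err != nil {
			return err
		}
	}
	return nil
}
```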
Dynamic model helps to reduce CPU usage and processing time on clusters with 2k+ ingress and 2k+ services. Clusters with 10k+ ingress and 10k+ services with high cardinality on hostnames should see a big improvement in the processing time of ingress objects when using the dynamic model. Currently haproxy ingress parses all ingress objects on top of a clean model. The proposal is to copy the previous state and update the model based on the changes reported by the k8s client. This is not as simple as editing an attribute: there is a conflict analyzer and precedence of annotations in place, and some options might change a configuration value in more than one place. Fortunately everything happens below the hostname, with a few exceptions detailed below (see also the sketch after the list).
- Annotations and their conflicts need to be reanalyzed from a clean configuration; the scope of such configuration is hostname, path and service.
- Secret data does not impact other configurations except when an annotation references it.
- Endpoints impact only their backend.
- Frontend and backend config keys are used as default values for all ingress and services, so all of them should be parsed again. Globals are used to update global haproxy configs and do not (should not) impact other configs (double check).
- The pod state is used only when building the endpoint list, and only when drain support is true.
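A sketch of how those scoping rules might be encoded (hypothetical types and kind names, assuming the rules listed above):

```go
// Sketch only: mapping the kind of changed object to the scope that has to
// be parsed again, following the rules listed above.
package tracker

type ChangeScope int

const (
	ScopeHost    ChangeScope = iota // re-parse only the affected hostnames/paths
	ScopeBackend                    // re-parse only the affected backend (endpoints)
	ScopeFull                       // re-parse everything (frontend/backend defaults)
	ScopeGlobal                     // update only global haproxy options
)

// scopeOf decides how much of the model needs to be rebuilt for a change.
func scopeOf(kind string) ChangeScope {
	switch kind {
	case "Ingress", "Secret": // secrets only matter when referenced by an annotation
		return ScopeHost
	case "Endpoints", "Pod": // pods only when drain support is enabled
		return ScopeBackend
	case "ConfigMapGlobal":
		return ScopeGlobal
	default: // frontend/backend default config keys, unknown kinds
		return ScopeFull
	}
}
```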
Honestly, I'm missing this point: why should we rely on Pod rather than Endpoint resources? We're already listening for Endpoint events and AFAIK Kubernetes reflects all the Pod changes in the endpoint list as well, according to its status.
I guess the current lister can be removed to reduce memory consumption for large clusters running more than 1k nodes: maybe we should raise an issue and track it down. I'm a bit concerned regarding the design of the reconciliation loop, rather: haproxy-ingress/pkg/controller/controller.go Line 291 in 81765c5
haproxy-ingress/pkg/converters/ingress/ingress.go Lines 84 to 89 in 81765c5
tl;dr: we're not processing just the single enqueued entry but iterating over all the ingresses to render them back. @jcmoraisjr is your in-memory processing feature going to process items this way? We also have to keep in mind that serialization of objects is time and resource consuming, so we should avoid it where possible, also from a memory management perspective.
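A rough sketch of the difference being pointed out here (plain Go, hypothetical helpers, no client-go specifics): today every ingress is rendered again on each event, whereas the alternative would process only the keys that were actually enqueued.

```go
// Sketch only: full iteration versus processing just the enqueued keys.
package reconcile

// syncAll is roughly the current behavior: every ingress is parsed and
// rendered again on each event.
func syncAll(allIngress []string, render func(key string)) {
	for _, key := range allIngress {
		render(key)
	}
}

// syncChanged is the alternative being discussed: only the keys that were
// actually enqueued by the informer are processed.
func syncChanged(queue <-chan string, render func(key string)) {
	for key := range queue {
		render(key)
	}
}
```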
Because of the way drain support works. Terminating pods are also added to the backend slots with weight zero so they continue to receive requests from sticky sessions. This could however be changed to a conditional configuration, changed to query the API, etc.
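A sketch of what drain support does conceptually (hypothetical helper using the core/v1 types; the real logic lives in the store/converter code referenced later in this thread):

```go
// Sketch only: terminating pods are kept in the backend, but with weight 0,
// so existing sticky sessions can still reach them while they drain.
package drain

import v1 "k8s.io/api/core/v1"

// Server is a simplified stand-in for a haproxy backend server slot.
type Server struct {
	IP     string
	Weight int
}

// backendServers builds the server list from the ready endpoint addresses
// and, when drain support is enabled, appends terminating pods with weight 0.
func backendServers(ep *v1.Endpoints, pods []*v1.Pod, drainSupport bool) []Server {
	var servers []Server
	for _, subset := range ep.Subsets {
		for _, addr := range subset.Addresses {
			servers = append(servers, Server{IP: addr.IP, Weight: 100})
		}
	}
	if drainSupport {
		for _, pod := range pods {
			if pod.DeletionTimestamp != nil && pod.Status.PodIP != "" {
				servers = append(servers, Server{IP: pod.Status.PodIP, Weight: 0})
			}
		}
	}
	return servers
}
```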
Probably true; my guess is that it should be used in the status updater but it isn't. These (current listers and status updater) are part of the old controller inheritance that I haven't had a chance to look at yet.
Yes, that's exactly how it works now. This is a pretty simple design and works very well on small, not-so-small, and also on not-so-soooo-big clusters. We do this step in about
No. The proposed redesign will continue to use such iteration, but filtering out hostnames that weren't touched via ingress or service objects. E.g. if you change or remove an ingress or service object that touches
That's incorrect. I guess the Ingress Controller just needs to behave as the user expects and needs: let's dig into it. In the past, Kubernetes provided the Service annotation […]. When set to […]. If the user is not setting any […]. I started looking at the history of the source file and this logic seems to have been implemented since the early 1.0 version of Kubernetes: if we're going to remove Pod IPs because they're running on a node being drained although the user is using the […]. Given that, I'm pretty confident that we have to drop the listening on Pods since all the business logic is already handled correctly by Kubernetes itself, also considering that iterating over Pods is a huge waste of CPU cycles, and memory too.
I think there is a misunderstanding here. Some ideas that come to my mind:
Item 2 above sounds to me like new functionality currently not supported by k8s. Because of that it sounds like an override of the default behavior, and that's why the pod lister is being used. I'm completely in favor of optimizations, so I completely agree with you that a pod lister shouldn't be configured if such a lister isn't being used, but currently I cannot quite agree that we should remove it altogether because we'd break how drain support is supposed to work.
Nope, please take a look at the code here: this is the func executing the reconciliation for the given Endpoint object and it's performed every time, not just for external Load Balancers. So if a Pod is going to be deleted, the Endpoint Controller will start reconciliation and the Endpoint will be updated, dropping the Pod IP: what I'm trying to say is that this is already done by Kubernetes. I guess we're using Kubernetes Endpoint resources as HAProxy Backend Server objects: given that, if I get an endpoint update because the Pod has been removed from the subset due to its terminating state, I have to update the HAProxy Server, removing its address from the list. Can you confirm this? Or are we directly upstreaming to the Kubernetes Service rather than single Kubernetes Endpoints?
Please, can you point me to the code where this should happen? Because I started digging into the
Indeed, I'm wondering why you'd have to implement it: the Ingress Controller should be responsible only for taking external traffic into the cluster. All the network mesh should be handled by the CNI and the Endpoint controller: this could break the Single Responsibility Principle.
It's actually working because, according to the code, it's already done by Kubernetes and the Endpoint controller.
Just to be clearer, I'm referring to this snippet: haproxy-ingress/pkg/common/ingress/store/main.go Lines 173 to 187 in 6c0e0dd
We're dropping the Pod IP in three cases:
All three cases are handled by the Endpoint Controller: why replicate that in HAProxy by setting a weight of zero? These pods shouldn't get traffic at all because they could be unreachable, shutting down, or affected by some other issue. Also, last but not least, although you mentioned sticky sessions, if my Pod is going to be shut down it should just complete the pending requests and not accept any new ones, even if I have a sticky session: that's the Kubernetes pattern.
No, we are using the endpoint object all the time, except when drain support is enabled, which means "also add the terminating pods and give them weight zero in haproxy". I'm not using this functionality myself; perhaps @brianloss and @gavinbunney can advocate on behalf of it. As I said before, if drain support isn't being used, the pod lister won't be used either.
Just to be clear: we're not reinventing endpoint discovery, but instead adding to the backend endpoints which k8s considers dead. This is false by default, of course, and it only makes sense if both sticky sessions and a long terminating state are in place.
We have some good news, listed in no particular order:
All of them combined make it possible to start something related to model reuse, partial updates, and clean/dirty hosts and backends. Just cleaning up stuff, no performance improvement here. Besides that, some queries were converted to maps; this one, on top of the lister/informer refactor, reduced our parsing time by about 50%. This finishes phase one. Now the real improvements start.
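A minimal illustration of that kind of change (hypothetical names, not the actual refactor): replacing a linear search executed once per ingress/path with an index built once per sync.

```go
// Sketch only: replacing a repeated linear scan with a single map build.
package lookup

// Backend is a simplified stand-in for a parsed backend.
type Backend struct {
	Name string
}

// findBackendLinear is O(n) and would run for every path of every ingress.
func findBackendLinear(backends []*Backend, name string) *Backend {
	for _, b := range backends {
		if b.Name == name {
			return b
		}
	}
	return nil
}

// buildBackendIndex is built once per sync, turning every later lookup into O(1).
func buildBackendIndex(backends []*Backend) map[string]*Backend {
	idx := make(map[string]*Backend, len(backends))
	for _, b := range backends {
		idx[b.Name] = b
	}
	return idx
}
```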
Hi, fyi I've just tested #571 on our production cluster and here is the time comparison of
Running #623 in production since last Sunday and it looks promising. I'll update this issue again as soon as we have a snapshot tag.
The first v0.11 snapshot is tagged and the changelog updated. Thank you for all the help in this version. Let me know about the v0.11 numbers on your cluster and whether you have any problems. Closing.
This issue aims to track the required improvements to decrease the reload time of the HAProxy instance when managing thousands of Ingress resources, as mentioned in #541 when dealing with ~17k items.
The overall time required to perform a configuration reload is divided into the following spans:
- parse_ingress
- parse_tcp_svc
- write_tmpl
- reload_haproxy

All these actions, except reload_haproxy, are handled by the Ingress Controller and can be optimized further through refactoring, enhancements or new functionalities. According to these, we can plan and discuss the required improvements to tackle the situation at different levels:
- parse_ingress section: use Listers instead of cache.Store to parse ingresses (addressed by Converting to cache.Listers #545)
- write_tmpl section

@jcmoraisjr can you share further details regarding your proof of concept for the dynamic in-memory update?
If you don't mind, I would like to try to further optimize the configuration templating, maybe switching over to a third-party package like quicktemplate, which looks blazing fast compared to the template engine provided by the standard library.
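For context on why that kind of switch tends to help: quicktemplate compiles templates into plain Go code, so rendering is done with direct, reflection-free writes. A rough stdlib-only sketch of the contrast (the Backend type and the haproxy config lines here are purely illustrative assumptions, not the project's real template):

```go
// Sketch only: reflection-based text/template rendering versus the kind of
// direct writes that template code generation compiles down to.
package tmpl

import (
	"io"
	"text/template"
)

type Backend struct {
	Name      string
	Endpoints []string
}

// Reflection-based rendering with the standard library template engine.
var backendTmpl = template.Must(template.New("backend").Parse(
	"backend {{.Name}}\n{{range .Endpoints}}    server {{.}} {{.}}:8080\n{{end}}"))

func renderWithTemplate(w io.Writer, b *Backend) error {
	return backendTmpl.Execute(w, b)
}

// Direct writes, roughly what generated template code looks like at runtime.
func renderDirect(w io.Writer, b *Backend) error {
	if _, err := io.WriteString(w, "backend "+b.Name+"\n"); err != nil {
		return err
	}
	for _, ep := range b.Endpoints {
		if _, err := io.WriteString(w, "    server "+ep+" "+ep+":8080\n"); err != nil {
			return err
		}
	}
	return nil
}
```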