Replace podAntiAffinity addon by a topologySpreadConstraints addon? #1537
Comments
Can you share more about the SIG Scalability requirements for this? I cannot find much info on the use of affinity vs topologySpreadConstraints. The only thing I found is the KEP introducing the feature and a note that affinity and topologySpreadConstraints can work together. From what I see Don't get me wrong, I like the idea of using |
I am at the same point as you on all that. Reading the KEP further, PodAntiAffinity seems to cause them issues, and they are even thinking about deprecating it in the linked issue:
From this, it seems to be more about how it works internally, rather than the potential gain in flexibility for the users. My guess is, when the Karpenter documentation refers to the "Kubernetes SIG scalability" recommendation, it is not about the scalability of your deployment, but the scalability of your Kubernetes Scheduler |
I'm not sure I see anything concrete in that issue which states that affinity or anti-affinity will be deprecated. It is an almost 3-year-old issue. In fact, I see various related enhancements landing in 1.22: kubernetes/enhancements#2249. To be clear, I am not saying we shouldn't consider this or that it is a bad idea. I just want to be sure we are not being motivated by the wrong source. |
I got the Karpenter documentation clarified in aws/karpenter-provider-aws#948. So in short, pod anti-affinities are a strain on the Kubernetes Scheduler at the moment. I have finally found the literal recommendation.
Now the questions for kube-prometheus:
|
I would say yes. If we can improve our users' environments by using a better performing feature, we should use it.
I vote yes.
No, I think this would break too many workflows. We can however put a trace notice in the By trace notice I mean something like this: https://github.com/prometheus-operator/kube-prometheus/compare/main...paulfantom:trace-notice?expand=1 cc @simonpasquier @philipgough you might be interested in this as AFAIR CMO is using anti-affinity heavily. |
I have no issues with us adding support for the addon while retaining support for inter pod affinity. Yes CMO is using I'd be curious to learn if the impact would be more substantial for high-churn workloads, as opposed to relatively static platform infra. Or if adding |
@paulfantom sounds good, thank you for pointing to I am going to experiment with |
This seems already implemented. https://prometheus-operator.dev/docs/operator/api/ Should this issue be closed? |
Karpenter implemented enough Pod Anti-Affinity support to satisfy the prometheus use case. It was released in v0.9.0 |
@migueleliasweb I think that this feature request is still valid because the ask is to implement a jsonnet addon that uses topologySpreadConstraints and this doesn't exist yet (though the prometheus-operator CRDs support it). |
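For readers following the thread, here is a minimal sketch of the two pod spec fragments being compared: the soft podAntiAffinity that an anti-affinity addon produces, and a topologySpreadConstraints equivalent. It is written in Python only to make the structure easy to inspect; an actual addon would be jsonnet, like the existing anti-affinity addon. The field names come from the Kubernetes Pod spec. The helper names and the `app.kubernetes.io/name: prometheus` label are assumptions chosen for illustration, not what kube-prometheus ships.

```python
import json

# Spread per node by default; a zone label could be used instead.
HOSTNAME_KEY = "kubernetes.io/hostname"


def soft_anti_affinity(match_labels, topology_key=HOSTNAME_KEY, weight=100):
    """Hypothetical helper: preferred (soft) pod anti-affinity fragment."""
    return {
        "affinity": {
            "podAntiAffinity": {
                "preferredDuringSchedulingIgnoredDuringExecution": [
                    {
                        "weight": weight,
                        "podAffinityTerm": {
                            "labelSelector": {"matchLabels": dict(match_labels)},
                            "topologyKey": topology_key,
                        },
                    }
                ]
            }
        }
    }


def topology_spread(match_labels, topology_key=HOSTNAME_KEY, max_skew=1, hard=False):
    """Hypothetical helper: topologySpreadConstraints fragment.

    hard=False maps to ScheduleAnyway (soft), hard=True to DoNotSchedule.
    """
    return {
        "topologySpreadConstraints": [
            {
                "maxSkew": max_skew,
                "topologyKey": topology_key,
                "whenUnsatisfiable": "DoNotSchedule" if hard else "ScheduleAnyway",
                "labelSelector": {"matchLabels": dict(match_labels)},
            }
        ]
    }


# Assumed label; the real selector depends on how the pods are labelled.
labels = {"app.kubernetes.io/name": "prometheus"}
print(json.dumps(topology_spread(labels), indent=2))
```

Either fragment would be merged into the `spec` of the Prometheus or Alertmanager custom resource. Nothing prevents setting both at once, since the two mechanisms can work together.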
What is missing?
I stumbled upon this new recommendation reading the karpenter.sh documentation [1]:
More in the official documentation about the motivations: https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/#comparison-with-podaffinity-podantiaffinity
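To make the comparison concrete, here is a rough sketch of how a hard spread constraint differs from a hard per-node anti-affinity rule when there are more replicas than nodes. It uses the skew formula from the Kubernetes documentation (matching pods in the candidate domain after placement, minus the global minimum) and deliberately ignores everything else the scheduler considers (`minDomains`, node filtering, scoring). The function names and node names are made up for illustration.

```python
def skew_if_placed(pods_per_domain, candidate):
    """Skew the candidate domain would have after receiving one more pod.

    pods_per_domain maps every eligible domain (node, zone, ...) to its
    current number of matching pods, including domains with zero pods.
    """
    global_min = min(pods_per_domain.values())
    return pods_per_domain[candidate] + 1 - global_min


def spread_allows(pods_per_domain, candidate, max_skew=1):
    """Simplified whenUnsatisfiable: DoNotSchedule check."""
    return skew_if_placed(pods_per_domain, candidate) <= max_skew


def hard_anti_affinity_allows(pods_per_domain, candidate):
    """Simplified required anti-affinity: no matching pod in the domain."""
    return pods_per_domain[candidate] == 0


# Two nodes, two replicas already running, a third replica is pending.
nodes = {"node-a": 1, "node-b": 1}

for node in nodes:
    print(node, hard_anti_affinity_allows(nodes, node), spread_allows(nodes, node))
```

In this simplified model the third replica stays pending under hard anti-affinity, while a spread constraint with `maxSkew: 1` admits it on either node and only rejects placements that would unbalance the nodes further.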
It may not be worth pursuing improvements to the podAntiAffinity addon, such as #1090.
Thoughts?
Why do we need it?
Follow ever-evolving Kubernetes best practices
Environment
kube-prometheus version:
Insert Git SHA here
Anything else we need to know?:
anti-affinity addon: https://github.com/prometheus-operator/kube-prometheus/blob/6d013d4e4f980ba99cfdafa9432819d484e2f829/jsonnet/kube-prometheus/addons/anti-affinity.libsonnet
Footnotes
1. https://karpenter.sh/docs/concepts/