-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable Webhook PDB by default, document enabling it #3787
Conversation
/lgtm |
@nikhil-thomas: changing LGTM is restricted to collaborators In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/meow
Thanks for doing this @imjasonh (and @nikhil-thomas for the exploration)
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: vdemeester The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Hi Jason, A quick question, is that possible to remove the PDB setting from Tekton deployment and document how to configure it in the document? :) Then the user can choose enable or disable it by themselves. We enable PDB in our Tekton on production env. If it is set |
I see @zhangtbj 's point here. Shipping the hpa manifests might override existing values. We may consider skipping the hpa manifests altogether and instead include documentation on it ? |
Sorry for letting this PR slip through the cracks. Let's get this in before 0.22. To clarify, the specific ask is to remove the PDB from |
That's right, Jason.
…On Tue, Mar 9, 2021, 12:38 Jason Hall ***@***.***> wrote:
Sorry for letting this PR slip through the cracks. Let's get this in
before 0.22.
To clarify, the specific ask is to remove the PDB from webhook-hpa.yaml
from the default Tekton installation bundle, and instead *document* how
to enable it, with example YAML in the docs. Does that sound right to you
@zhangtbj <https://github.com/zhangtbj> @sbose78
<https://github.com/sbose78> ?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3787 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAEFEAEIDY722GJ7AQ3X6V3TCZFIPANCNFSM4YEVKIRA>
.
|
Done. 👍 |
@sbose78 please help review the changes 🙏 (looking for |
/lgtm
…On Tue, Mar 9, 2021, 16:06 Priti Desai ***@***.***> wrote:
@sbose78 <https://github.com/sbose78> please help review the changes 🙏
(looking for /lgtm 😉 )
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3787 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAEFEAB3QWZFYJNKFYUCSRDTCZ5VPANCNFSM4YEVKIRA>
.
|
@sbose78: changing LGTM is restricted to collaborators In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Thanks @sbose78, we will have to add you to the org. I will create a separate PR in the community repo. |
You should get the /lgtm |
Thank you very much, Priti!
…On Tue, Mar 9, 2021, 18:42 Priti Desai ***@***.***> wrote:
You should get the /lgtm privilege after this
<tektoncd/community#376> PR in community repo is
merged 😄 Until then,
/lgtm
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3787 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAEFEAHM436V3SUK6LEZLKTTC2P3JANCNFSM4YEVKIRA>
.
|
Cool, thank you Jason! :) |
The safe-to-evict annotation tells the cluster autoscaler whether the pod can be evicted to allow the node it's on to scale down. This was set to false (by me!) 2 years ago in tektoncd@fc6ef39 to prevent service unreliability during scale-down events. If the no webhook replicas are available, users can't create/update/delete Tekton objects; if no controller replicas are available, status updates from Pod events, etc., won't be processed. Unfortunately, blocking node eviction means the node that the pod(s) get scheduled to can't be scaled down. Furthermore, the nodes can't be fully drained when updating the cluster. This can leave a cluster in a mid-upgrade state that can make issues difficult to diagnose and reason about. With this change, a cluster scale-down event might cause temporary service unreliability with the default single-replica configuration. As with tektoncd#3787 if a user/operator wants to prevent this, they should configure more replicas for HA.
The safe-to-evict annotation tells the cluster autoscaler whether the pod can be evicted to allow the node it's on to scale down. This was set to false (by me!) 2 years ago in fc6ef39 to prevent service unreliability during scale-down events. If the no webhook replicas are available, users can't create/update/delete Tekton objects; if no controller replicas are available, status updates from Pod events, etc., won't be processed. Unfortunately, blocking node eviction means the node that the pod(s) get scheduled to can't be scaled down. Furthermore, the nodes can't be fully drained when updating the cluster. This can leave a cluster in a mid-upgrade state that can make issues difficult to diagnose and reason about. With this change, a cluster scale-down event might cause temporary service unreliability with the default single-replica configuration. As with #3787 if a user/operator wants to prevent this, they should configure more replicas for HA.
The safe-to-evict annotation tells the cluster autoscaler whether the pod can be evicted to allow the node it's on to scale down. This was set to false (by me!) 2 years ago in tektoncd@fc6ef39 to prevent service unreliability during scale-down events. If the no webhook replicas are available, users can't create/update/delete Tekton objects; if no controller replicas are available, status updates from Pod events, etc., won't be processed. Unfortunately, blocking node eviction means the node that the pod(s) get scheduled to can't be scaled down. Furthermore, the nodes can't be fully drained when updating the cluster. This can leave a cluster in a mid-upgrade state that can make issues difficult to diagnose and reason about. With this change, a cluster scale-down event might cause temporary service unreliability with the default single-replica configuration. As with tektoncd#3787 if a user/operator wants to prevent this, they should configure more replicas for HA. (cherry picked from commit 5350069) Signed-off-by: Vincent Demeester <vdemeest@redhat.com>
Fixes #3654
Changes
This change disables PodDisruptionBudget for the webhook deployment, and documents how to re-enable it in docs/enabling-ha. It also makes some edits to enabling-ha.md to streamline and recommend best practices.
/kind bug
Submitter Checklist
These are the criteria that every PR should meet, please check them off as you
review them:
See the contribution guide for more details.
Double check this list of stuff that's easy to miss:
cmd
dir, please updatethe release Task to build and release this image.
Reviewer Notes
If API changes are included, additive changes must be approved by at least two OWNERS and backwards incompatible changes must be approved by more than 50% of the OWNERS, and they must first be added in a backwards compatible way.
Release Notes
cc @vdemeester @nikhil-thomas @bobcatfish