Support controller leader election in tekton-pipeline #2735
I just ran an experiment. I set up 3 Tekton controllers:
Then I applied 1 Task and 1 TaskRun using your sample:
I found that 3 TaskRun pods come up, but only one is working. The other two are always
So I think the current tekton-pipeline doesn't support HA. If my understanding is wrong, please correct me. Thanks a lot! |
I tried the same setup a while ago and observed the same behaviour. I don’t think it’s supported yet. There is a leader election config map, though I am not sure whether this is possible at all. |
@eddycharly Does it have a leader election configMap? Which one?
Could you please tell me which one is for leader election? Thanks! |
Looks like you don’t have it. Look here https://github.com/tektoncd/pipeline/blob/master/config/config-leader-election.yaml |
Also, controller down doesn’t mean a downtime. |
@eddycharly Thanks a lot! If I apply the election ConfigMap mentioned above, will leader election take effect? |
Hopefully... worth a try. |
@eddycharly I created the election ConfigMap above and added it to the Tekton controller deployment, but it still doesn't work as expected: it still starts three TaskRun pods. |
@xiujuan95 do your RBAC rules allow access to the config map? |
@eddycharly Maybe not. I am trying to modify my RBAC and I want to use this image
|
|
@eddycharly I think now, my RBAC rules allow access to the election config map.
But, it still doesn't work. |
Leader election was recently added by the Knative team. Background: knative/pkg#1181. @xiujuan95 @eddycharly looking forward to the results of your experiments. |
I have some changes brewing that will change how this works a bit. Those changes will enable reconcilers to be sharded across replicas and should help us scale reconcilers horizontally. However, I want to get y'all onto |
Rotten issues close after 30d of inactivity. /close Send feedback to tektoncd/plumbing. |
Stale issues rot after 30d of inactivity. /lifecycle rotten Send feedback to tektoncd/plumbing. |
@tekton-robot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Any updates for HA? |
/remove-lifecycle rotten |
@vdemeester: Reopened this issue. In response to this:
I just ran some experiments: I enabled leader election for tekton-pipeline-controller, and yes, it works.
But when I tried to enter each pod and curl
Is it only the active one that opens the Prometheus metrics port? I went through the code, but I couldn't find where to confirm this. Could you please help me with this? Thanks in advance! |
With the help of my colleague @qu1queee (thanks!), I confirmed it: yes, Prometheus metrics only work on the active pod. This causes another problem for me. If I configure the Tekton controller deployment with liveness and readiness checks like below:
In this situation, only the active pod can be running; all passive pods will be
That's bad for me. Could you provide a better way to do liveness and readiness probes? Maybe it's related to this issue: #3111 |
@xiujuan95 Oh..It's so weird! I don't know what's wrong. But it works well for me. |
You need the double quotation marks.
This happens because you have the |
@afrittoli I do have the double quotation marks, but the pod is still crashing! |
I found why the pod can't come up. 'disable-ha' should be added at the end rather than at the beginning:
If I change it to the following, it fails:
This behavior seems weird to me. Why? |
Uhm, this sounds like a bug to me |
@afrittoli I found the following settings for
We can't set it like that, because that setting treats the value of The correct setting is:
I am not sure whether this is a bug on your side. If it is, please help fix it. Thanks in advance! BTW, I think there should be a doc explaining how to set these flags: https://github.com/tektoncd/pipeline/blob/v0.17.1/cmd/controller/main.go#L40-L56. Otherwise, users may refer to some existing parameters, such as |
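One possible explanation for the flag-ordering behaviour above, assuming the controller parses disable-ha as a boolean with Go's standard flag package: a boolean flag only takes a value in the -flag=value form. Written as -disable-ha false, the flag is set to true, and false becomes the first non-flag argument, at which point parsing stops and every flag after it is ignored. A minimal sketch (example-image is a hypothetical string flag used only for illustration, not a real controller flag):

```go
package main

import (
	"flag"
	"fmt"
)

// parseArgs parses args with a fresh FlagSet so the example does not touch
// the process-wide flag.CommandLine. "disable-ha" is the flag discussed in
// this thread; "example-image" is a hypothetical string flag.
func parseArgs(args []string) (disableHA bool, image string, rest []string, err error) {
	fs := flag.NewFlagSet("controller", flag.ContinueOnError)
	fs.BoolVar(&disableHA, "disable-ha", false, "disable high availability")
	fs.StringVar(&image, "example-image", "", "hypothetical string flag")
	err = fs.Parse(args)
	// fs.Args() holds everything left over after flag parsing stopped.
	return disableHA, image, fs.Args(), err
}

func main() {
	// Space-separated form: "-disable-ha" on its own already means true,
	// and "false" is the first non-flag argument, so parsing stops there
	// and "-example-image foo" is never seen as a flag.
	ha, img, rest, _ := parseArgs([]string{"-disable-ha", "false", "-example-image", "foo"})
	fmt.Printf("disable-ha=%t example-image=%q leftover=%v\n", ha, img, rest)

	// "=" form: the value is bound to the boolean flag and the flags that
	// follow are still parsed.
	ha, img, rest, _ = parseArgs([]string{"-disable-ha=false", "-example-image", "foo"})
	fmt.Printf("disable-ha=%t example-image=%q leftover=%v\n", ha, img, rest)
}
```

If this is what is happening, it would also explain why putting disable-ha last appears to work: the stray value then has no later flags to swallow.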
I think the only bits left on this are documentation and perhaps testing. |
@afrittoli Now, I am using
I checked the log of each pod and found Next, I wanted to verify whether leader election works well. So I deleted the leader
Then I checked the logs of the remaining two pods and found that both of them have the logs below:
This seems strange to me. Why are both of them starting to lead? Which one is the leader? From the logs, I can't tell which is the leader. Could you please help me distinguish which one is the leader, or which log line indicates leadership? Thanks in advance! |
The HA setup is an active/active one:
|
@afrittoli I see. So at the same time, there can be two leaders that are responsible for two different buckets and handle two different reconcilers, right? |
Yep, exactly. |
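A rough sketch of how bucket-based active/active sharding can work, to make the idea concrete. This is not the Knative implementation: the hash function, the bucket count, and the helper names bucketFor and shouldReconcile are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bucketFor maps a reconcile key (for example "namespace/name") to one of
// n buckets. Hypothetical helper: the real implementation may hash keys
// differently.
func bucketFor(key string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() % n
}

// shouldReconcile reports whether a replica that currently leads the given
// set of buckets is responsible for key.
func shouldReconcile(key string, n uint32, led map[uint32]bool) bool {
	return led[bucketFor(key, n)]
}

func main() {
	const buckets = 3

	// Two replicas, each leading a disjoint subset of the buckets. Both are
	// "leaders" at the same time, just for different buckets.
	replicaA := map[uint32]bool{0: true}
	replicaB := map[uint32]bool{1: true, 2: true}

	for _, key := range []string{"default/run-1", "default/run-2", "ci/run-3"} {
		fmt.Printf("%s bucket=%d A=%t B=%t\n",
			key,
			bucketFor(key, buckets),
			shouldReconcile(key, buckets, replicaA),
			shouldReconcile(key, buckets, replicaB))
	}
}
```

Under this model, every key lands in exactly one bucket, so exactly one replica acts on it, while all replicas stay active. When a replica goes away, another replica can take over the buckets it was leading.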
@xiujuan95 would you be happy to close this one now? |
@xiujuan95 I will provide you more insights on the behaviour internally in IBM, there is a document from Knative around this. Sorry for not sharing this earlier. |
@afrittoli Yes, I think we can close this issue, now. Thanks! |
@afrittoli Sorry, just to confirm. I went through Then Does this mean the webhook also supports leader election? Is its leader-election mode the same as the controller's? And if so, can I also set Please help me with this, thanks in advance! |
@afrittoli please help verify this, thanks 🙏 |
This adds documentation around HA support for the Tekton pipeline controller. HA is enabled by default, so this adds more information on the behaviour and on how devs/maintainers can use it.
Any updates about this comment: #2735 (comment) ? |
The Tekton webhook controller includes five different controllers:
According to knative docs, the |
@xiujuan95 let me know if this answer is satisfactory. I will close the issue now but feel free to re-open (or open a new one) should anything be missing. |
@afrittoli Thanks for your kind reply! The answer above makes sense to me. |
Expected Behavior
I want high availability (HA) for tekton-pipeline. Does tekton-pipeline support it yet?
By HA I mean something like leader election, not the native self-healing properties of Kubernetes.
If I depend only on the self-healing properties of Kubernetes, it will cause some downtime for my services, which I want to avoid.
So if HA is supported, which HA mode is it? If not, do you have any plans for it? Thanks in advance!
Actual Behavior
Steps to Reproduce the Problem
Additional Info
Kubernetes version:
Output of kubectl version:
Tekton Pipeline version:
Output of tkn version or kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}':