Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch webhook liveness/readiness probes to use http ports #3349

Merged
merged 1 commit into from
Oct 19, 2020
Merged

Switch webhook liveness/readiness probes to use http ports #3349

merged 1 commit into from
Oct 19, 2020

Conversation

ywluogg
Copy link
Contributor

@ywluogg ywluogg commented Oct 7, 2020

Changes

Switch webhook liveness/readiness probes to http ports and use a separate port from actual webhook HTTPS ports.
This fixes misleading errors mentioned in #3303

Webhook logs:

kubectl -n tekton-pipelines logs $(kubectl -n tekton-pipelines get pods -l app=tekton-pipelines-webhook -o name)
2020/10/06 23:57:31 maxprocs: Leaving GOMAXPROCS=2: CPU quota undefined
2020/10/06 23:57:31 Server listening on port 8080

This solves the issue because it doesn't send simple TCP request to a HTTPS port, which will cause TLS handshake to fail.

Other considerations will be 1. continue using the same HTTPS port but establish HTTPS handshake. This is not done because using the same port as actual webhook port for probes is not ideal for traffic and responsibility separation.

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

  • Includes tests (if functionality changed/added)
  • Includes docs (if user facing)
  • Commit messages follow commit message best practices
  • Release notes block has been filled in or deleted (only if no user facing changes)

See the contribution guide for more details.

Double check this list of stuff that's easy to miss:

Reviewer Notes

If API changes are included, additive changes must be approved by at least two OWNERS and backwards incompatible changes must be approved by more than 50% of the OWNERS, and they must first be added in a backwards compatible way.

Release Notes

@tekton-robot tekton-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Oct 7, 2020
@tekton-robot tekton-robot requested review from bobcatfish and a user October 7, 2020 00:04
@tekton-robot tekton-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 7, 2020
@ywluogg ywluogg changed the title Switch webhook liveness/readiness probes to http ports Switch webhook liveness/readiness probes to use http ports Oct 7, 2020
Copy link
Member

@jerop jerop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/kind bug

please consider adding more detail in the commit message as described in https://github.com/tektoncd/community/blob/master/standards.md#commit-messages

@tekton-robot tekton-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 7, 2020
@ywluogg
Copy link
Contributor Author

ywluogg commented Oct 7, 2020

/retest

cmd/webhook/main.go Outdated Show resolved Hide resolved
cmd/webhook/main.go Outdated Show resolved Hide resolved
cmd/webhook/main.go Outdated Show resolved Hide resolved
cmd/webhook/main.go Outdated Show resolved Hide resolved
cmd/webhook/main.go Show resolved Hide resolved
Copy link
Member

@afrittoli afrittoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!
Eventually it would be nice to have probes that actually verify that the service is running - however this is definitely a step forward!
It would be great to do the same for the controller (in a different PR) if you'd like.
/lgtm

config/webhook.yaml Show resolved Hide resolved
@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 9, 2020
@tekton-robot tekton-robot removed the lgtm Indicates that a PR is ready to be merged. label Oct 15, 2020
@ywluogg
Copy link
Contributor Author

ywluogg commented Oct 16, 2020

Sorry I was distracted by something else for a bit and now was able to come back to this. @afrittoli I think @imjasonh is still not convinced to add the probes in controller and I also think it will be reasonable to follow the comment you added in #3111. I will keep track of the health checks they've added to knative package and take a look at that to make probes available in controller.

@mattmoor @pmorie I found knative/pkg#1048 but it's not clear to me then whether it is available yet or not. If not we could perhaps add a "shallow" /healthz for now, that always reports "ok" (as @imjasonh suggested) and switch to the knative/pkg one once it becomes available.

@imjasonh
Copy link
Member

/lgtm

@sbwsg do you have any other comments?

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 19, 2020
@ghost
Copy link

ghost commented Oct 19, 2020

Thanks @ywluogg !

/approve

@tekton-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbwsg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 19, 2020
@tekton-robot tekton-robot merged commit 7559968 into tektoncd:master Oct 19, 2020
@qu1queee
Copy link
Contributor

@ywluogg quick question, would be ok to have this same implementation for the missing probes in https://github.com/tektoncd/pipeline/blob/master/config/controller.yaml ?

@ywluogg
Copy link
Contributor Author

ywluogg commented Oct 20, 2020

@ywluogg quick question, would be ok to have this same implementation for the missing probes in https://github.com/tektoncd/pipeline/blob/master/config/controller.yaml ?

The issue was addressed in #3111. I will add the probes in controller once the health checks mentioned in #3111 is available.

@afrittoli
Copy link
Member

afrittoli commented Oct 20, 2020

The approach in knative for the webhook is actually to use the existing HTTPs endpoint like used to do, but with HTTPs instead of TCP, and passing a special k-kubelet-probe header:

https://github.com/knative/serving/blob/1d259fb640b5ba9748a430a0cfa90d451d5826d2/config/core/deployments/webhook.yaml#L95-L102

I guess we could do the same for Tekton and avoid the need for the extra server, but it would require doing TLS handshakes, so perhaps its good to stick with this solution :)

For the controller there is no solution yet, so one way forward might be to have a dedicated small HTTP server until we have a solution from knative/pkg. knative/pkg#1048 is back open

@mattmoor
Copy link
Member

You don't need the k-kubelet-probe header, this is sort of a vestige of a time when Istio incorrectly rewrote K8s probes to drop the header it sends. Maybe it's a special User-Agent? I think we have an IsKubeletProbe function somewhere that checks for these things.

@ywluogg
Copy link
Contributor Author

ywluogg commented Oct 21, 2020

The approach in knative for the webhook is actually to use the existing HTTPs endpoint like used to do, but with HTTPs instead of TCP, and passing a special k-kubelet-probe header:

https://github.com/knative/serving/blob/1d259fb640b5ba9748a430a0cfa90d451d5826d2/config/core/deployments/webhook.yaml#L95-L102

I guess we could do the same for Tekton and avoid the need for the extra server, but it would require doing TLS handshakes, so perhaps its good to stick with this solution :)

For the controller there is no solution yet, so one way forward might be to have a dedicated small HTTP server until we have a solution from knative/pkg. knative/pkg#1048 is back open

Pinging @imjasonh again for any suggestion for adding a temporary http server in controller. I think currently we may still need for info from other users about why they need this probe in the controller.

@afrittoli
Copy link
Member

The probe needed is mostly the liveness one.
When running multiple replicas through leader election, the probe helps detecting a faulty controller and releasing its bucket of resources to other controllers.

@ywluogg
Copy link
Contributor Author

ywluogg commented Oct 21, 2020

It sounds necessary to add at least the liveness probe in controller based on this situation. I don't think there is another simpler way to check on liveness under this situation. But I'm waiting for @imjasonh con re-assurance.

@imjasonh
Copy link
Member

I agree, it sounds like HA controller will benefit from having a liveness probe, which we can implement now with a simple dummy HTTP server, and later improve with real health checks.

@ywluogg
Copy link
Contributor Author

ywluogg commented Oct 23, 2020

Sounds good will add it as soon as possible.

@xiujuan95
Copy link
Contributor

xiujuan95 commented Oct 23, 2020

@ywluogg Go through above discussions, are you only enable liveness probe for controller, now? Could you pls also add readiness probe for controller? It's necessary for me. Thanks in advance!

@ywluogg
Copy link
Contributor Author

ywluogg commented Oct 23, 2020

@xiujuan95 do you mind also provide some details about the situation you plan to use it? Thanks

@xiujuan95
Copy link
Contributor

@ywluogg Sure, I can. Actually, It's a requirement from my manager team. It requires all pods need to enable liveness and readiness check. Besides, I think readiness can tell us if all containers within this pod are ready or not. It also makes sense! So pls also help add it! Thanks in advance!

@ywluogg
Copy link
Contributor Author

ywluogg commented Oct 26, 2020

Sounds good I will add them too.

@xiujuan95
Copy link
Contributor

@ywluogg Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants