Use HTTP liveness probe instead of tetra status #2474
Conversation
Force-pushed from bd2eb34 to 4ae2b3d
Force-pushed from d6c18d1 to 56363a8
Currently, we use the tetra status command to report the status of the Tetragon agent. This comes with some overhead, as the tetra binary has a lot of additional functionality and using it just for status reporting seems like overkill. On the other hand, Kubernetes supports liveness probes via an HTTP endpoint (see https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#http-probes). This patch first creates a new HTTP endpoint that reports the agent status and can be used for the liveness probe. Signed-off-by: Anastasios Papagiannis <tasos.papagiannnis@gmail.com>
Force-pushed from 56363a8 to f3c11ed
# -- Use http-based livenessProbe for the tetragon container. livenessProbe has a higher priority.
livenessHttpProbe:
  enabled: true
  # host: "localhost"
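For context, "livenessProbe has a higher priority" presumably means the template renders an explicit livenessProbe override when one is given and only falls back to the HTTP probe otherwise. A hypothetical sketch of that ordering (not the actual template in this PR) could look like:
{{- if .Values.tetragon.livenessProbe }}
livenessProbe:
  {{- toYaml .Values.tetragon.livenessProbe | nindent 2 }}
{{- else if .Values.tetragon.livenessHttpProbe.enabled }}
livenessProbe:
  httpGet:
    path: /liveness
    port: 6789
    {{- if .Values.tetragon.livenessHttpProbe.host }}
    host: {{ .Values.tetragon.livenessHttpProbe.host }}
    {{- end }}
{{- end }}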
I think it would be better to have this under livenessProbe somehow.
Maybe we could do something like:
{{- if .Values.tetragon.livenessProbe.enabled }}
livenessProbe:
  timeoutSeconds: 60
  {{- if .Values.tetragon.livenessProbe.custom }}
  {{- toYaml .Values.tetragon.livenessProbe.custom | nindent 4 }}
  {{- else if .Values.tetragon.livenessProbe.http.enabled }}
  httpGet:
    path: /liveness
    port: 6789
    {{- if .Values.tetragon.livenessProbe.http.host }}
    host: {{ .Values.tetragon.livenessProbe.http.host }}
    {{- end }}
  {{- else if .Values.tetragon.grpc.enabled }}
  exec:
    command:
      - tetra
      - status
      - --server-address
      - {{ .Values.tetragon.grpc.address }}
      - --retries
      - "5"
  {{- end -}}
{{- end -}}
Using that we can have:
livenessProbe:
  enabled: false
to disable the liveness probe.
livenessProbe:
  enabled: true
  custom:
    grpc:
      port: 54321
to define a custom liveness probe.
livenessProbe:
  enabled: true
  http:
    enabled: true
to define an http-based liveness probe.
livenessProbe:
  enabled: true
to define our legacy "tetra status"-based liveness probe (assuming tetragon.grpc.enabled is true).
Maybe I'm missing something, but can we just change the default livenessProbe and keep everything else the same? That is, change the default values to:
livenessProbe:
  timeoutSeconds: 60
  httpGet:
    path: /liveness
    port: 6789
and keep _container_tetragon.tpl unchanged. Then users can still use the tetra probe by setting livenessProbe: {}.
Yes, this makes sense (easier compared to what I am trying to do). But what I am not able to achieve is setting livenessProbe: {}. Possibly I am missing something.
First, change the existing values.yaml with:
@@ -64,9 +64,11 @@ tetragon:
   securityContext:
     privileged: true
   # -- Overrides the default livenessProbe for the tetragon container.
-  livenessProbe: {}
-  # grpc:
-  #   port: 54321
+  livenessProbe:
+    timeoutSeconds: 60
+    httpGet:
+      path: "/liveness"
+      port: 6789
   # Tetragon puts processes in an LRU cache. The cache is used to find ancestors
   # for subsequently exec'ed processes.
Then I run:
$ cat values.yaml
tetragon:
  livenessProbe: {}
$ helm template ./install/kubernetes/tetragon -f ./values.yaml
[...]
  livenessProbe:
    httpGet:
      path: /liveness
      port: 6789
    timeoutSeconds: 60
[...]
Which seems to be the same as the default, i.e. the {} override has no effect.
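That matches how Helm merges values (my understanding, not stated in the thread): user-supplied values are deep-merged over the chart defaults, so an empty map neither adds nor removes keys, while an explicit null deletes the key from the merged result. Roughly:
# chart default (values.yaml)
livenessProbe:
  timeoutSeconds: 60
  httpGet:
    path: /liveness
    port: 6789

# user override "livenessProbe: {}"   -> merged result keeps the default httpGet probe above
# user override "livenessProbe: null" -> merged result has no livenessProbe key at all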
ok, I think I found a way to do that. Updating the PR now.
We should define
$ cat values.yaml
tetragon:
  livenessProbe: null
instead of
$ cat values.yaml
tetragon:
  livenessProbe: {}
to use the old tetra-based liveness probe.
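Assuming the chart then falls back to the tetra-based exec probe once the key is nulled out (as in the template suggested earlier in this thread), the rendered probe would look roughly like the following; the server address is a hypothetical placeholder for tetragon.grpc.address:
livenessProbe:
  timeoutSeconds: 60
  exec:
    command:
      - tetra
      - status
      - --server-address
      - "localhost:54321"   # hypothetical placeholder
      - --retries
      - "5"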
PR is ready for review.
The previous commit introduced an HTTP endpoint that can be used for the liveness probe. This patch changes helm to make that the default instead of the tetra status based liveness probe. The user can still use the tetra status based liveness probe by defining a values file similar to:
$ cat values.yaml
tetragon:
  livenessProbe: null
Signed-off-by: Anastasios Papagiannis <tasos.papagiannnis@gmail.com>
Force-pushed from f3c11ed to d9bd92c
I have my opinion about this. I'll add my review here tomorrow 🚀 🙏
If the gRPC server is disabled then this new healthcheck will always fail, right? I think it shouldn't.
I don't have context on why the GetHealth endpoint was introduced, but it looks a bit awkward to me. gRPC itself defines the health checking service, so I would use the upstream proto and start the health service alongside the regular Tetragon gRPC server.
Also, Kubernetes now supports gRPC liveness probes, so unless we want to support old Kubernetes versions, we can use them instead of starting another HTTP server.
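For reference, a native Kubernetes gRPC liveness probe (assuming a reasonably recent cluster and that the agent exposes the standard grpc.health.v1.Health service) would look roughly like this; the port is a hypothetical value taken from the grpc example earlier in the thread:
livenessProbe:
  grpc:
    port: 54321        # hypothetical Tetragon gRPC port
  timeoutSeconds: 60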
I'm not sure why the new check would fail with the gRPC server disabled. How is that related?
I also don't have context about the details of [...]
Ok, I actually checked [...]
Force-pushed from d47e9c9 to d9bd92c
Closing in favor of #2478 as we chose to proceed with a gRPC-based liveness probe.
Currently, we use the tetra status command to report the status of the Tetragon agent. This comes with some overhead, as the tetra binary has a lot of additional functionality and using it just for status reporting seems like overkill.
On the other hand, Kubernetes supports liveness probes via an HTTP endpoint (see https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#http-probes). This patch first creates a new HTTP endpoint to report agent status and then changes helm to make use of it.
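For illustration, the resulting HTTP-based probe in the container spec would look roughly like this (path and port as discussed above; the timing fields are hypothetical, not taken from the chart):
livenessProbe:
  httpGet:
    path: /liveness
    port: 6789
  timeoutSeconds: 60
  periodSeconds: 10      # hypothetical
  failureThreshold: 3    # hypothetical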