-
Notifications
You must be signed in to change notification settings - Fork 381
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpf: fix policyfilter issue for existing processes #1590
Conversation
b6782d0
to
b5e7bc3
Compare
fb57ba0
to
0419ece
Compare
There is an issue with the cgroup tracker id (details can be found below). This issue causes policyfilter to misbehave. Furthermore, the original intention behind the cgroup tracker id was never realized and so it currently serves no purpose. Its only user is the policyfilter. This PR removes the cgroup tracker id, and, instead, computes the cgroup id at the time of the policyfilter check. For most common setups (cgroupv2) the overhead of doing so is negligible. Computing the cgroup id at the time of check is more reliable and also simplifies the code. More details can be found below. We have a namespaced policy installed that mathes connect() calls to 8.8.8.8, and execute the following on a pod in the same namespace: `kubectl exec -it nonqos-demo -- curl 8.8.8.8` We observe two things: * there is no event from the policy (as it should be) * the cgroup tracker id of the curl process is the same as the parent runc process (which should not be the case) The exec event we get from curl has: .process.pid: 244958 .process.binary: /usr/bin/curl .parent.pid: 244948 .parent.binary: /usr/bin/runc Looking at the cgroup tracker id in the execve map for these two pids we see: "cgrpid_tracker": 2371 "cgrpid_tracker": 2371 Which means that the cgroup id tracker is not correct, and the curl process will not be matched by the policy. The issue might arise from the fact that in the wake_up_new_task(), computing the cgroup tracker id, uses get_current_cgroup_id(). We are, however, still in the parent so the get_current_cgroup_id() helper will return the cgroup of the parent, not of the child (which is what we want). To simplify the code complexity (both from the perspective of understandabiliuty but also verification complexity), we instead remove the tracker id and retrieve the current cgroup id at the time of the check. This is much simpler, and not less efficient for the common case (cgroupv2). Following the same steps with a version build with the current patch, produces an event as expected. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
0419ece
to
37223bd
Compare
Nice lets go for correctness first and we can optimize later if perf ever is measurable. |
btw @jrfastab, @kkourt usually those cases will fork and then migrate the tasks to the appropriate cgroup since they don't fork from pid 1 of container, so to catch those in bpf I planned this https://github.com/cilium/tetragon/blob/pr/tixxdz/cgroup-bpf-full/bpf/cgroup/bpf_cgroup_attach_task.c#L32 but due to lack of time... For this wake_up_new_task() even if we are passing the new task and fetching its cgroup (not the parent) but down the line if the fs uses a cgroupv2 then it switches back to cgroupv2 helpers which only work with current context... Our own bpf cgroup helpers are far superior since they allows to work on target tasks https://github.com/cilium/tetragon/blob/main/bpf/lib/bpf_cgroup.h#L377 transparently for both cgroupv1 and cgroupv2, I should just have used this instead of upstream bpf helper... Anyway thanks for fixing this, I plan to retake and finish it. |
There is an issue with the cgroup tracker id (details can be found below). This issue causes policyfilter to misbehave. Furthermore, the original intention behind the cgroup tracker id was never implemented and so it currently serves no purpose. Its only user is the policyfilter.
This patch removes the cgroup tracker id, and, instead, computes the cgroup id at the time of the policyfilter check. For most common setups (cgroupv2) the overhead of doing so is negligible. Computing the cgroup id at the time of check is more reliable and also simplifies the code.
More details can be found below.
We have a namespaced policy installed that mathes connect() calls to 8.8.8.8, and execute the following on a pod in the same namespace:
kubectl exec -it nonqos-demo -- curl 8.8.8.8
We observe two things:
The exec event we get from curl has:
.process.pid: 244958
.process.binary: /usr/bin/curl
.parent.pid: 244948
.parent.binary: /usr/bin/runc
Looking at the cgroup tracker id in the execve map for these two pids we see:
"cgrpid_tracker": 2371
"cgrpid_tracker": 2371
Which means that the cgroup id tracker is not correct, and the curl process will not be matched by the policy.
The issue might arise from the fact that in the wake_up_new_task(), computing the cgroup tracker id, uses get_current_cgroup_id(). We are, however, still in the parent so the get_current_cgroup_id() helper will return the cgroup of the parent, not of the child (which is what we want).
To simplify the code complexity (both from the perspective of understandabiliuty but also verification complexity), we instead remove the tracker id and retrieve the current cgroup id at the time of the check. This is much simpler, and not less efficient for the common case (cgroupv2).
Following the same steps with a version build with the current patch, produces an event as expected.