tetragon: Hook exit sensor on acct_process #1509
It would also be nice to get a test for this failing case, if possible.
LGTM Jiri, thanks. Can you please add later, as a follow-up, the same test but running on the perf ring buffer?
That way, if we change some refcount logic so that the gRPC exit event is no longer exported, we are still covered on the BPF side.
Djalal and Anastasios found another way we could race in the exit event hook, so we could receive multiple exit events with the same pid value.
Anastasios suggested hooking acct_process instead, which is called only for the last task in the thread group.
acct_process depends on the CONFIG_BSD_PROCESS_ACCT config option, but it seems to be present on all supported kernels.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
The previous commit fixes the exit event race that might cause tetragon to receive multiple exit events with the same pid value.
The contrib/tester-progs/threads-exit program tries to exploit this by creating multiple threads and synchronizing all their exit calls, so it is likely to hit the race window.
The TestEventExitThreads test itself spawns several executions of the threads-exit program (to push our luck a bit and hit the race window at least once), records their pid values, and then checks that we receive a single exit event for any given pid value.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
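The final check described above (a single exit event per pid) can be sketched roughly as follows. This is only an illustration, not the actual TestEventExitThreads code: the helper name `duplicateExitPIDs` and the plain `[]uint32` input are assumptions made for the example, standing in for whatever event type the real test inspects.

```go
package main

import (
	"fmt"
	"sort"
)

// duplicateExitPIDs is a hypothetical helper: given the pid of every exit
// event observed during a test run, it returns (in ascending order) the pids
// that were seen more than once. An empty result means each pid exited once.
func duplicateExitPIDs(exitPIDs []uint32) []uint32 {
	counts := make(map[uint32]int, len(exitPIDs))
	for _, pid := range exitPIDs {
		counts[pid]++
	}

	var dups []uint32
	for pid, n := range counts {
		if n > 1 {
			dups = append(dups, pid)
		}
	}
	// Map iteration order is random, so sort for a stable result.
	sort.Slice(dups, func(i, j int) bool { return dups[i] < dups[j] })
	return dups
}

func main() {
	// Made-up pids: 4101 shows up twice, which is the racy case.
	observed := []uint32{4101, 4102, 4101, 4103}
	fmt.Println(duplicateExitPIDs(observed)) // prints [4101]
}
```

A test built on this idea would fail whenever the returned slice is non-empty.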
@olsajiri just for the record: If we don't want to depend on that config, then an alternative could be:
This will close the current group exit issue, but we still have a smaller window where separate threads exit on their own and could race. For that case, the sensor handler (or the gRPC one, where the event parsing happens) plus the GC need to be bulletproof: they must collect process entries when we receive a single exit event (live.counter == 0 is a good indication anyway) and ignore any following BPF exit events if they ever happen. However, I identified some other issues in the GC logic: #1517, where we underflow the refcount and won't collect entries. I will try to fix that separately, as it is another bug. Then, if we feel we are fine, we could have that code and switch away from CONFIG_BSD_PROCESS_ACCT. For the time being, I think the current solution is OK.
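To make the "collect on the first exit event, ignore the rest" idea concrete, here is a minimal sketch. It is not tetragon's process cache or GC: the `procCache` type and its methods are hypothetical names invented for this example, and real refcounting is deliberately left out.

```go
package main

import "fmt"

// procCache is a hypothetical stand-in for a userspace process cache.
// It only tracks which pids are currently considered live.
type procCache struct {
	live map[uint32]struct{}
}

func newProcCache() *procCache {
	return &procCache{live: make(map[uint32]struct{})}
}

// onExec records a new live process entry.
func (c *procCache) onExec(pid uint32) {
	c.live[pid] = struct{}{}
}

// onExit handles an exit event idempotently: the first exit for a pid
// removes the entry and returns true; any later exit for the same pid
// finds nothing to remove and returns false, so it can be dropped
// without touching any counters.
func (c *procCache) onExit(pid uint32) bool {
	if _, ok := c.live[pid]; !ok {
		return false
	}
	delete(c.live, pid)
	return true
}

func main() {
	c := newProcCache()
	c.onExec(4101)
	fmt.Println(c.onExit(4101)) // first exit event: entry collected
	fmt.Println(c.onExit(4101)) // duplicate exit event: ignored
}
```

Because a duplicate event never reaches any decrement, this shape cannot underflow a counter, which is the property the comment above asks for.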