Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix monitoring BPF maps #1774

Closed
6 tasks
lambdanis opened this issue Nov 20, 2023 · 3 comments · Fixed by #2291
Closed
6 tasks

Fix monitoring BPF maps #1774

lambdanis opened this issue Nov 20, 2023 · 3 comments · Fixed by #2291
Assignees
Labels
area/metrics Related to prometheus metrics area/userspace Related to userspace Tetragon logic

Comments

@lambdanis
Copy link
Contributor

lambdanis commented Nov 20, 2023

Tetragon exposes metrics about BPF maps, defined in the mapmetrics package. There are a few issues with them:

  • Map capacity is exposed as total label in tetragon_map_in_use_gauge metric (and also, for eventcache it's set to "0"). This is against OpenMetrics conventions and makes it difficult (or impossible) to calculate the map utilization. Map capacity should be exposed as a separate metric. Alternatively, Tetragon could calculate the map utilization itself, but exposing the map capacity seems better.
  • The tetragon_map_in_use_gauge metric name is not very intuitive IMO and the gauge suffix is unnecessary.
  • processLru and eventcache are not BPF map, they're user-space caches, but info about them is exposed via mapmetrics too. This is confusing, it should be exposed as separate metrics.
  • tetragon_map_errors_total metric seems to have an incorrect help text.
  • tetragon_map_drops_total metric is incorrect. We get this from this callback. But this is not called only on evictions, it is also called when we remove elements normally, as we do here. The evictions are counted be the tetragon_errors_total{type="process_cache_evicted"} metric (defined here), so tetragon_map_drops_total can be just removed.
  • Metrics are not exposed for all maps: bpf maps:metrics: fix bpf maps metrics #1775
@lambdanis lambdanis added the area/metrics Related to prometheus metrics label Nov 20, 2023
@lambdanis lambdanis changed the title Map monitoring improvements Improve maps monitoring Nov 20, 2023
@lambdanis lambdanis added the area/userspace Related to userspace Tetragon logic label Nov 21, 2023
@lambdanis lambdanis changed the title Improve maps monitoring Fix monitoring BPF maps Dec 12, 2023
@sadath-12
Copy link
Contributor

/assign

@kkourt
Copy link
Contributor

kkourt commented Jan 9, 2024

@sadath-12 I've unassigned you from this issue because there are many different tasks that can happen in parallel I think. For example, @tpapagian already has a PR to address #1949. If you want to work on this, I'd suggest pick a specific bullet point and we can create a separate issue for that.

Thanks!

@sadath-12
Copy link
Contributor

thanks I almost covered it #1950 . I'll be detailing about them in the description

sadath-12 added a commit to sadath-12/tetragon that referenced this issue Jan 9, 2024
solving all monitoring issues exposed at issue cilium#1774

Signed-off-by: sadath-12 <sadathsadu2002@gmail.com>
sadath-12 added a commit to sadath-12/tetragon that referenced this issue Jan 9, 2024
solving all monitoring issues exposed at issue cilium#1774

Signed-off-by: sadath-12 <sadathsadu2002@gmail.com>
sadath-12 added a commit to sadath-12/tetragon that referenced this issue Jan 9, 2024
solving all monitoring issues exposed at issue cilium#1774

Signed-off-by: sadath-12 <sadathsadu2002@gmail.com>
sadath-12 added a commit to sadath-12/tetragon that referenced this issue Jan 10, 2024
author sadath-12 <sadathsadu2002@gmail.com> 1704801541 +0530
committer sadath-12 <sadathsadu2002@gmail.com> 1704914706 +0530

parent 8802fd6
author sadath-12 <sadathsadu2002@gmail.com> 1704801541 +0530
committer sadath-12 <sadathsadu2002@gmail.com> 1704914704 +0530

parent 8802fd6
author sadath-12 <sadathsadu2002@gmail.com> 1704801541 +0530
committer sadath-12 <sadathsadu2002@gmail.com> 1704914701 +0530

parent 8802fd6
author sadath-12 <sadathsadu2002@gmail.com> 1704801541 +0530
committer sadath-12 <sadathsadu2002@gmail.com> 1704914698 +0530

parent 8802fd6
author sadath-12 <sadathsadu2002@gmail.com> 1704801541 +0530
committer sadath-12 <sadathsadu2002@gmail.com> 1704914694 +0530

parent 8802fd6
author sadath-12 <sadathsadu2002@gmail.com> 1704801541 +0530
committer sadath-12 <sadathsadu2002@gmail.com> 1704914657 +0530

Fix: monitoring BPF maps

solving all monitoring issues exposed at issue cilium#1774

Signed-off-by: sadath-12 <sadathsadu2002@gmail.com>

make: Introduce install/kubernetes/Makefile

Signed-off-by: sadath-12 <sadathsadu2002@gmail.com>

workflows: bump bom version for race condition fix

We experienced cilium#1894, which made me create kubernetes-sigs/bom#385 only
to fix the race condition and realize that it was already patched but no
release existed.

Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>

tetragon: Remove early binary filter bpf code

We can move the binary path matching early in the filter
program, so it's invoked just one tailcall from where it
is now.

This won't make much difference wrt early binary filter
invocation, so we can remove it.

As a side effect we free some (~60) instructions in filter_arg
program for small kernels.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>

tetragon: Remove early binary filter config code

Now that we do not have bpf support for early filter
et's remove the go config code for that as well.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>

tetragon: Split filter_tailcall_index in two values

And make the index names more appropriate.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>

tetragon: Use selidx name for selector index

The selidx is used in other parts of the code and
fits more than just generic index.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>

tracingpolicy: add a message field to inform users what is happening

Right now all fields in Tetragon events are generated from code,
the only exception is the policy_name which does not inform users
what is going on.

This optional "message" field if present will be copied and printed
in the final generated event.

We want to inform users what is really happening, why this event was
generated? it is not obvious to non technical users.

An optional abstracted could be easy to read something like:
  "message": "Kernel module being loaded"

As we want to capture new users, the policy_name is not enough and
strictly speaking it has its own purpose, using abstracted messages
or logs appeals to more users.

The down side of this is our generated events may grow in size, thus
the new "message" field will be chopped to 256 characters only.

In future if the event size is really a problem, we can start shipping
default field filters, that hide these fields from the source before
even constructing the event.

Signed-off-by: Djalal Harouni <tixxdz@gmail.com>

api:proto: add short message to print Tracing Policy message

Signed-off-by: Djalal Harouni <tixxdz@gmail.com>

events: include the tracing policy message in events

If a tracing policy has the message field set, then include it
into the reported events. It is optional.

The message field is properly escaped.

Signed-off-by: Djalal Harouni <tixxdz@gmail.com>

tests: ensure that message field is reported

Signed-off-by: Djalal Harouni <tixxdz@gmail.com>

policylibrary: add message field to module tracing policy

This adds the message field to the module tracing library as an
example of what the event is about.

Signed-off-by: Djalal Harouni <tixxdz@gmail.com>

test: add message testing

Signed-off-by: Djalal Harouni <tixxdz@gmail.com>

metrics: Do not return when we cannot find a _stats map

In Tetragon, we have maps ending with _stats that contain metadata about
the main maps. An example is that we have execve_map and
execve_map_stats that contains errors and other information related to
execve_map map.

Now when we fail to open the _stats map, we just stop the loop and we
can possibly miss to check for other maps. This patch change that and
tries for the next map.

Additionally we do not try to collect stats for maps that already end up
with _stats as this would result in _stats_stats map names.

Signed-off-by: Anastasios Papagiannis <tasos.papagiannnis@gmail.com>

fix(deps): update module golang.org/x/sync to v0.6.0

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

tester-progs: compile killer-tester-32 statically

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>

vmtests: make a note when all tests skipped

Currently, if all tests are skipped we print success, which is
misleading. Print a better message.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>

vmtests: listTests now accepts a pattern

Previously, we always used "." to list tests in listTests. This patch
adds it as an argument to be used in a later patch.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>

vmtests: improve running tests from a file

This patch improves loading tests form a file.

First, it allows for having comments with lines starting with '#'.

Second, it lists the tests based on the provided user pattern instead of
using it as it is. Previously, the test name was defined by the user
pattern. This lead to inconsistent reporting. This patch fixes this
because now we report per-test results rather than per-pattern results.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>

vmtests: use a strict pattern for test execution

It is possible that a test is a substring of another. Use a strict
pattern so that we execute only the specified test.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>

tetragon-vmtests-run: support for detailed results

There are two ways to skip a test in our CI:
 1. add it to CiBlacklist in split-tetragon-gotests
 2. use t.Skip()

CI results report 1., but they do not report 2.

This means that skipped tests can go easily unnoticed.

To report tests skipped with (2), we need to parse the result files.
This patch does exactly that. It adds a new option
--enable-detailed-results. If this option is set, we set KeepAllLogs
because we need to parse all logs after the tests are finished.

The tests we skip with (1) are marked with ⏭️.
We introduce a new symbol for the tests that are skipped with (2): ⚡

Parsing the detailed results allows us to also report sub-tests (i.e.,
tests executed with t.Run()).

Here's an output example:

```
✅      pkg.sensors.tracing.Test_Kprobe_DisableEnablePolicy                                     (total:3 failed:0 skipped:0)    2.680496982s    1m33.952671893s
├─✅    Test_Kprobe_DisableEnablePolicy/sensor                                                  1.05s                           (1.05s)
├─✅    Test_Kprobe_DisableEnablePolicy/tracing-policy                                          1.09s                           (2.14s)
└─✅    Test_Kprobe_DisableEnablePolicy                                                         2.56s                           (4.7s)
⚡      pkg.sensors.tracing.TestKillerOverride32                                                (total:1 failed:0 skipped:1)    130.759191ms    1m34.083431084s
⚡      pkg.sensors.tracing.TestKillerSignal32                                                  (total:1 failed:0 skipped:1)    132.393451ms    1m34.215824535s
⚡      pkg.sensors.tracing.TestKillerOverrideBothBits                                          (total:1 failed:0 skipped:1)    119.455061ms    1m34.335279596s
⚡      pkg.sensors.tracing.TestKillerOverride                                                  (total:1 failed:0 skipped:1)    121.255148ms    1m34.456534744s
⚡      pkg.sensors.tracing.TestKillerSignal                                                    (total:1 failed:0 skipped:1)    136.665421ms    1m34.593200165s
✅      pkg.sensors.tracing.TestKillerMulti                                                     (total:1 failed:0 skipped:0)    132.761861ms    1m34.725962026s
```

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>

gh: add --enable-detailed-results to vmtests

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>

chore(deps): update all lvh-images main

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

tests: Unload policy after load

Unload the policy that was loaded during the test

Signed-off-by: sadath-12 <sadathsadu2002@gmail.com>

api: detect binary execution that raise process privileges

Add ProcessPrivilegesChanged to note reasons why the process
privileges changed, and start using it in the binary_properties
to indicate why the binary execution changed or raised privileges.

For now we handle only elevating privileges, in future depending
on user's requests we may do privilege changes or dropping. The
ProcessPrivilegesChanged is an enum able to handle different values.

The new `binary_properties.privileges_changed` contains the reasons
why this binary execution gained new capabilities or elevated its
privileges. Usually this happens due:
1. File capability sets on the binary.
2. set-user-ID to root bit is set on the binary.

Details of ProcessPrivilegesChanged:

1. PRIVILEGES_RAISED_EXEC_FILE_CAP - The File capability sets on binary:
  The kernel supports associating capability sets with an executable
  file using `setcap` command.

  The file capability sets are stored in an extended attribute
  named `security.capability`. The file capability sets, in conjunction
  with the capability sets of the process, determine the process
  capabilities and privileges after the `execve` system call.

  For further reference, please check sections
  `File capability extended attribute versioning` and
  `Namespaced file capabilities` of the capabilities man pages:
  https://man7.org/linux/man-pages/man7/capabilities.7.html.

2. PRIVILEGES_RAISED_EXEC_FILE_SETUID - set-user-ID to root bit is set
  on the binary:
  When a process with nonzero UIDs executes a binary with a set-user-ID
  to root also known as suid-root executable, then the kernel switches
  the effective user ID to 0 (root) which is a privilege elevation
  operation since it grants access to resources owned by the root user.
  The effective user ID is listed inside the `process_credentials` part
  of the `process` object.
  For further reading, section
  `Capabilities and execution of programs by root` of
  https://man7.org/linux/man-pages/man7/capabilities.7.html.

  Afterward the kernel recalculates the capability sets of the process
  and grants all capabilities in the permitted and effective capability
  sets, except those masked out by the capability bounding set.
  If the binary also have file capability sets then these bits are honored
  and the process gains just the capabilities granted by the file
  capability sets (i.e., not all capabilities, as it would occur
  when executing a set-user-ID to root binary that does not have any
  associated file capabilities).
  This is described in section
  `Set-user-ID-root programs that have file capabilities` of
  https://man7.org/linux/man-pages/man7/capabilities.7.html.

  The new granted capabilities can be listed inside the `process` object.

  There is one exception for the special treatments of set-user-ID to
  root execution receiving all capabilities, if the `SecBitNoRoot` bit of
  the Secure bits is set, then the kernel does not grant any capability.
  The secure bits are already exposed part of the `process_credentials`
  object.

  Please check section:
  `The securebits flags: establishing a capabilities-only environment`
  of the capabilities man pages:
  https://man7.org/linux/man-pages/man7/capabilities.7.html

The new privileges_changed contains the reasons, no need to encode
again the full capabilities there, let's just indicate to the user
and if they are interested they can inspect other fields that contain
the permitted and effective capabilities of the current execution.

*Limitations:*
1. We do not handle the case of an unprivileged user id that is
_mapped_ to user id root 0 inside its own namespace then performs a
setuid to root execution. Current eBPF infrastructure does not support
user id translation into target user namespace. Plus current Tetragon
design deals with global uids, so it is not clear at this moment how to
handle this corner case. For this the binary_properties.privileges_changed
is an array that can hold extra values in the future.

2. We do not handle the case of a binary having both setuid to root bit
 set and file capability sets on the binary. Right now we report in the
 `privileges_changed` only `PRIVILEGES_RAISED_EXEC_FILE_SETUID` where
 we should also add `PRIVILEGES_RAISED_EXEC_FILE_CAP`. This is left for
 the future as we have to read the `security.capability` field of the
 extended attributes of the binary and parse the capabilities that are
 actually set to ensure that new capabilities received are not only due
 to the set-user-ID root exection.

Signed-off-by: Djalal Harouni <tixxdz@gmail.com>

tetragon: detect binary execution that raise process privileges

From a higher level:
Populate binary_properties.privileges_changed with the
the reasons why this binary execution gained new capabilities or
elevated its privileges. Usually this happens due:
1. File capability sets on the binary.
2. set-user-ID to root bit is set on the binary.

Details of ProcessPrivilegesChanged:

1. PRIVILEGES_RAISED_EXEC_FILE_CAP - The File capability sets on binary:
  The kernel supports associating capability sets with an executable
  file using `setcap` command.

  The file capability sets are stored in an extended attribute
  named `security.capability`. The file capability sets, in conjunction
  with the capability sets of the process, determine the process
  capabilities and privileges after the `execve` system call.

  For further reference, please check sections
  `File capability extended attribute versioning` and
  `Namespaced file capabilities` of the capabilities man pages:
  https://man7.org/linux/man-pages/man7/capabilities.7.html.

2. PRIVILEGES_RAISED_EXEC_FILE_SETUID - set-user-ID to root bit is set
  on the binary:
  When a process with nonzero UIDs executes a binary with a set-user-ID
  to root also known as suid-root executable, then the kernel switches
  the effective user ID to 0 (root) which is a privilege elevation
  operation since it grants access to resources owned by the root user.
  The effective user ID is listed inside the `process_credentials` part
  of the `process` object.
  For further reading, section
  `Capabilities and execution of programs by root` of
  https://man7.org/linux/man-pages/man7/capabilities.7.html.

  Afterward the kernel recalculates the capability sets of the process
  and grants all capabilities in the permitted and effective capability
  sets, except those masked out by the capability bounding set.
  If the binary also have file capability sets then these bits are honored
  and the process gains just the capabilities granted by the file
  capability sets (i.e., not all capabilities, as it would occur
  when executing a set-user-ID to root binary that does not have any
  associated file capabilities).
  This is described in section
  `Set-user-ID-root programs that have file capabilities` of
  https://man7.org/linux/man-pages/man7/capabilities.7.html.

  The new granted capabilities can be listed inside the `process` object.

  There is one exception for the special treatments of set-user-ID to
  root execution receiving all capabilities, if the `SecBitNoRoot` bit of
  the Secure bits is set, then the kernel does not grant any capability.
  The secure bits are already exposed part of the `process_credentials`
  object.

  Please check section:
  `The securebits flags: establishing a capabilities-only environment`
  of the capabilities man pages:
  https://man7.org/linux/man-pages/man7/capabilities.7.html

  The new privileges_changed contains the reasons, no need to encode
  again the full capabilities there, let's just indicate to the user
  and if they are interested they can inspect other fields that contain
  the permitted and effective capabilities of the current execution.

Implementation:
The capabilities are re-calculated by the capability LSM during execve,
so let's avoid trying to be smart and calculate or guess them...
we just display the final results.

To display these fields, Tetragon must be run with enable-process-cred.

1. If the setuid bit is set we check if it is a privileged root and if
   it's real uid is not global root and effective uid is global root, if so
   then this a suid binary execution which is already enough for
   Tetragon to report as a privilege change execution.

2. Check if the setuid bit is not set and the new permitted capability
   sets is higher then the previous one then is is a file capability
   sets execution.

*Limitations:*
1. We do not handle the case of an unprivileged user id that is
  _mapped_ to user id root 0 inside its own namespace then performs a
  setuid to root execution. Current eBPF infrastructure does not support
  user id translation into target user namespace. Plus current Tetragon
  design deals with global uids, so it is not clear at this moment how to
  handle this corner case. For this the binary_properties.privileges_changed
  is an array that can hold extra values in the future.

2. We do not handle the case of a binary having both setuid to root bit
  set and file capability sets on the binary. Right now we report in the
  `privileges_changed` only `PRIVILEGES_RAISED_EXEC_FILE_SETUID` where
  we should also add `PRIVILEGES_RAISED_EXEC_FILE_CAP`. This is left for
  the future as we have to read the `security.capability` field of the
  extended attributes of the binary and parse the capabilities that are
  actually set to ensure that new capabilities received are not only due
  to the set-user-ID root exection.

Example output event:

File capabilities execution:
  "process_exec": {
    "process": {
      "exec_id": "OjIwMTAxOTMxNzAzMjc4OjE0MDUwOA==",
      "pid": 140508,
      "uid": 1000,
      "cwd": "/home/tixxdz/work/station/code/src/github.com/tixxdz/tetragon",
      "binary": "/usr/bin/ping",
      "arguments": "ebpf.io",
      "flags": "execve clone",
      "start_time": "2023-11-23T14:32:53.227623175Z",
      "auid": 1000,
      "parent_exec_id": "OjI5NjYzNzAwMDAwMDA6NDQ4NTk=",
      "refcnt": 1,
      "cap": {
        "permitted": [
          "CAP_NET_RAW"
        ],
        "effective": [
          "CAP_NET_RAW"
        ]
      },
      "tid": 140508,
      "process_credentials": {
        "uid": 1000,
        "gid": 1000,
        "euid": 1000,
        "egid": 1000,
        "suid": 1000,
        "sgid": 1000,
        "fsuid": 1000,
        "fsgid": 1000
      },
      "binary_properties": {
        "privileges_changed": [
          "PRIVILEGES_RAISED_EXEC_FILE_CAP"
        ]
      }
    },

Setuid suid execution
(from non uid 0 -> to global uid 0 in root user namespace):
  "process_exec": {
    "process": {
      "exec_id": "OjMyMTg1ODY1MDcxNTQ6Mjg5NTU=",
      "pid": 28955,
      "uid": 1000,
      "cwd": "/home/tixxdz/work/station/code/src/github.com/tixxdz/tetragon",
      "binary": "/usr/bin/su",
      "arguments": "--help",
      "flags": "execve clone",
      "start_time": "2023-12-11T14:59:24.735197753Z",
      "auid": 1000,
      "parent_exec_id": "OjE0MjI5NzAwMDAwMDA6MTI4NjE=",
      "refcnt": 1,
      "cap": {
        "permitted": [
          "CAP_CHOWN",
          "DAC_OVERRIDE",
          "CAP_DAC_READ_SEARCH",
          "CAP_FOWNER",
          "CAP_FSETID",
          "CAP_KILL",
          "CAP_SETGID",
          "CAP_SETUID",
          "CAP_SETPCAP",
          "CAP_LINUX_IMMUTABLE",
          "CAP_NET_BIND_SERVICE",
          "CAP_NET_BROADCAST",
          "CAP_NET_ADMIN",
          "CAP_NET_RAW",
          "CAP_IPC_LOCK",
          "CAP_IPC_OWNER",
          "CAP_SYS_MODULE",
          "CAP_SYS_RAWIO",
          "CAP_SYS_CHROOT",
          "CAP_SYS_PTRACE",
          "CAP_SYS_PACCT",
          "CAP_SYS_ADMIN",
          "CAP_SYS_BOOT",
          "CAP_SYS_NICE",
          "CAP_SYS_RESOURCE",
          "CAP_SYS_TIME",
          "CAP_SYS_TTY_CONFIG",
          "CAP_MKNOD",
          "CAP_LEASE",
          "CAP_AUDIT_WRITE",
          "CAP_AUDIT_CONTROL",
          "CAP_SETFCAP",
          "CAP_MAC_OVERRIDE",
          "CAP_MAC_ADMIN",
          "CAP_SYSLOG",
          "CAP_WAKE_ALARM",
          "CAP_BLOCK_SUSPEND",
          "CAP_AUDIT_READ",
          "CAP_PERFMON",
          "CAP_BPF",
          "CAP_CHECKPOINT_RESTORE"
        ],
        "effective": [
          "CAP_CHOWN",
          "DAC_OVERRIDE",
          "CAP_DAC_READ_SEARCH",
          "CAP_FOWNER",
          "CAP_FSETID",
          "CAP_KILL",
          "CAP_SETGID",
          "CAP_SETUID",
          "CAP_SETPCAP",
          "CAP_LINUX_IMMUTABLE",
          "CAP_NET_BIND_SERVICE",
          "CAP_NET_BROADCAST",
          "CAP_NET_ADMIN",
          "CAP_NET_RAW",
          "CAP_IPC_LOCK",
          "CAP_IPC_OWNER",
          "CAP_SYS_MODULE",
          "CAP_SYS_RAWIO",
          "CAP_SYS_CHROOT",
          "CAP_SYS_PTRACE",
          "CAP_SYS_PACCT",
          "CAP_SYS_ADMIN",
          "CAP_SYS_BOOT",
          "CAP_SYS_NICE",
          "CAP_SYS_RESOURCE",
          "CAP_SYS_TIME",
          "CAP_SYS_TTY_CONFIG",
          "CAP_MKNOD",
          "CAP_LEASE",
          "CAP_AUDIT_WRITE",
          "CAP_AUDIT_CONTROL",
          "CAP_SETFCAP",
          "CAP_MAC_OVERRIDE",
          "CAP_MAC_ADMIN",
          "CAP_SYSLOG",
          "CAP_WAKE_ALARM",
          "CAP_BLOCK_SUSPEND",
          "CAP_AUDIT_READ",
          "CAP_PERFMON",
          "CAP_BPF",
          "CAP_CHECKPOINT_RESTORE"
        ]
      },
      "tid": 28955,
      "process_credentials": {
        "uid": 1000,
        "gid": 1000,
        "euid": 0,
        "egid": 1000,
        "suid": 0,
        "sgid": 1000,
        "fsuid": 0,
        "fsgid": 1000
      },
      "binary_properties": {
        "setuid": 0,
        "privileges_changed": [
          "PRIVILEGES_RAISED_EXEC_FILE_SETUID"
        ]
      }
    },

Signed-off-by: Djalal Harouni <tixxdz@gmail.com>

api: detect set-group-ID to root as privileges execution

Make Tetragon treat a binary exec with set-group-ID to root as
a privileged execution since it changes the effective group ID
to root which allows access to resources owned by group root.

Signed-off-by: Djalal Harouni <tixxdz@gmail.com>

tetragon: detect setgid execution to root as privileged execution

Execution of a binary with set-group-ID to root is handled as a
privileged execution. The PRIVILEGES_RAISED_EXEC_FILE_SETGID will be
set.

Signed-off-by: Djalal Harouni <tixxdz@gmail.com>

tests: split exec suid tests

Signed-off-by: Djalal Harouni <tixxdz@gmail.com>

tests: add suid root and setuid to non root execution tests

Signed-off-by: Djalal Harouni <tixxdz@gmail.com>

tests: detect privileged execution of file caps binaries

We use the ping binary if it is available to detect privileged
execution through file caps on binaries.

Signed-off-by: Djalal Harouni <tixxdz@gmail.com>

fix(deps): update module golang.org/x/sys to v0.16.0

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

make: Introduce install/kubernetes/Makefile

Signed-off-by: sadath-12 <sadathsadu2002@gmail.com>

workflows: bump bom version for race condition fix

We experienced cilium#1894, which made me create kubernetes-sigs/bom#385 only
to fix the race condition and realize that it was already patched but no
release existed.

Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>

tetragon: Remove early binary filter config code

Now that we do not have bpf support for early filter
et's remove the go config code for that as well.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>

tetragon: Use selidx name for selector index

The selidx is used in other parts of the code and
fits more than just generic index.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>

tracingpolicy: add a message field to inform users what is happening

Right now all fields in Tetragon events are generated from code,
the only exception is the policy_name which does not inform users
what is going on.

This optional "message" field if present will be copied and printed
in the final generated event.

We want to inform users what is really happening, why this event was
generated? it is not obvious to non technical users.

An optional abstracted could be easy to read something like:
  "message": "Kernel module being loaded"

As we want to capture new users, the policy_name is not enough and
strictly speaking it has its own purpose, using abstracted messages
or logs appeals to more users.

The down side of this is our generated events may grow in size, thus
the new "message" field will be chopped to 256 characters only.

In future if the event size is really a problem, we can start shipping
default field filters, that hide these fields from the source before
even constructing the event.

Signed-off-by: Djalal Harouni <tixxdz@gmail.com>

events: include the tracing policy message in events

If a tracing policy has the message field set, then include it
into the reported events. It is optional.

The message field is properly escaped.

Signed-off-by: Djalal Harouni <tixxdz@gmail.com>

tests: ensure that message field is reported

Signed-off-by: Djalal Harouni <tixxdz@gmail.com>

policylibrary: add message field to module tracing policy

This adds the message field to the module tracing library as an
example of what the event is about.

Signed-off-by: Djalal Harouni <tixxdz@gmail.com>

test: add message testing

Signed-off-by: Djalal Harouni <tixxdz@gmail.com>

tester-progs: compile killer-tester-32 statically

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>

vmtests: make a note when all tests skipped

Currently, if all tests are skipped we print success, which is
misleading. Print a better message.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>

vmtests: listTests now accepts a pattern

Previously, we always used "." to list tests in listTests. This patch
adds it as an argument to be used in a later patch.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>

vmtests: improve running tests from a file

This patch improves loading tests form a file.

First, it allows for having comments with lines starting with '#'.

Second, it lists the tests based on the provided user pattern instead of
using it as it is. Previously, the test name was defined by the user
pattern. This lead to inconsistent reporting. This patch fixes this
because now we report per-test results rather than per-pattern results.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>

vmtests: use a strict pattern for test execution

It is possible that a test is a substring of another. Use a strict
pattern so that we execute only the specified test.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>

tetragon-vmtests-run: support for detailed results

There are two ways to skip a test in our CI:
 1. add it to CiBlacklist in split-tetragon-gotests
 2. use t.Skip()

CI results report 1., but they do not report 2.

This means that skipped tests can go easily unnoticed.

To report tests skipped with (2), we need to parse the result files.
This patch does exactly that. It adds a new option
--enable-detailed-results. If this option is set, we set KeepAllLogs
because we need to parse all logs after the tests are finished.

The tests we skip with (1) are marked with ⏭️.
We introduce a new symbol for the tests that are skipped with (2): ⚡

Parsing the detailed results allows us to also report sub-tests (i.e.,
tests executed with t.Run()).

Here's an output example:

```
✅      pkg.sensors.tracing.Test_Kprobe_DisableEnablePolicy                                     (total:3 failed:0 skipped:0)    2.680496982s    1m33.952671893s
├─✅    Test_Kprobe_DisableEnablePolicy/sensor                                                  1.05s                           (1.05s)
├─✅    Test_Kprobe_DisableEnablePolicy/tracing-policy                                          1.09s                           (2.14s)
└─✅    Test_Kprobe_DisableEnablePolicy                                                         2.56s                           (4.7s)
⚡      pkg.sensors.tracing.TestKillerOverride32                                                (total:1 failed:0 skipped:1)    130.759191ms    1m34.083431084s
⚡      pkg.sensors.tracing.TestKillerSignal32                                                  (total:1 failed:0 skipped:1)    132.393451ms    1m34.215824535s
⚡      pkg.sensors.tracing.TestKillerOverrideBothBits                                          (total:1 failed:0 skipped:1)    119.455061ms    1m34.335279596s
⚡      pkg.sensors.tracing.TestKillerOverride                                                  (total:1 failed:0 skipped:1)    121.255148ms    1m34.456534744s
⚡      pkg.sensors.tracing.TestKillerSignal                                                    (total:1 failed:0 skipped:1)    136.665421ms    1m34.593200165s
✅      pkg.sensors.tracing.TestKillerMulti                                                     (total:1 failed:0 skipped:0)    132.761861ms    1m34.725962026s
```

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>

gh: add --enable-detailed-results to vmtests

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>

chore(deps): update all lvh-images main

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

tests: Unload policy after load

Unload the policy that was loaded during the test

Signed-off-by: sadath-12 <sadathsadu2002@gmail.com>

api: detect set-group-ID to root as privileges execution

Make Tetragon treat a binary exec with set-group-ID to root as
a privileged execution since it changes the effective group ID
to root which allows access to resources owned by group root.

Signed-off-by: Djalal Harouni <tixxdz@gmail.com>

tetragon: detect setgid execution to root as privileged execution

Execution of a binary with set-group-ID to root is handled as a
privileged execution. The PRIVILEGES_RAISED_EXEC_FILE_SETGID will be
set.

Signed-off-by: Djalal Harouni <tixxdz@gmail.com>

tests: detect privileged execution of file caps binaries

We use the ping binary if it is available to detect privileged
execution through file caps on binaries.

Signed-off-by: Djalal Harouni <tixxdz@gmail.com>

fix(deps): update module golang.org/x/sys to v0.16.0

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
sadath-12 added a commit to sadath-12/tetragon that referenced this issue Feb 10, 2024
This pr Closes cilium#1774 considering all its points and 1st part of cilium#1775

Let's break down the changes applied considering cilium#1774 point wise

1- seems to be solved by removing eventcache as it was supposed to be on user side and the user side EventCacheCount seems sufficient
2- _gauge suffix removed
3- processLru migrated towards ProcessCacheTotal as separate metric and eventcache removed as said in 1)
4- map_errors_total text rewritted
5- map_drops_total removed and in its place ProcessCacheEvicted used

Signed-off-by: sadath-12 <sadathsadu2002@gmail.com>
@lambdanis lambdanis self-assigned this Mar 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/metrics Related to prometheus metrics area/userspace Related to userspace Tetragon logic
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants