Fix monitoring BPF maps #1774

lambdanis · 2023-11-20T17:30:59Z

Tetragon exposes metrics about BPF maps, defined in the mapmetrics package. There are a few issues with them:

Map capacity is exposed as total label in tetragon_map_in_use_gauge metric (and also, for eventcache it's set to "0"). This is against OpenMetrics conventions and makes it difficult (or impossible) to calculate the map utilization. Map capacity should be exposed as a separate metric. Alternatively, Tetragon could calculate the map utilization itself, but exposing the map capacity seems better.
The tetragon_map_in_use_gauge metric name is not very intuitive IMO and the gauge suffix is unnecessary.
processLru and eventcache are not BPF map, they're user-space caches, but info about them is exposed via mapmetrics too. This is confusing, it should be exposed as separate metrics.
tetragon_map_errors_total metric seems to have an incorrect help text.
tetragon_map_drops_total metric is incorrect. We get this from this callback. But this is not called only on evictions, it is also called when we remove elements normally, as we do here. The evictions are counted be the tetragon_errors_total{type="process_cache_evicted"} metric (defined here), so tetragon_map_drops_total can be just removed.
Metrics are not exposed for all maps: bpf maps:metrics: fix bpf maps metrics #1775

The text was updated successfully, but these errors were encountered:

sadath-12 · 2024-01-03T18:19:05Z

/assign

kkourt · 2024-01-09T12:09:05Z

@sadath-12 I've unassigned you from this issue because there are many different tasks that can happen in parallel I think. For example, @tpapagian already has a PR to address #1949. If you want to work on this, I'd suggest pick a specific bullet point and we can create a separate issue for that.

Thanks!

sadath-12 · 2024-01-09T12:11:37Z

thanks I almost covered it #1950 . I'll be detailing about them in the description

solving all monitoring issues exposed at issue cilium#1774 Signed-off-by: sadath-12 <sadathsadu2002@gmail.com>

author sadath-12 <sadathsadu2002@gmail.com> 1704801541 +0530 committer sadath-12 <sadathsadu2002@gmail.com> 1704914706 +0530 parent 8802fd6 author sadath-12 <sadathsadu2002@gmail.com> 1704801541 +0530 committer sadath-12 <sadathsadu2002@gmail.com> 1704914704 +0530 parent 8802fd6 author sadath-12 <sadathsadu2002@gmail.com> 1704801541 +0530 committer sadath-12 <sadathsadu2002@gmail.com> 1704914701 +0530 parent 8802fd6 author sadath-12 <sadathsadu2002@gmail.com> 1704801541 +0530 committer sadath-12 <sadathsadu2002@gmail.com> 1704914698 +0530 parent 8802fd6 author sadath-12 <sadathsadu2002@gmail.com> 1704801541 +0530 committer sadath-12 <sadathsadu2002@gmail.com> 1704914694 +0530 parent 8802fd6 author sadath-12 <sadathsadu2002@gmail.com> 1704801541 +0530 committer sadath-12 <sadathsadu2002@gmail.com> 1704914657 +0530 Fix: monitoring BPF maps solving all monitoring issues exposed at issue cilium#1774 Signed-off-by: sadath-12 <sadathsadu2002@gmail.com> make: Introduce install/kubernetes/Makefile Signed-off-by: sadath-12 <sadathsadu2002@gmail.com> workflows: bump bom version for race condition fix We experienced cilium#1894, which made me create kubernetes-sigs/bom#385 only to fix the race condition and realize that it was already patched but no release existed. Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com> tetragon: Remove early binary filter bpf code We can move the binary path matching early in the filter program, so it's invoked just one tailcall from where it is now. This won't make much difference wrt early binary filter invocation, so we can remove it. As a side effect we free some (~60) instructions in filter_arg program for small kernels. Signed-off-by: Jiri Olsa <jolsa@kernel.org> tetragon: Remove early binary filter config code Now that we do not have bpf support for early filter et's remove the go config code for that as well. Signed-off-by: Jiri Olsa <jolsa@kernel.org> tetragon: Split filter_tailcall_index in two values And make the index names more appropriate. Signed-off-by: Jiri Olsa <jolsa@kernel.org> tetragon: Use selidx name for selector index The selidx is used in other parts of the code and fits more than just generic index. Signed-off-by: Jiri Olsa <jolsa@kernel.org> tracingpolicy: add a message field to inform users what is happening Right now all fields in Tetragon events are generated from code, the only exception is the policy_name which does not inform users what is going on. This optional "message" field if present will be copied and printed in the final generated event. We want to inform users what is really happening, why this event was generated? it is not obvious to non technical users. An optional abstracted could be easy to read something like: "message": "Kernel module being loaded" As we want to capture new users, the policy_name is not enough and strictly speaking it has its own purpose, using abstracted messages or logs appeals to more users. The down side of this is our generated events may grow in size, thus the new "message" field will be chopped to 256 characters only. In future if the event size is really a problem, we can start shipping default field filters, that hide these fields from the source before even constructing the event. Signed-off-by: Djalal Harouni <tixxdz@gmail.com> api:proto: add short message to print Tracing Policy message Signed-off-by: Djalal Harouni <tixxdz@gmail.com> events: include the tracing policy message in events If a tracing policy has the message field set, then include it into the reported events. It is optional. The message field is properly escaped. Signed-off-by: Djalal Harouni <tixxdz@gmail.com> tests: ensure that message field is reported Signed-off-by: Djalal Harouni <tixxdz@gmail.com> policylibrary: add message field to module tracing policy This adds the message field to the module tracing library as an example of what the event is about. Signed-off-by: Djalal Harouni <tixxdz@gmail.com> test: add message testing Signed-off-by: Djalal Harouni <tixxdz@gmail.com> metrics: Do not return when we cannot find a _stats map In Tetragon, we have maps ending with _stats that contain metadata about the main maps. An example is that we have execve_map and execve_map_stats that contains errors and other information related to execve_map map. Now when we fail to open the _stats map, we just stop the loop and we can possibly miss to check for other maps. This patch change that and tries for the next map. Additionally we do not try to collect stats for maps that already end up with _stats as this would result in _stats_stats map names. Signed-off-by: Anastasios Papagiannis <tasos.papagiannnis@gmail.com> fix(deps): update module golang.org/x/sync to v0.6.0 Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> tester-progs: compile killer-tester-32 statically Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> vmtests: make a note when all tests skipped Currently, if all tests are skipped we print success, which is misleading. Print a better message. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> vmtests: listTests now accepts a pattern Previously, we always used "." to list tests in listTests. This patch adds it as an argument to be used in a later patch. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> vmtests: improve running tests from a file This patch improves loading tests form a file. First, it allows for having comments with lines starting with '#'. Second, it lists the tests based on the provided user pattern instead of using it as it is. Previously, the test name was defined by the user pattern. This lead to inconsistent reporting. This patch fixes this because now we report per-test results rather than per-pattern results. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> vmtests: use a strict pattern for test execution It is possible that a test is a substring of another. Use a strict pattern so that we execute only the specified test. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> tetragon-vmtests-run: support for detailed results There are two ways to skip a test in our CI: 1. add it to CiBlacklist in split-tetragon-gotests 2. use t.Skip() CI results report 1., but they do not report 2. This means that skipped tests can go easily unnoticed. To report tests skipped with (2), we need to parse the result files. This patch does exactly that. It adds a new option --enable-detailed-results. If this option is set, we set KeepAllLogs because we need to parse all logs after the tests are finished. The tests we skip with (1) are marked with ⏭️. We introduce a new symbol for the tests that are skipped with (2): ⚡ Parsing the detailed results allows us to also report sub-tests (i.e., tests executed with t.Run()). Here's an output example: ``` ✅ pkg.sensors.tracing.Test_Kprobe_DisableEnablePolicy (total:3 failed:0 skipped:0) 2.680496982s 1m33.952671893s ├─✅ Test_Kprobe_DisableEnablePolicy/sensor 1.05s (1.05s) ├─✅ Test_Kprobe_DisableEnablePolicy/tracing-policy 1.09s (2.14s) └─✅ Test_Kprobe_DisableEnablePolicy 2.56s (4.7s) ⚡ pkg.sensors.tracing.TestKillerOverride32 (total:1 failed:0 skipped:1) 130.759191ms 1m34.083431084s ⚡ pkg.sensors.tracing.TestKillerSignal32 (total:1 failed:0 skipped:1) 132.393451ms 1m34.215824535s ⚡ pkg.sensors.tracing.TestKillerOverrideBothBits (total:1 failed:0 skipped:1) 119.455061ms 1m34.335279596s ⚡ pkg.sensors.tracing.TestKillerOverride (total:1 failed:0 skipped:1) 121.255148ms 1m34.456534744s ⚡ pkg.sensors.tracing.TestKillerSignal (total:1 failed:0 skipped:1) 136.665421ms 1m34.593200165s ✅ pkg.sensors.tracing.TestKillerMulti (total:1 failed:0 skipped:0) 132.761861ms 1m34.725962026s ``` Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> gh: add --enable-detailed-results to vmtests Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> chore(deps): update all lvh-images main Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> tests: Unload policy after load Unload the policy that was loaded during the test Signed-off-by: sadath-12 <sadathsadu2002@gmail.com> api: detect binary execution that raise process privileges Add ProcessPrivilegesChanged to note reasons why the process privileges changed, and start using it in the binary_properties to indicate why the binary execution changed or raised privileges. For now we handle only elevating privileges, in future depending on user's requests we may do privilege changes or dropping. The ProcessPrivilegesChanged is an enum able to handle different values. The new `binary_properties.privileges_changed` contains the reasons why this binary execution gained new capabilities or elevated its privileges. Usually this happens due: 1. File capability sets on the binary. 2. set-user-ID to root bit is set on the binary. Details of ProcessPrivilegesChanged: 1. PRIVILEGES_RAISED_EXEC_FILE_CAP - The File capability sets on binary: The kernel supports associating capability sets with an executable file using `setcap` command. The file capability sets are stored in an extended attribute named `security.capability`. The file capability sets, in conjunction with the capability sets of the process, determine the process capabilities and privileges after the `execve` system call. For further reference, please check sections `File capability extended attribute versioning` and `Namespaced file capabilities` of the capabilities man pages: https://man7.org/linux/man-pages/man7/capabilities.7.html. 2. PRIVILEGES_RAISED_EXEC_FILE_SETUID - set-user-ID to root bit is set on the binary: When a process with nonzero UIDs executes a binary with a set-user-ID to root also known as suid-root executable, then the kernel switches the effective user ID to 0 (root) which is a privilege elevation operation since it grants access to resources owned by the root user. The effective user ID is listed inside the `process_credentials` part of the `process` object. For further reading, section `Capabilities and execution of programs by root` of https://man7.org/linux/man-pages/man7/capabilities.7.html. Afterward the kernel recalculates the capability sets of the process and grants all capabilities in the permitted and effective capability sets, except those masked out by the capability bounding set. If the binary also have file capability sets then these bits are honored and the process gains just the capabilities granted by the file capability sets (i.e., not all capabilities, as it would occur when executing a set-user-ID to root binary that does not have any associated file capabilities). This is described in section `Set-user-ID-root programs that have file capabilities` of https://man7.org/linux/man-pages/man7/capabilities.7.html. The new granted capabilities can be listed inside the `process` object. There is one exception for the special treatments of set-user-ID to root execution receiving all capabilities, if the `SecBitNoRoot` bit of the Secure bits is set, then the kernel does not grant any capability. The secure bits are already exposed part of the `process_credentials` object. Please check section: `The securebits flags: establishing a capabilities-only environment` of the capabilities man pages: https://man7.org/linux/man-pages/man7/capabilities.7.html The new privileges_changed contains the reasons, no need to encode again the full capabilities there, let's just indicate to the user and if they are interested they can inspect other fields that contain the permitted and effective capabilities of the current execution. *Limitations:* 1. We do not handle the case of an unprivileged user id that is _mapped_ to user id root 0 inside its own namespace then performs a setuid to root execution. Current eBPF infrastructure does not support user id translation into target user namespace. Plus current Tetragon design deals with global uids, so it is not clear at this moment how to handle this corner case. For this the binary_properties.privileges_changed is an array that can hold extra values in the future. 2. We do not handle the case of a binary having both setuid to root bit set and file capability sets on the binary. Right now we report in the `privileges_changed` only `PRIVILEGES_RAISED_EXEC_FILE_SETUID` where we should also add `PRIVILEGES_RAISED_EXEC_FILE_CAP`. This is left for the future as we have to read the `security.capability` field of the extended attributes of the binary and parse the capabilities that are actually set to ensure that new capabilities received are not only due to the set-user-ID root exection. Signed-off-by: Djalal Harouni <tixxdz@gmail.com> tetragon: detect binary execution that raise process privileges From a higher level: Populate binary_properties.privileges_changed with the the reasons why this binary execution gained new capabilities or elevated its privileges. Usually this happens due: 1. File capability sets on the binary. 2. set-user-ID to root bit is set on the binary. Details of ProcessPrivilegesChanged: 1. PRIVILEGES_RAISED_EXEC_FILE_CAP - The File capability sets on binary: The kernel supports associating capability sets with an executable file using `setcap` command. The file capability sets are stored in an extended attribute named `security.capability`. The file capability sets, in conjunction with the capability sets of the process, determine the process capabilities and privileges after the `execve` system call. For further reference, please check sections `File capability extended attribute versioning` and `Namespaced file capabilities` of the capabilities man pages: https://man7.org/linux/man-pages/man7/capabilities.7.html. 2. PRIVILEGES_RAISED_EXEC_FILE_SETUID - set-user-ID to root bit is set on the binary: When a process with nonzero UIDs executes a binary with a set-user-ID to root also known as suid-root executable, then the kernel switches the effective user ID to 0 (root) which is a privilege elevation operation since it grants access to resources owned by the root user. The effective user ID is listed inside the `process_credentials` part of the `process` object. For further reading, section `Capabilities and execution of programs by root` of https://man7.org/linux/man-pages/man7/capabilities.7.html. Afterward the kernel recalculates the capability sets of the process and grants all capabilities in the permitted and effective capability sets, except those masked out by the capability bounding set. If the binary also have file capability sets then these bits are honored and the process gains just the capabilities granted by the file capability sets (i.e., not all capabilities, as it would occur when executing a set-user-ID to root binary that does not have any associated file capabilities). This is described in section `Set-user-ID-root programs that have file capabilities` of https://man7.org/linux/man-pages/man7/capabilities.7.html. The new granted capabilities can be listed inside the `process` object. There is one exception for the special treatments of set-user-ID to root execution receiving all capabilities, if the `SecBitNoRoot` bit of the Secure bits is set, then the kernel does not grant any capability. The secure bits are already exposed part of the `process_credentials` object. Please check section: `The securebits flags: establishing a capabilities-only environment` of the capabilities man pages: https://man7.org/linux/man-pages/man7/capabilities.7.html The new privileges_changed contains the reasons, no need to encode again the full capabilities there, let's just indicate to the user and if they are interested they can inspect other fields that contain the permitted and effective capabilities of the current execution. Implementation: The capabilities are re-calculated by the capability LSM during execve, so let's avoid trying to be smart and calculate or guess them... we just display the final results. To display these fields, Tetragon must be run with enable-process-cred. 1. If the setuid bit is set we check if it is a privileged root and if it's real uid is not global root and effective uid is global root, if so then this a suid binary execution which is already enough for Tetragon to report as a privilege change execution. 2. Check if the setuid bit is not set and the new permitted capability sets is higher then the previous one then is is a file capability sets execution. *Limitations:* 1. We do not handle the case of an unprivileged user id that is _mapped_ to user id root 0 inside its own namespace then performs a setuid to root execution. Current eBPF infrastructure does not support user id translation into target user namespace. Plus current Tetragon design deals with global uids, so it is not clear at this moment how to handle this corner case. For this the binary_properties.privileges_changed is an array that can hold extra values in the future. 2. We do not handle the case of a binary having both setuid to root bit set and file capability sets on the binary. Right now we report in the `privileges_changed` only `PRIVILEGES_RAISED_EXEC_FILE_SETUID` where we should also add `PRIVILEGES_RAISED_EXEC_FILE_CAP`. This is left for the future as we have to read the `security.capability` field of the extended attributes of the binary and parse the capabilities that are actually set to ensure that new capabilities received are not only due to the set-user-ID root exection. Example output event: File capabilities execution: "process_exec": { "process": { "exec_id": "OjIwMTAxOTMxNzAzMjc4OjE0MDUwOA==", "pid": 140508, "uid": 1000, "cwd": "/home/tixxdz/work/station/code/src/github.com/tixxdz/tetragon", "binary": "/usr/bin/ping", "arguments": "ebpf.io", "flags": "execve clone", "start_time": "2023-11-23T14:32:53.227623175Z", "auid": 1000, "parent_exec_id": "OjI5NjYzNzAwMDAwMDA6NDQ4NTk=", "refcnt": 1, "cap": { "permitted": [ "CAP_NET_RAW" ], "effective": [ "CAP_NET_RAW" ] }, "tid": 140508, "process_credentials": { "uid": 1000, "gid": 1000, "euid": 1000, "egid": 1000, "suid": 1000, "sgid": 1000, "fsuid": 1000, "fsgid": 1000 }, "binary_properties": { "privileges_changed": [ "PRIVILEGES_RAISED_EXEC_FILE_CAP" ] } }, Setuid suid execution (from non uid 0 -> to global uid 0 in root user namespace): "process_exec": { "process": { "exec_id": "OjMyMTg1ODY1MDcxNTQ6Mjg5NTU=", "pid": 28955, "uid": 1000, "cwd": "/home/tixxdz/work/station/code/src/github.com/tixxdz/tetragon", "binary": "/usr/bin/su", "arguments": "--help", "flags": "execve clone", "start_time": "2023-12-11T14:59:24.735197753Z", "auid": 1000, "parent_exec_id": "OjE0MjI5NzAwMDAwMDA6MTI4NjE=", "refcnt": 1, "cap": { "permitted": [ "CAP_CHOWN", "DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER", "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST", "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER", "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE", "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE", "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD", "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP", "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM", "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF", "CAP_CHECKPOINT_RESTORE" ], "effective": [ "CAP_CHOWN", "DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER", "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST", "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER", "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE", "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE", "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD", "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP", "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM", "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF", "CAP_CHECKPOINT_RESTORE" ] }, "tid": 28955, "process_credentials": { "uid": 1000, "gid": 1000, "euid": 0, "egid": 1000, "suid": 0, "sgid": 1000, "fsuid": 0, "fsgid": 1000 }, "binary_properties": { "setuid": 0, "privileges_changed": [ "PRIVILEGES_RAISED_EXEC_FILE_SETUID" ] } }, Signed-off-by: Djalal Harouni <tixxdz@gmail.com> api: detect set-group-ID to root as privileges execution Make Tetragon treat a binary exec with set-group-ID to root as a privileged execution since it changes the effective group ID to root which allows access to resources owned by group root. Signed-off-by: Djalal Harouni <tixxdz@gmail.com> tetragon: detect setgid execution to root as privileged execution Execution of a binary with set-group-ID to root is handled as a privileged execution. The PRIVILEGES_RAISED_EXEC_FILE_SETGID will be set. Signed-off-by: Djalal Harouni <tixxdz@gmail.com> tests: split exec suid tests Signed-off-by: Djalal Harouni <tixxdz@gmail.com> tests: add suid root and setuid to non root execution tests Signed-off-by: Djalal Harouni <tixxdz@gmail.com> tests: detect privileged execution of file caps binaries We use the ping binary if it is available to detect privileged execution through file caps on binaries. Signed-off-by: Djalal Harouni <tixxdz@gmail.com> fix(deps): update module golang.org/x/sys to v0.16.0 Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> make: Introduce install/kubernetes/Makefile Signed-off-by: sadath-12 <sadathsadu2002@gmail.com> workflows: bump bom version for race condition fix We experienced cilium#1894, which made me create kubernetes-sigs/bom#385 only to fix the race condition and realize that it was already patched but no release existed. Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com> tetragon: Remove early binary filter config code Now that we do not have bpf support for early filter et's remove the go config code for that as well. Signed-off-by: Jiri Olsa <jolsa@kernel.org> tetragon: Use selidx name for selector index The selidx is used in other parts of the code and fits more than just generic index. Signed-off-by: Jiri Olsa <jolsa@kernel.org> tracingpolicy: add a message field to inform users what is happening Right now all fields in Tetragon events are generated from code, the only exception is the policy_name which does not inform users what is going on. This optional "message" field if present will be copied and printed in the final generated event. We want to inform users what is really happening, why this event was generated? it is not obvious to non technical users. An optional abstracted could be easy to read something like: "message": "Kernel module being loaded" As we want to capture new users, the policy_name is not enough and strictly speaking it has its own purpose, using abstracted messages or logs appeals to more users. The down side of this is our generated events may grow in size, thus the new "message" field will be chopped to 256 characters only. In future if the event size is really a problem, we can start shipping default field filters, that hide these fields from the source before even constructing the event. Signed-off-by: Djalal Harouni <tixxdz@gmail.com> events: include the tracing policy message in events If a tracing policy has the message field set, then include it into the reported events. It is optional. The message field is properly escaped. Signed-off-by: Djalal Harouni <tixxdz@gmail.com> tests: ensure that message field is reported Signed-off-by: Djalal Harouni <tixxdz@gmail.com> policylibrary: add message field to module tracing policy This adds the message field to the module tracing library as an example of what the event is about. Signed-off-by: Djalal Harouni <tixxdz@gmail.com> test: add message testing Signed-off-by: Djalal Harouni <tixxdz@gmail.com> tester-progs: compile killer-tester-32 statically Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> vmtests: make a note when all tests skipped Currently, if all tests are skipped we print success, which is misleading. Print a better message. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> vmtests: listTests now accepts a pattern Previously, we always used "." to list tests in listTests. This patch adds it as an argument to be used in a later patch. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> vmtests: improve running tests from a file This patch improves loading tests form a file. First, it allows for having comments with lines starting with '#'. Second, it lists the tests based on the provided user pattern instead of using it as it is. Previously, the test name was defined by the user pattern. This lead to inconsistent reporting. This patch fixes this because now we report per-test results rather than per-pattern results. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> vmtests: use a strict pattern for test execution It is possible that a test is a substring of another. Use a strict pattern so that we execute only the specified test. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> tetragon-vmtests-run: support for detailed results There are two ways to skip a test in our CI: 1. add it to CiBlacklist in split-tetragon-gotests 2. use t.Skip() CI results report 1., but they do not report 2. This means that skipped tests can go easily unnoticed. To report tests skipped with (2), we need to parse the result files. This patch does exactly that. It adds a new option --enable-detailed-results. If this option is set, we set KeepAllLogs because we need to parse all logs after the tests are finished. The tests we skip with (1) are marked with ⏭️. We introduce a new symbol for the tests that are skipped with (2): ⚡ Parsing the detailed results allows us to also report sub-tests (i.e., tests executed with t.Run()). Here's an output example: ``` ✅ pkg.sensors.tracing.Test_Kprobe_DisableEnablePolicy (total:3 failed:0 skipped:0) 2.680496982s 1m33.952671893s ├─✅ Test_Kprobe_DisableEnablePolicy/sensor 1.05s (1.05s) ├─✅ Test_Kprobe_DisableEnablePolicy/tracing-policy 1.09s (2.14s) └─✅ Test_Kprobe_DisableEnablePolicy 2.56s (4.7s) ⚡ pkg.sensors.tracing.TestKillerOverride32 (total:1 failed:0 skipped:1) 130.759191ms 1m34.083431084s ⚡ pkg.sensors.tracing.TestKillerSignal32 (total:1 failed:0 skipped:1) 132.393451ms 1m34.215824535s ⚡ pkg.sensors.tracing.TestKillerOverrideBothBits (total:1 failed:0 skipped:1) 119.455061ms 1m34.335279596s ⚡ pkg.sensors.tracing.TestKillerOverride (total:1 failed:0 skipped:1) 121.255148ms 1m34.456534744s ⚡ pkg.sensors.tracing.TestKillerSignal (total:1 failed:0 skipped:1) 136.665421ms 1m34.593200165s ✅ pkg.sensors.tracing.TestKillerMulti (total:1 failed:0 skipped:0) 132.761861ms 1m34.725962026s ``` Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> gh: add --enable-detailed-results to vmtests Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> chore(deps): update all lvh-images main Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> tests: Unload policy after load Unload the policy that was loaded during the test Signed-off-by: sadath-12 <sadathsadu2002@gmail.com> api: detect set-group-ID to root as privileges execution Make Tetragon treat a binary exec with set-group-ID to root as a privileged execution since it changes the effective group ID to root which allows access to resources owned by group root. Signed-off-by: Djalal Harouni <tixxdz@gmail.com> tetragon: detect setgid execution to root as privileged execution Execution of a binary with set-group-ID to root is handled as a privileged execution. The PRIVILEGES_RAISED_EXEC_FILE_SETGID will be set. Signed-off-by: Djalal Harouni <tixxdz@gmail.com> tests: detect privileged execution of file caps binaries We use the ping binary if it is available to detect privileged execution through file caps on binaries. Signed-off-by: Djalal Harouni <tixxdz@gmail.com> fix(deps): update module golang.org/x/sys to v0.16.0 Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

This pr Closes cilium#1774 considering all its points and 1st part of cilium#1775 Let's break down the changes applied considering cilium#1774 point wise 1- seems to be solved by removing eventcache as it was supposed to be on user side and the user side EventCacheCount seems sufficient 2- _gauge suffix removed 3- processLru migrated towards ProcessCacheTotal as separate metric and eventcache removed as said in 1) 4- map_errors_total text rewritted 5- map_drops_total removed and in its place ProcessCacheEvicted used Signed-off-by: sadath-12 <sadathsadu2002@gmail.com>

lambdanis added the area/metrics Related to prometheus metrics label Nov 20, 2023

lambdanis changed the title ~~Map monitoring improvements~~ Improve maps monitoring Nov 20, 2023

lambdanis added the area/userspace Related to userspace Tetragon logic label Nov 21, 2023

lambdanis changed the title ~~Improve maps monitoring~~ Fix monitoring BPF maps Dec 12, 2023

lambdanis assigned sadath-12 Jan 3, 2024

kkourt unassigned sadath-12 Jan 9, 2024

sadath-12 mentioned this issue Jan 9, 2024

metrics: Improve monitoring BPF maps and userspace caches #1950

Merged

sadath-12 added a commit to sadath-12/tetragon that referenced this issue Jan 9, 2024

Fix: monitoring BPF maps

f3d1676

solving all monitoring issues exposed at issue cilium#1774 Signed-off-by: sadath-12 <sadathsadu2002@gmail.com>

sadath-12 added a commit to sadath-12/tetragon that referenced this issue Jan 9, 2024

Fix: monitoring BPF maps

3fa3d38

solving all monitoring issues exposed at issue cilium#1774 Signed-off-by: sadath-12 <sadathsadu2002@gmail.com>

sadath-12 added a commit to sadath-12/tetragon that referenced this issue Jan 9, 2024

Fix: monitoring BPF maps

dce5732

solving all monitoring issues exposed at issue cilium#1774 Signed-off-by: sadath-12 <sadathsadu2002@gmail.com>

lambdanis self-assigned this Mar 1, 2024

lambdanis mentioned this issue Apr 3, 2024

Improve metrics for map and cache sizes #2291

Merged

lambdanis closed this as completed in #2291 Apr 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix monitoring BPF maps #1774

Fix monitoring BPF maps #1774

lambdanis commented Nov 20, 2023 •

edited

Loading

sadath-12 commented Jan 3, 2024

kkourt commented Jan 9, 2024

sadath-12 commented Jan 9, 2024

Fix monitoring BPF maps #1774

Fix monitoring BPF maps #1774

Comments

lambdanis commented Nov 20, 2023 • edited Loading

sadath-12 commented Jan 3, 2024

kkourt commented Jan 9, 2024

sadath-12 commented Jan 9, 2024

lambdanis commented Nov 20, 2023 •

edited

Loading