feat: k3s pod dashboard #60

Reanmachine · 2024-09-30T05:50:57Z

Following #57, this PR adds a dashboard for the k3s metrics produced by TrueNAS' metrics exporter. The dashboard gives an overview of the k3s cluster as a whole, plus a collapsible, repeating section for each pod.

The CPU metric turned out to be the instantaneous CPU time, in nanoseconds, consumed during a given second. That makes it a bit tricky to work with, since it does not play nicely with Grafana/Prometheus rate intervals, but each sample can be converted on its own by dividing it by 1,000,000,000 (the number of nanoseconds in a second).
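As a rough illustration of that conversion, here is a minimal Python sketch. It assumes the interpretation above (each sample is the CPU time in ns consumed within one second); the function name and the sample values are hypothetical and are not taken from the exporter.

```python
NS_PER_SECOND = 1_000_000_000  # nanoseconds in one second


def cpu_usage_from_ns(cpu_ns_in_second: float) -> float:
    """Convert CPU time (ns) consumed within one second into a fraction of one core.

    A result of 1.0 means one core was fully busy for that second;
    values above 1.0 mean more than one core was in use.
    """
    return cpu_ns_in_second / NS_PER_SECOND


if __name__ == "__main__":
    # Hypothetical samples, not real exporter output.
    for sample_ns in (250_000_000, 800_000_000, 1_500_000_000):
        usage = cpu_usage_from_ns(sample_ns)
        print(f"{sample_ns} ns -> {usage:.2f} cores ({usage:.0%})")
```

This is the same arithmetic as the `/ 1000000000` in the dashboard query, just written out for a single sample.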

Reanmachine · 2024-09-30T05:59:06Z

dashboards/truenas_scale_applications_k3s.json

+          },
+          "editorMode": "code",
+          "exemplar": false,
+          "expr": "k3s_pod_cpu{instance=~\"$instance\"} / 1000000000",


@Supporterino For your consideration: the pod CPU metric does seem to be in ns, but it appears to be an instantaneous measure for the current second at the time of submission.

Since it's a gauge and not a counter, we can't compute the change over Grafana's `$__rate_interval`, so instead we're just dividing the value by the number of ns in a second to get the instantaneous CPU usage (1.0 corresponds to one full core).
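To make the gauge-versus-counter point concrete, here is a small Python sketch. It is only an illustration under the assumption stated above (each sample is the ns of CPU time used in that second). `usage_from_counter` is a deliberately simplified stand-in for what a rate over an interval does; it ignores counter resets and extrapolation, and all names and sample values are hypothetical.

```python
NS_PER_SECOND = 1_000_000_000  # nanoseconds in one second


def usage_from_counter(samples):
    """Counter-style: (last value - first value) / elapsed seconds, scaled to cores.

    Only meaningful when the metric is a monotonically increasing total.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0) / NS_PER_SECOND


def usage_from_gauge(samples):
    """Gauge-style: every sample already describes one second, so just scale it."""
    return [(t, v / NS_PER_SECOND) for t, v in samples]


# Hypothetical (timestamp_seconds, cpu_ns) gauge samples scraped every 15 s.
gauge_samples = [(0, 200_000_000), (15, 800_000_000), (30, 300_000_000)]

if __name__ == "__main__":
    # Treating the gauge like a counter yields a tiny, misleading number.
    print("counter-style:", usage_from_counter(gauge_samples))
    # Scaling each sample keeps the spike in the middle visible.
    print("gauge-style:  ", usage_from_gauge(gauge_samples))
```

With these made-up samples the counter-style calculation hides the spike entirely, which is why the query divides each sample directly instead of wrapping it in a rate.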

I don't have a huge variety of workloads to stress this value locally, but I did upload a bunch of pictures to Immich to trigger the ML container and was able to capture examples of the CPU hitting about 80%.

This seems to be the clearest and most reasonable measure from what I've seen.

Supporterino · 2024-10-04T12:32:24Z

@Reanmachine The dashboard looks good to me. Just one thing: could you rename the CPU graphs to "usage", since that is what you are converting the value to?

Supporterino · 2024-10-04T17:43:10Z

LGTM. Thank you for your submission.

Reanmachine force-pushed the feat-k3s-dashboard branch from 0594833 to 7937d85 on September 30, 2024 05:58

fix: tweaked the variable refresh for k3s dash

5beefde

This change makes the variable values refresh on time range change so old pods don't show up anymore.

fix: standardized on 'cpu usage' for pod cpu in dashboard

d4e661b

This fix standardizes the names to `cpu usage`, as that's the measurement we're showing. I also noticed the CPU gauge still had the old calculation, so I aligned it with the others and added the `truenas` tag.

Supporterino approved these changes Oct 4, 2024

Supporterino merged commit 6c205a3 into Supporterino:main Oct 4, 2024
4 checks passed
