Enable sysctl collector. #5285

Closed

Conversation

ryanlovett
Collaborator

Also track tcp_mem.

Also track tcp_mem.
@shaneknapp
Contributor

so.... i don't think this works as you might expect. ;)

tcp_mem values are static; they're just the limits set by the kernel at boot. the ACTUAL amount of memory being used is stored in /proc/net/sockstat. to wit:

gke-fall-2019-core-2023-07-11-421ddf34-zwzm ~ # sysctl -a | grep tcp_mem
net.ipv4.tcp_mem = 383673       511565  767346
gke-fall-2019-core-2023-07-11-421ddf34-zwzm ~ # cat /proc/net/sockstat
sockets: used 1092
TCP: inuse 76 orphan 1 tw 3081 alloc 1239 mem 153488  <--- this is the amount of tcp_mem in use at this moment in time
UDP: inuse 3 mem 2
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0

this means that we'd just be monitoring values that never change. :)
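(for reference: node-exporter's sockstat collector may already expose that live value, but if not, a quick textfile-collector sketch like the one below would do it. the metric name and output path here are just placeholders; note that the sockstat "mem" field and the tcp_mem limits are both counted in pages.)

#!/bin/bash
# sketch: export the live TCP memory usage (in pages) from /proc/net/sockstat.
# the output path below is a placeholder for wherever node-exporter's textfile
# collector is pointed at.
tcp_mem_pages=$(awk '/^TCP:/ {for (i = 1; i < NF; i++) if ($i == "mem") print $(i + 1)}' /proc/net/sockstat)
echo "tcp_mem_pages_in_use $tcp_mem_pages" > /var/lib/prometheus/node-exporter/tcp_mem.prom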

@ryanlovett
Collaborator Author

This won't help us track usage, but it will help us keep track of how our nodes are configured.

@felder
Contributor

felder commented Oct 11, 2024

Now that we know this is an issue with ephemeral ports, do we still need this? Can the work done here be used to track and send info to prometheus regarding the number of in-use ephemeral ports going to hub:8081 from the chp?

@ryanlovett
Collaborator Author

I don't think the sysctl collector will help, although sysctl can report on the ephemeral port range with sysctl net.ipv4.ip_local_port_range. We don't need to be constantly changing this, so we don't need to collect the info.

But I think it does make sense to track the number of ports in use. You'd need to use prometheus-node-exporter's textfile collector, e.g.

#!/bin/bash

# Define the ephemeral port range (default Linux is 32768-60999)
read low high < /proc/sys/net/ipv4/ip_local_port_range

# Count the number of open TCP connections in the ephemeral port range
used_ports=$(ss -tan | awk -v low=$low -v high=$high '{split($4, a, ":"); port=a[length(a)]; if (port >= low && port <= high) print port}' | wc -l)

# Calculate the total number of ports in the ephemeral range (inclusive)
total_ports=$((high - low + 1))

# Create a Prometheus-compatible metric output
echo "ephemeral_ports_in_use $used_ports"
echo "ephemeral_ports_total $total_ports"

On a non-cluster node I would run the script above and write to /var/lib/prometheus/node-exporter/ephemeral_ports.prom.
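
For example, a root crontab entry along these lines (the script path here is just a placeholder) would refresh the metric every minute; writing to a temp file and renaming it means node-exporter never scrapes a half-written file:

* * * * * /usr/local/sbin/ephemeral_ports.sh > /var/lib/prometheus/node-exporter/ephemeral_ports.prom.tmp && mv /var/lib/prometheus/node-exporter/ephemeral_ports.prom.tmp /var/lib/prometheus/node-exporter/ephemeral_ports.prom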

In the cluster we have the prometheus helm chart installed in the support namespace, so we are running the support-prometheus-node-exporter daemonset there. So I think we'd need to hook into the config in support/values.yaml to enable the textfile collector and/or run this script as part of the daemonset.

In any case, this git issue can be renamed to track this, or you can close it and open a new one.

@ryanlovett
Collaborator Author

Of course, I think you had a script that was more specific in terms of tracking the proxy ports, but you get the gist.

@shaneknapp
Contributor

shaneknapp commented Oct 11, 2024

I don't think the sysctl collector will help, although sysctl can report on the ephemeral port range with sysctl net.ipv4.ip_local_port_range. We don't need to be constantly changing this, so we don't need to collect the info.

yep, this exactly. we need to report the dynamic number of ephemeral ports in use, and ryan's quick bash script is a great start.

#!/bin/bash

# Define the ephemeral port range (default Linux is 32768-60999)
read low high < /proc/sys/net/ipv4/ip_local_port_range

# Count the number of open TCP connections in the ephemeral port range
used_ports=$(ss -tan | awk -v low=$low -v high=$high '{split($4, a, ":"); port=a[length(a)]; if (port >= low && port <= high) print port}' | wc -l)

sadly, ss isn't installed by default on our nodes so we'll need to use netstat. we'll also probably need to put this in an upstream PR so that it gets included in the chp proxy image, as i'm not sure otherwise how we'd provision these pods with an additional script.
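
something like this ought to be a drop-in replacement for the ss line (an untested sketch: netstat's local address is also the 4th column, and the /^tcp/ filter skips netstat's header lines):

used_ports=$(netstat -tan | awk -v low=$low -v high=$high '/^tcp/ {split($4, a, ":"); port=a[length(a)]; if (port >= low && port <= high) print port}' | wc -l)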

In any case, this git issue can be renamed to track this, or you can close it and open a new one.

i'm down for either option. if possible, i'd prefer to keep this mainly as a github issue as this is something i'd like the jupyter devs to be able to see/chime in w/ideas etc.

@felder
Contributor

felder commented Oct 11, 2024

@shaneknapp what do you think? Close this for now?

@shaneknapp
Contributor

@shaneknapp what do you think? Close this for now?

yeah, let's close this for now. in fact, i will do it myself! XD

shaneknapp closed this Oct 11, 2024
@felder
Contributor

felder commented Oct 11, 2024

Opened https://jira-secure.berkeley.edu/browse/DH-396 and mentioned this issue because I want to preserve Ryan's comments here.

@ryanlovett
Collaborator Author

Re:

sadly, ss isn't installed by default on our nodes so we'll need to use netstat.

If you revisit this, you can also add an extraContainer to the prometheus.nodeExporter config that pulls in an image containing whatever executables you need, sets volume mounts to access the textfile location, and runs the script.
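
A minimal sketch of what that extra container could run as its command (the paths and interval are placeholders, and it assumes the image ships the counting script plus ss or netstat, with the textfile directory mounted from the same volume node-exporter reads):

#!/bin/bash
# Hypothetical sidecar entrypoint: regenerate the textfile metric every 60s.
# TEXTFILE_DIR stands in for whatever volume is shared with node-exporter.
TEXTFILE_DIR=/var/lib/prometheus/node-exporter

while true; do
  /usr/local/bin/ephemeral_ports.sh > "$TEXTFILE_DIR/ephemeral_ports.prom.tmp" \
    && mv "$TEXTFILE_DIR/ephemeral_ports.prom.tmp" "$TEXTFILE_DIR/ephemeral_ports.prom"
  sleep 60
done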
