Enable sysctl collector. #5285

Closed

Conversation

ryanlovett
Collaborator

Also track tcp_mem.

Also track tcp_mem.
@shaneknapp
Contributor

so.... i don't think this works as you might expect. ;)

tcp_mem values are static; they're just the limits set by the kernel at boot. the ACTUAL amount of memory being used is stored in /proc/net/sockstat. to wit:

gke-fall-2019-core-2023-07-11-421ddf34-zwzm ~ # sysctl -a | grep tcp_mem
net.ipv4.tcp_mem = 383673       511565  767346
gke-fall-2019-core-2023-07-11-421ddf34-zwzm ~ # cat /proc/net/sockstat
sockets: used 1092
TCP: inuse 76 orphan 1 tw 3081 alloc 1239 mem 153488  <--- this is the amount of tcp_mem in use at this moment in time
UDP: inuse 3 mem 2
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0

this means that we'd just be monitoring values that never change. :)
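(for reference: node-exporter's sockstat collector may already expose that live value, but if not, a quick textfile-collector sketch like the one below would do it. the metric name and output path here are just placeholders; note that the sockstat "mem" field and the tcp_mem limits are both counted in pages.)

#!/bin/bash
# sketch: export the live TCP memory usage (in pages) from /proc/net/sockstat.
# the output path below is a placeholder for wherever node-exporter's textfile
# collector is pointed at.
tcp_mem_pages=$(awk '/^TCP:/ {for (i = 1; i < NF; i++) if ($i == "mem") print $(i + 1)}' /proc/net/sockstat)
echo "tcp_mem_pages_in_use $tcp_mem_pages" > /var/lib/prometheus/node-exporter/tcp_mem.prom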

@ryanlovett
Collaborator Author

This won't help us track usage, but it will help us keep track of how our nodes are configured.

@felder
Contributor

felder commented Oct 11, 2024

Now that we know this is an issue with ephemeral ports, do we still need this? Can the work done here be used to track and send info to prometheus regarding the number of in-use ephemeral ports going to hub:8081 from the chp?

@ryanlovett
Collaborator Author

I don't think the sysctl collector will help, although sysctl can report on the ephemeral port range with sysctl net.ipv4.ip_local_port_range. We don't need to be constantly changing this, so we don't need to collect the info.

But I think it does make sense to track the number of ports in use. You'd need to use prometheus-node-exporter's textfile collector, e.g.

#!/bin/bash

# Define the ephemeral port range (default Linux is 32768-60999)
read low high < /proc/sys/net/ipv4/ip_local_port_range

# Count the number of open TCP connections in the ephemeral port range
used_ports=$(ss -tan | awk -v low=$low -v high=$high '{split($4, a, ":"); port=a[length(a)]; if (port >= low && port <= high) print port}' | wc -l)

# Calculate the total number of ports in the ephemeral range (inclusive)
total_ports=$((high - low + 1))

# Create a Prometheus-compatible metric output
echo "ephemeral_ports_in_use $used_ports"
echo "ephemeral_ports_total $total_ports"

On a non-cluster node I would run the script above and write to /var/lib/prometheus/node-exporter/ephemeral_ports.prom.
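
For example, a root crontab entry along these lines (the script path here is just a placeholder) would refresh the metric every minute; writing to a temp file and renaming it means node-exporter never scrapes a half-written file:

* * * * * /usr/local/sbin/ephemeral_ports.sh > /var/lib/prometheus/node-exporter/ephemeral_ports.prom.tmp && mv /var/lib/prometheus/node-exporter/ephemeral_ports.prom.tmp /var/lib/prometheus/node-exporter/ephemeral_ports.prom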

In the cluster we have the prometheus helm chart installed in the support namespace, so we are running the support-prometheus-node-exporter daemonset there. So I think we'd need to hook into the config in support/values.yaml to enable the textfile collector and/or run this script as part of the daemonset.

In any case, this git issue can be renamed to track this, or you can close it and open a new one.

@ryanlovett
Collaborator Author

Of course, I think you had a script that was more specific in terms of tracking the proxy ports, but you get the gist.

@shaneknapp
Contributor

shaneknapp commented Oct 11, 2024

I don't think the sysctl collector will help, although sysctl can report on the ephemeral port range with sysctl net.ipv4.ip_local_port_range. We don't need to be constantly changing this, so we don't need to collect the info.

yep, this exactly. we need to report the dynamic number of ephemeral ports in use, and ryan's quick bash script is a great start.

#!/bin/bash

# Define the ephemeral port range (default Linux is 32768-60999)
read low high < /proc/sys/net/ipv4/ip_local_port_range

# Count the number of open TCP connections in the ephemeral port range
used_ports=$(ss -tan | awk -v low=$low -v high=$high '{split($4, a, ":"); port=a[length(a)]; if (port >= low && port <= high) print port}' | wc -l)

sadly, ss isn't installed by default on our nodes so we'll need to use netstat. we'll also probably need to put this in an upstream PR so that it gets included in the chp proxy image, as i'm not sure otherwise how we'd provision these pods with an additional script.
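
something like this ought to be a drop-in replacement for the ss line (an untested sketch: netstat's local address is also the 4th column, and the /^tcp/ filter skips netstat's header lines):

used_ports=$(netstat -tan | awk -v low=$low -v high=$high '/^tcp/ {split($4, a, ":"); port=a[length(a)]; if (port >= low && port <= high) print port}' | wc -l)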

In any case, this git issue can be renamed to track this, or you can close it and open a new one.

i'm down for either option. if possible, i'd prefer to keep this mainly as a github issue as this is something i'd like the jupyter devs to be able to see/chime in w/ideas etc.

@felder
Contributor

felder commented Oct 11, 2024

@shaneknapp what do you think? Close this for now?

@shaneknapp
Contributor

@shaneknapp what do you think? Close this for now?

yeah, let's close this for now. in fact, i will do it myself! XD

shaneknapp closed this Oct 11, 2024
@felder
Contributor

felder commented Oct 11, 2024

Opened https://jira-secure.berkeley.edu/browse/DH-396 and mentioned this issue because I want to preserve Ryan's comments here.

@ryanlovett
Collaborator Author

Re:

sadly, ss isn't installed by default on our nodes so we'll need to use netstat.

If you revisit this, you can also add an extraContainer to the prometheus.nodeExporter config that pulls in an image containing whatever executables you need, sets volume mounts to access the textfile location, and runs the script.
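
A minimal sketch of what that extra container could run as its command (the paths and interval are placeholders, and it assumes the image ships the counting script plus ss or netstat, with the textfile directory mounted from the same volume node-exporter reads):

#!/bin/bash
# Hypothetical sidecar entrypoint: regenerate the textfile metric every 60s.
# TEXTFILE_DIR stands in for whatever volume is shared with node-exporter.
TEXTFILE_DIR=/var/lib/prometheus/node-exporter

while true; do
  /usr/local/bin/ephemeral_ports.sh > "$TEXTFILE_DIR/ephemeral_ports.prom.tmp" \
    && mv "$TEXTFILE_DIR/ephemeral_ports.prom.tmp" "$TEXTFILE_DIR/ephemeral_ports.prom"
  sleep 60
done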
