Upgrading autopilot to version v2.1.0 #646

Merged 2 commits on Feb 7, 2025

computate
Member

@computate computate commented Jan 31, 2025

This is for the latest GPU health checks from IBM.

This also allows the autopilot containers to run privileged so they can use nvidia-smi. Because we set ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED=false, we set privileged=true for the autopilot service account so that it can load NVIDIA tools such as nvidia-smi and check GPU health without actually claiming a GPU.
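
For illustration, here is a minimal sketch of the kind of nvidia-smi query a privileged container could run once the NVIDIA tools are visible to it. This is not Autopilot's actual health check. The function names and the choice of fields are assumptions made for this example; only the `--query-gpu` and `--format` options are standard nvidia-smi flags.

```python
import csv
import io
import shutil
import subprocess

# Fields requested from nvidia-smi. The selection is an assumption for this
# sketch; each name is a standard --query-gpu property.
QUERY_FIELDS = ["index", "name", "temperature.gpu", "memory.used", "memory.total"]


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        rows.append(dict(zip(QUERY_FIELDS, (field.strip() for field in record))))
    return rows


def query_gpus():
    """Run nvidia-smi if it is on PATH in this container; return [] otherwise."""
    if shutil.which("nvidia-smi") is None:
        # Hypothetical fallback: the NVIDIA tools were not made available.
        return []
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,  # raise if nvidia-smi exits with a non-zero status
    )
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

The pod running such a check would not need to request a GPU resource, which matches the intent above of checking GPU health without claiming the GPU.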

@computate computate force-pushed the autopilot-v2.1.0 branch 2 times, most recently from 87c8565 to 77d1e84 Compare February 3, 2025 16:24
@computate computate merged commit b0a22d5 into OCP-on-NERC:main Feb 7, 2025
2 checks passed
@computate
Member Author

Latest autopilot metrics and dashboards have been applied!

4 participants