prod: add tolerations to clusterpolicy daemonset
The NVIDIA DaemonSets cannot schedule their pods on the GPU nodes unless
they specify the same tolerations that the AcceleratorProfiles give for those nodes.

See #647 and nerc-project/operations#913
jtriley committed Jan 31, 2025
1 parent 36c6506 commit 0899001
Showing 2 changed files with 17 additions and 0 deletions.
@@ -0,0 +1,15 @@
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  daemonsets:
    tolerations:
      - effect: NoSchedule
        key: nvidia.com/gpu.product
        operator: Equal
        value: NVIDIA-A100-SXM4-40GB
      - effect: NoSchedule
        key: nvidia.com/gpu.product
        operator: Equal
        value: Tesla-V100-PCIE-32GB
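For context: a toleration with `operator: Equal` only lets a pod onto a node whose taint has the same `key`, `value` and `effect`, and a `NoSchedule` taint that a pod does not tolerate keeps the scheduler from placing it on that node. The following is a minimal sketch of the kind of taint the first toleration above would match. It assumes the A100 nodes are tainted with the GPU product name as the value. The actual node taints are not part of this commit, and the node name is hypothetical.

```yaml
# Hypothetical Node excerpt: the taint is assumed to mirror the first
# toleration in the ClusterPolicy patch; the real node taints may differ.
apiVersion: v1
kind: Node
metadata:
  name: example-a100-node   # hypothetical node name
spec:
  taints:
    - key: nvidia.com/gpu.product
      value: NVIDIA-A100-SXM4-40GB
      effect: NoSchedule
```

A taint like this would normally be set with `kubectl taint nodes example-a100-node nvidia.com/gpu.product=NVIDIA-A100-SXM4-40GB:NoSchedule` rather than by editing the Node object directly.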
2 changes: 2 additions & 0 deletions nvidia-gpu-operator/overlays/nerc-ocp-prod/kustomization.yaml
@@ -3,3 +3,5 @@ kind: Kustomization
namespace: nvidia-gpu-operator
resources:
- ../../base
patches:
- path: clusterpolicy/clusterpolicy_patch.yaml
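The patch lists one `Equal` toleration per GPU model, so each new model would need another entry. One possible alternative is a single `Exists` toleration, which matches a `nvidia.com/gpu.product` taint whatever its value. This is a sketch of a design option, not what this commit does. The trade-off is breadth: the DaemonSet pods would then tolerate that taint on any node carrying it, including GPU models added later.

```yaml
# Possible alternative to the ClusterPolicy patch (not what this commit uses):
# one Exists toleration covers every value of the nvidia.com/gpu.product taint.
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  daemonsets:
    tolerations:
      - effect: NoSchedule
        key: nvidia.com/gpu.product
        operator: Exists   # no value: Exists matches the key alone
```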