From 0ab92ebb44c69f19d44e48c3a01319bba1a84e48 Mon Sep 17 00:00:00 2001 From: Amanuel Engeda <74629455+engedaam@users.noreply.github.com> Date: Fri, 31 Jan 2025 10:19:24 -0800 Subject: [PATCH 01/34] docs: Update to Karpenter version reference after patch releases (#7671) --- charts/karpenter-crd/Chart.yaml | 4 ++-- charts/karpenter/Chart.yaml | 4 ++-- charts/karpenter/README.md | 12 ++++++------ website/content/en/docs/faq.md | 4 ++-- .../getting-started-with-karpenter/_index.md | 8 ++++---- .../getting-started/migrating-from-cas/_index.md | 4 ++-- website/content/en/docs/reference/cloudformation.md | 2 +- website/content/en/docs/reference/threat-model.md | 10 +++++----- website/content/en/v1.1/faq.md | 4 ++-- .../getting-started-with-karpenter/_index.md | 8 ++++---- .../getting-started/migrating-from-cas/_index.md | 4 ++-- website/content/en/v1.1/reference/cloudformation.md | 2 +- website/content/en/v1.1/reference/threat-model.md | 10 +++++----- website/content/en/v1.2/faq.md | 4 ++-- .../getting-started-with-karpenter/_index.md | 8 ++++---- .../getting-started/migrating-from-cas/_index.md | 4 ++-- website/content/en/v1.2/reference/cloudformation.md | 2 +- website/content/en/v1.2/reference/threat-model.md | 10 +++++----- website/hugo.yaml | 2 +- 19 files changed, 53 insertions(+), 53 deletions(-) diff --git a/charts/karpenter-crd/Chart.yaml b/charts/karpenter-crd/Chart.yaml index 3a5d3a64a5b5..c9381307deec 100644 --- a/charts/karpenter-crd/Chart.yaml +++ b/charts/karpenter-crd/Chart.yaml @@ -2,8 +2,8 @@ apiVersion: v2 name: karpenter-crd description: A Helm chart for Karpenter Custom Resource Definitions (CRDs). type: application -version: 1.2.0 -appVersion: 1.2.0 +version: 1.2.1 +appVersion: 1.2.1 keywords: - cluster - node diff --git a/charts/karpenter/Chart.yaml b/charts/karpenter/Chart.yaml index 2c6939c749f6..607f7367dce2 100644 --- a/charts/karpenter/Chart.yaml +++ b/charts/karpenter/Chart.yaml @@ -2,8 +2,8 @@ apiVersion: v2 name: karpenter description: A Helm chart for Karpenter, an open-source node provisioning project built for Kubernetes. type: application -version: 1.2.0 -appVersion: 1.2.0 +version: 1.2.1 +appVersion: 1.2.1 keywords: - cluster - node diff --git a/charts/karpenter/README.md b/charts/karpenter/README.md index be4a7f7e9b26..a4dbca075163 100644 --- a/charts/karpenter/README.md +++ b/charts/karpenter/README.md @@ -15,7 +15,7 @@ You can follow the detailed installation instruction in the [documentation](http ```bash helm upgrade --install --namespace karpenter --create-namespace \ karpenter oci://public.ecr.aws/karpenter/karpenter \ - --version 1.2.0 \ + --version 1.2.1 \ --set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=${KARPENTER_IAM_ROLE_ARN}" \ --set settings.clusterName=${CLUSTER_NAME} \ --set settings.interruptionQueue=${CLUSTER_NAME} \ @@ -27,13 +27,13 @@ helm upgrade --install --namespace karpenter --create-namespace \ As the OCI Helm chart is signed by [Cosign](https://github.com/sigstore/cosign) as part of the release process you can verify the chart before installing it by running the following command. 
```shell -cosign verify public.ecr.aws/karpenter/karpenter:1.2.0 \ +cosign verify public.ecr.aws/karpenter/karpenter:1.2.1 \ --certificate-oidc-issuer=https://token.actions.githubusercontent.com \ --certificate-identity-regexp='https://github\.com/aws/karpenter-provider-aws/\.github/workflows/release\.yaml@.+' \ --certificate-github-workflow-repository=aws/karpenter-provider-aws \ --certificate-github-workflow-name=Release \ - --certificate-github-workflow-ref=refs/tags/v1.2.0 \ - --annotations version=1.2.0 + --certificate-github-workflow-ref=refs/tags/v1.2.1 \ + --annotations version=1.2.1 ``` ## Values @@ -49,9 +49,9 @@ cosign verify public.ecr.aws/karpenter/karpenter:1.2.0 \ | controller.envFrom | list | `[]` | | | controller.extraVolumeMounts | list | `[]` | Additional volumeMounts for the controller pod. | | controller.healthProbe.port | int | `8081` | The container port to use for http health probe. | -| controller.image.digest | string | `"sha256:24b8fe57f02b70fc4ab3cd6d5aa0d73a6f3d0c62ca5d23d7ffc8853eac01e324"` | SHA256 digest of the controller image. | +| controller.image.digest | string | `"sha256:6d771157293958fdf58ea64613e6fb5f3854ed5bebe68fdb457259e29ee68b43"` | SHA256 digest of the controller image. | | controller.image.repository | string | `"public.ecr.aws/karpenter/controller"` | Repository path to the controller image. | -| controller.image.tag | string | `"1.2.0"` | Tag of the controller image. | +| controller.image.tag | string | `"1.2.1"` | Tag of the controller image. | | controller.metrics.port | int | `8080` | The container port to use for metrics. | | controller.resources | object | `{}` | Resources for the controller pod. | | controller.sidecarContainer | list | `[]` | Additional sidecarContainer config | diff --git a/website/content/en/docs/faq.md b/website/content/en/docs/faq.md index b8a5c6f2ed70..977bc34fd391 100644 --- a/website/content/en/docs/faq.md +++ b/website/content/en/docs/faq.md @@ -17,7 +17,7 @@ See [Configuring NodePools]({{< ref "./concepts/#configuring-nodepools" >}}) for AWS is the first cloud provider supported by Karpenter, although it is designed to be used with other cloud providers as well. ### Can I write my own cloud provider for Karpenter? -Yes, but there is no documentation yet for it. Start with Karpenter's GitHub [cloudprovider](https://github.com/aws/karpenter-core/tree/v1.2.0/pkg/cloudprovider) documentation to see how the AWS provider is built, but there are other sections of the code that will require changes too. +Yes, but there is no documentation yet for it. Start with Karpenter's GitHub [cloudprovider](https://github.com/aws/karpenter-core/tree/v1.1.0/pkg/cloudprovider) documentation to see how the AWS provider is built, but there are other sections of the code that will require changes too. ### What operating system nodes does Karpenter deploy? Karpenter uses the OS defined by the [AMI Family in your EC2NodeClass]({{< ref "./concepts/nodeclasses#specamifamily" >}}). @@ -29,7 +29,7 @@ Karpenter has multiple mechanisms for configuring the [operating system]({{< ref Karpenter is flexible to multi-architecture configurations using [well known labels]({{< ref "./concepts/scheduling/#supported-labels">}}). ### What RBAC access is required? -All the required RBAC rules can be found in the Helm chart template. 
See [clusterrole-core.yaml](https://github.com/aws/karpenter/blob/v1.2.0/charts/karpenter/templates/clusterrole-core.yaml), [clusterrole.yaml](https://github.com/aws/karpenter/blob/v1.2.0/charts/karpenter/templates/clusterrole.yaml), [rolebinding.yaml](https://github.com/aws/karpenter/blob/v1.2.0/charts/karpenter/templates/rolebinding.yaml), and [role.yaml](https://github.com/aws/karpenter/blob/v1.2.0/charts/karpenter/templates/role.yaml) files for details. +All the required RBAC rules can be found in the Helm chart template. See [clusterrole-core.yaml](https://github.com/aws/karpenter/blob/v1.2.1/charts/karpenter/templates/clusterrole-core.yaml), [clusterrole.yaml](https://github.com/aws/karpenter/blob/v1.2.1/charts/karpenter/templates/clusterrole.yaml), [rolebinding.yaml](https://github.com/aws/karpenter/blob/v1.2.1/charts/karpenter/templates/rolebinding.yaml), and [role.yaml](https://github.com/aws/karpenter/blob/v1.2.1/charts/karpenter/templates/role.yaml) files for details. ### Can I run Karpenter outside of a Kubernetes cluster? Yes, as long as the controller has network and IAM/RBAC access to the Kubernetes API and your provider API. diff --git a/website/content/en/docs/getting-started/getting-started-with-karpenter/_index.md b/website/content/en/docs/getting-started/getting-started-with-karpenter/_index.md index 12bb80f69416..5c02a8a43f06 100644 --- a/website/content/en/docs/getting-started/getting-started-with-karpenter/_index.md +++ b/website/content/en/docs/getting-started/getting-started-with-karpenter/_index.md @@ -48,7 +48,7 @@ After setting up the tools, set the Karpenter and Kubernetes version: ```bash export KARPENTER_NAMESPACE="kube-system" -export KARPENTER_VERSION="1.2.0" +export KARPENTER_VERSION="1.2.1" export K8S_VERSION="1.31" ``` @@ -115,13 +115,13 @@ See [Enabling Windows support](https://docs.aws.amazon.com/eks/latest/userguide/ As the OCI Helm chart is signed by [Cosign](https://github.com/sigstore/cosign) as part of the release process you can verify the chart before installing it by running the following command. ```bash -cosign verify public.ecr.aws/karpenter/karpenter:1.2.0 \ +cosign verify public.ecr.aws/karpenter/karpenter:1.2.1 \ --certificate-oidc-issuer=https://token.actions.githubusercontent.com \ --certificate-identity-regexp='https://github\.com/aws/karpenter-provider-aws/\.github/workflows/release\.yaml@.+' \ --certificate-github-workflow-repository=aws/karpenter-provider-aws \ --certificate-github-workflow-name=Release \ - --certificate-github-workflow-ref=refs/tags/v1.2.0 \ - --annotations version=1.2.0 + --certificate-github-workflow-ref=refs/tags/v1.2.1 \ + --annotations version=1.2.1 ``` {{% alert title="DNS Policy Notice" color="warning" %}} diff --git a/website/content/en/docs/getting-started/migrating-from-cas/_index.md b/website/content/en/docs/getting-started/migrating-from-cas/_index.md index 073aeb0444cd..e3e71026f304 100644 --- a/website/content/en/docs/getting-started/migrating-from-cas/_index.md +++ b/website/content/en/docs/getting-started/migrating-from-cas/_index.md @@ -92,7 +92,7 @@ One for your Karpenter node role and one for your existing node group. First set the Karpenter release you want to deploy. ```bash -export KARPENTER_VERSION="1.2.0" +export KARPENTER_VERSION="1.2.1" ``` We can now generate a full Karpenter deployment yaml from the Helm chart. 
@@ -132,7 +132,7 @@ Now that our deployment is ready we can create the karpenter namespace, create t ## Create default NodePool -We need to create a default NodePool so Karpenter knows what types of nodes we want for unscheduled workloads. You can refer to some of the [example NodePool](https://github.com/aws/karpenter/tree/v1.2.0/examples/v1) for specific needs. +We need to create a default NodePool so Karpenter knows what types of nodes we want for unscheduled workloads. You can refer to some of the [example NodePool](https://github.com/aws/karpenter/tree/v1.2.1/examples/v1) for specific needs. {{% script file="./content/en/{VERSION}/getting-started/migrating-from-cas/scripts/step10-create-nodepool.sh" language="bash" %}} diff --git a/website/content/en/docs/reference/cloudformation.md b/website/content/en/docs/reference/cloudformation.md index 9f6145978c7b..72fe0a98ef13 100644 --- a/website/content/en/docs/reference/cloudformation.md +++ b/website/content/en/docs/reference/cloudformation.md @@ -17,7 +17,7 @@ These descriptions should allow you to understand: To download a particular version of `cloudformation.yaml`, set the version and use `curl` to pull the file to your local system: ```bash -export KARPENTER_VERSION="1.2.0" +export KARPENTER_VERSION="1.2.1" curl https://raw.githubusercontent.com/aws/karpenter-provider-aws/v"${KARPENTER_VERSION}"/website/content/en/preview/getting-started/getting-started-with-karpenter/cloudformation.yaml > cloudformation.yaml ``` diff --git a/website/content/en/docs/reference/threat-model.md b/website/content/en/docs/reference/threat-model.md index c898e01a89fd..e7cd3b8b3008 100644 --- a/website/content/en/docs/reference/threat-model.md +++ b/website/content/en/docs/reference/threat-model.md @@ -31,11 +31,11 @@ A Cluster Developer has the ability to create pods via `Deployments`, `ReplicaSe Karpenter has permissions to create and manage cloud instances. Karpenter has Kubernetes API permissions to create, update, and remove nodes, as well as evict pods. For a full list of the permissions, see the RBAC rules in the helm chart template. Karpenter also has AWS IAM permissions to create instances with IAM roles. 
-* [aggregate-clusterrole.yaml](https://github.com/aws/karpenter/blob/v1.2.0/charts/karpenter/templates/aggregate-clusterrole.yaml) -* [clusterrole-core.yaml](https://github.com/aws/karpenter/blob/v1.2.0/charts/karpenter/templates/clusterrole-core.yaml) -* [clusterrole.yaml](https://github.com/aws/karpenter/blob/v1.2.0/charts/karpenter/templates/clusterrole.yaml) -* [rolebinding.yaml](https://github.com/aws/karpenter/blob/v1.2.0/charts/karpenter/templates/rolebinding.yaml) -* [role.yaml](https://github.com/aws/karpenter/blob/v1.2.0/charts/karpenter/templates/role.yaml) +* [aggregate-clusterrole.yaml](https://github.com/aws/karpenter/blob/v1.2.1/charts/karpenter/templates/aggregate-clusterrole.yaml) +* [clusterrole-core.yaml](https://github.com/aws/karpenter/blob/v1.2.1/charts/karpenter/templates/clusterrole-core.yaml) +* [clusterrole.yaml](https://github.com/aws/karpenter/blob/v1.2.1/charts/karpenter/templates/clusterrole.yaml) +* [rolebinding.yaml](https://github.com/aws/karpenter/blob/v1.2.1/charts/karpenter/templates/rolebinding.yaml) +* [role.yaml](https://github.com/aws/karpenter/blob/v1.2.1/charts/karpenter/templates/role.yaml) ## Assumptions diff --git a/website/content/en/v1.1/faq.md b/website/content/en/v1.1/faq.md index 45d9cae70050..0810d2a7240c 100644 --- a/website/content/en/v1.1/faq.md +++ b/website/content/en/v1.1/faq.md @@ -17,7 +17,7 @@ See [Configuring NodePools]({{< ref "./concepts/#configuring-nodepools" >}}) for AWS is the first cloud provider supported by Karpenter, although it is designed to be used with other cloud providers as well. ### Can I write my own cloud provider for Karpenter? -Yes, but there is no documentation yet for it. Start with Karpenter's GitHub [cloudprovider](https://github.com/aws/karpenter-core/tree/v1.1.2/pkg/cloudprovider) documentation to see how the AWS provider is built, but there are other sections of the code that will require changes too. +Yes, but there is no documentation yet for it. Start with Karpenter's GitHub [cloudprovider](https://github.com/aws/karpenter-core/tree/v1.1.3/pkg/cloudprovider) documentation to see how the AWS provider is built, but there are other sections of the code that will require changes too. ### What operating system nodes does Karpenter deploy? Karpenter uses the OS defined by the [AMI Family in your EC2NodeClass]({{< ref "./concepts/nodeclasses#specamifamily" >}}). @@ -29,7 +29,7 @@ Karpenter has multiple mechanisms for configuring the [operating system]({{< ref Karpenter is flexible to multi-architecture configurations using [well known labels]({{< ref "./concepts/scheduling/#supported-labels">}}). ### What RBAC access is required? -All the required RBAC rules can be found in the Helm chart template. See [clusterrole-core.yaml](https://github.com/aws/karpenter/blob/v1.1.2/charts/karpenter/templates/clusterrole-core.yaml), [clusterrole.yaml](https://github.com/aws/karpenter/blob/v1.1.2/charts/karpenter/templates/clusterrole.yaml), [rolebinding.yaml](https://github.com/aws/karpenter/blob/v1.1.2/charts/karpenter/templates/rolebinding.yaml), and [role.yaml](https://github.com/aws/karpenter/blob/v1.1.2/charts/karpenter/templates/role.yaml) files for details. +All the required RBAC rules can be found in the Helm chart template. 
See [clusterrole-core.yaml](https://github.com/aws/karpenter/blob/v1.1.3/charts/karpenter/templates/clusterrole-core.yaml), [clusterrole.yaml](https://github.com/aws/karpenter/blob/v1.1.3/charts/karpenter/templates/clusterrole.yaml), [rolebinding.yaml](https://github.com/aws/karpenter/blob/v1.1.3/charts/karpenter/templates/rolebinding.yaml), and [role.yaml](https://github.com/aws/karpenter/blob/v1.1.3/charts/karpenter/templates/role.yaml) files for details. ### Can I run Karpenter outside of a Kubernetes cluster? Yes, as long as the controller has network and IAM/RBAC access to the Kubernetes API and your provider API. diff --git a/website/content/en/v1.1/getting-started/getting-started-with-karpenter/_index.md b/website/content/en/v1.1/getting-started/getting-started-with-karpenter/_index.md index d042949094ae..6e4b43aaafaa 100644 --- a/website/content/en/v1.1/getting-started/getting-started-with-karpenter/_index.md +++ b/website/content/en/v1.1/getting-started/getting-started-with-karpenter/_index.md @@ -48,7 +48,7 @@ After setting up the tools, set the Karpenter and Kubernetes version: ```bash export KARPENTER_NAMESPACE="kube-system" -export KARPENTER_VERSION="1.1.2" +export KARPENTER_VERSION="1.1.3" export K8S_VERSION="1.31" ``` @@ -115,13 +115,13 @@ See [Enabling Windows support](https://docs.aws.amazon.com/eks/latest/userguide/ As the OCI Helm chart is signed by [Cosign](https://github.com/sigstore/cosign) as part of the release process you can verify the chart before installing it by running the following command. ```bash -cosign verify public.ecr.aws/karpenter/karpenter:1.1.2 \ +cosign verify public.ecr.aws/karpenter/karpenter:1.1.3 \ --certificate-oidc-issuer=https://token.actions.githubusercontent.com \ --certificate-identity-regexp='https://github\.com/aws/karpenter-provider-aws/\.github/workflows/release\.yaml@.+' \ --certificate-github-workflow-repository=aws/karpenter-provider-aws \ --certificate-github-workflow-name=Release \ - --certificate-github-workflow-ref=refs/tags/v1.1.2 \ - --annotations version=1.1.2 + --certificate-github-workflow-ref=refs/tags/v1.1.3 \ + --annotations version=1.1.3 ``` {{% alert title="DNS Policy Notice" color="warning" %}} diff --git a/website/content/en/v1.1/getting-started/migrating-from-cas/_index.md b/website/content/en/v1.1/getting-started/migrating-from-cas/_index.md index da9938ef9ccc..2a222ae65778 100644 --- a/website/content/en/v1.1/getting-started/migrating-from-cas/_index.md +++ b/website/content/en/v1.1/getting-started/migrating-from-cas/_index.md @@ -92,7 +92,7 @@ One for your Karpenter node role and one for your existing node group. First set the Karpenter release you want to deploy. ```bash -export KARPENTER_VERSION="1.1.2" +export KARPENTER_VERSION="1.1.3" ``` We can now generate a full Karpenter deployment yaml from the Helm chart. @@ -132,7 +132,7 @@ Now that our deployment is ready we can create the karpenter namespace, create t ## Create default NodePool -We need to create a default NodePool so Karpenter knows what types of nodes we want for unscheduled workloads. You can refer to some of the [example NodePool](https://github.com/aws/karpenter/tree/v1.1.2/examples/v1) for specific needs. +We need to create a default NodePool so Karpenter knows what types of nodes we want for unscheduled workloads. You can refer to some of the [example NodePool](https://github.com/aws/karpenter/tree/v1.1.3/examples/v1) for specific needs. 
{{% script file="./content/en/{VERSION}/getting-started/migrating-from-cas/scripts/step10-create-nodepool.sh" language="bash" %}} diff --git a/website/content/en/v1.1/reference/cloudformation.md b/website/content/en/v1.1/reference/cloudformation.md index b636f4b6d4ca..37584930f195 100644 --- a/website/content/en/v1.1/reference/cloudformation.md +++ b/website/content/en/v1.1/reference/cloudformation.md @@ -17,7 +17,7 @@ These descriptions should allow you to understand: To download a particular version of `cloudformation.yaml`, set the version and use `curl` to pull the file to your local system: ```bash -export KARPENTER_VERSION="1.1.2" +export KARPENTER_VERSION="1.1.3" curl https://raw.githubusercontent.com/aws/karpenter-provider-aws/v"${KARPENTER_VERSION}"/website/content/en/preview/getting-started/getting-started-with-karpenter/cloudformation.yaml > cloudformation.yaml ``` diff --git a/website/content/en/v1.1/reference/threat-model.md b/website/content/en/v1.1/reference/threat-model.md index 8f034bd056f4..05390a7f64e2 100644 --- a/website/content/en/v1.1/reference/threat-model.md +++ b/website/content/en/v1.1/reference/threat-model.md @@ -31,11 +31,11 @@ A Cluster Developer has the ability to create pods via `Deployments`, `ReplicaSe Karpenter has permissions to create and manage cloud instances. Karpenter has Kubernetes API permissions to create, update, and remove nodes, as well as evict pods. For a full list of the permissions, see the RBAC rules in the helm chart template. Karpenter also has AWS IAM permissions to create instances with IAM roles. -* [aggregate-clusterrole.yaml](https://github.com/aws/karpenter/blob/v1.1.2/charts/karpenter/templates/aggregate-clusterrole.yaml) -* [clusterrole-core.yaml](https://github.com/aws/karpenter/blob/v1.1.2/charts/karpenter/templates/clusterrole-core.yaml) -* [clusterrole.yaml](https://github.com/aws/karpenter/blob/v1.1.2/charts/karpenter/templates/clusterrole.yaml) -* [rolebinding.yaml](https://github.com/aws/karpenter/blob/v1.1.2/charts/karpenter/templates/rolebinding.yaml) -* [role.yaml](https://github.com/aws/karpenter/blob/v1.1.2/charts/karpenter/templates/role.yaml) +* [aggregate-clusterrole.yaml](https://github.com/aws/karpenter/blob/v1.1.3/charts/karpenter/templates/aggregate-clusterrole.yaml) +* [clusterrole-core.yaml](https://github.com/aws/karpenter/blob/v1.1.3/charts/karpenter/templates/clusterrole-core.yaml) +* [clusterrole.yaml](https://github.com/aws/karpenter/blob/v1.1.3/charts/karpenter/templates/clusterrole.yaml) +* [rolebinding.yaml](https://github.com/aws/karpenter/blob/v1.1.3/charts/karpenter/templates/rolebinding.yaml) +* [role.yaml](https://github.com/aws/karpenter/blob/v1.1.3/charts/karpenter/templates/role.yaml) ## Assumptions diff --git a/website/content/en/v1.2/faq.md b/website/content/en/v1.2/faq.md index b8a5c6f2ed70..0a14cc731e43 100644 --- a/website/content/en/v1.2/faq.md +++ b/website/content/en/v1.2/faq.md @@ -17,7 +17,7 @@ See [Configuring NodePools]({{< ref "./concepts/#configuring-nodepools" >}}) for AWS is the first cloud provider supported by Karpenter, although it is designed to be used with other cloud providers as well. ### Can I write my own cloud provider for Karpenter? -Yes, but there is no documentation yet for it. 
Start with Karpenter's GitHub [cloudprovider](https://github.com/aws/karpenter-core/tree/v1.2.0/pkg/cloudprovider) documentation to see how the AWS provider is built, but there are other sections of the code that will require changes too. +Yes, but there is no documentation yet for it. Start with Karpenter's GitHub [cloudprovider](https://github.com/aws/karpenter-core/tree/v1.2.1/pkg/cloudprovider) documentation to see how the AWS provider is built, but there are other sections of the code that will require changes too. ### What operating system nodes does Karpenter deploy? Karpenter uses the OS defined by the [AMI Family in your EC2NodeClass]({{< ref "./concepts/nodeclasses#specamifamily" >}}). @@ -29,7 +29,7 @@ Karpenter has multiple mechanisms for configuring the [operating system]({{< ref Karpenter is flexible to multi-architecture configurations using [well known labels]({{< ref "./concepts/scheduling/#supported-labels">}}). ### What RBAC access is required? -All the required RBAC rules can be found in the Helm chart template. See [clusterrole-core.yaml](https://github.com/aws/karpenter/blob/v1.2.0/charts/karpenter/templates/clusterrole-core.yaml), [clusterrole.yaml](https://github.com/aws/karpenter/blob/v1.2.0/charts/karpenter/templates/clusterrole.yaml), [rolebinding.yaml](https://github.com/aws/karpenter/blob/v1.2.0/charts/karpenter/templates/rolebinding.yaml), and [role.yaml](https://github.com/aws/karpenter/blob/v1.2.0/charts/karpenter/templates/role.yaml) files for details. +All the required RBAC rules can be found in the Helm chart template. See [clusterrole-core.yaml](https://github.com/aws/karpenter/blob/v1.2.1/charts/karpenter/templates/clusterrole-core.yaml), [clusterrole.yaml](https://github.com/aws/karpenter/blob/v1.2.1/charts/karpenter/templates/clusterrole.yaml), [rolebinding.yaml](https://github.com/aws/karpenter/blob/v1.2.1/charts/karpenter/templates/rolebinding.yaml), and [role.yaml](https://github.com/aws/karpenter/blob/v1.2.1/charts/karpenter/templates/role.yaml) files for details. ### Can I run Karpenter outside of a Kubernetes cluster? Yes, as long as the controller has network and IAM/RBAC access to the Kubernetes API and your provider API. diff --git a/website/content/en/v1.2/getting-started/getting-started-with-karpenter/_index.md b/website/content/en/v1.2/getting-started/getting-started-with-karpenter/_index.md index 12bb80f69416..5c02a8a43f06 100644 --- a/website/content/en/v1.2/getting-started/getting-started-with-karpenter/_index.md +++ b/website/content/en/v1.2/getting-started/getting-started-with-karpenter/_index.md @@ -48,7 +48,7 @@ After setting up the tools, set the Karpenter and Kubernetes version: ```bash export KARPENTER_NAMESPACE="kube-system" -export KARPENTER_VERSION="1.2.0" +export KARPENTER_VERSION="1.2.1" export K8S_VERSION="1.31" ``` @@ -115,13 +115,13 @@ See [Enabling Windows support](https://docs.aws.amazon.com/eks/latest/userguide/ As the OCI Helm chart is signed by [Cosign](https://github.com/sigstore/cosign) as part of the release process you can verify the chart before installing it by running the following command. 
```bash -cosign verify public.ecr.aws/karpenter/karpenter:1.2.0 \ +cosign verify public.ecr.aws/karpenter/karpenter:1.2.1 \ --certificate-oidc-issuer=https://token.actions.githubusercontent.com \ --certificate-identity-regexp='https://github\.com/aws/karpenter-provider-aws/\.github/workflows/release\.yaml@.+' \ --certificate-github-workflow-repository=aws/karpenter-provider-aws \ --certificate-github-workflow-name=Release \ - --certificate-github-workflow-ref=refs/tags/v1.2.0 \ - --annotations version=1.2.0 + --certificate-github-workflow-ref=refs/tags/v1.2.1 \ + --annotations version=1.2.1 ``` {{% alert title="DNS Policy Notice" color="warning" %}} diff --git a/website/content/en/v1.2/getting-started/migrating-from-cas/_index.md b/website/content/en/v1.2/getting-started/migrating-from-cas/_index.md index 073aeb0444cd..e3e71026f304 100644 --- a/website/content/en/v1.2/getting-started/migrating-from-cas/_index.md +++ b/website/content/en/v1.2/getting-started/migrating-from-cas/_index.md @@ -92,7 +92,7 @@ One for your Karpenter node role and one for your existing node group. First set the Karpenter release you want to deploy. ```bash -export KARPENTER_VERSION="1.2.0" +export KARPENTER_VERSION="1.2.1" ``` We can now generate a full Karpenter deployment yaml from the Helm chart. @@ -132,7 +132,7 @@ Now that our deployment is ready we can create the karpenter namespace, create t ## Create default NodePool -We need to create a default NodePool so Karpenter knows what types of nodes we want for unscheduled workloads. You can refer to some of the [example NodePool](https://github.com/aws/karpenter/tree/v1.2.0/examples/v1) for specific needs. +We need to create a default NodePool so Karpenter knows what types of nodes we want for unscheduled workloads. You can refer to some of the [example NodePool](https://github.com/aws/karpenter/tree/v1.2.1/examples/v1) for specific needs. {{% script file="./content/en/{VERSION}/getting-started/migrating-from-cas/scripts/step10-create-nodepool.sh" language="bash" %}} diff --git a/website/content/en/v1.2/reference/cloudformation.md b/website/content/en/v1.2/reference/cloudformation.md index 9f6145978c7b..72fe0a98ef13 100644 --- a/website/content/en/v1.2/reference/cloudformation.md +++ b/website/content/en/v1.2/reference/cloudformation.md @@ -17,7 +17,7 @@ These descriptions should allow you to understand: To download a particular version of `cloudformation.yaml`, set the version and use `curl` to pull the file to your local system: ```bash -export KARPENTER_VERSION="1.2.0" +export KARPENTER_VERSION="1.2.1" curl https://raw.githubusercontent.com/aws/karpenter-provider-aws/v"${KARPENTER_VERSION}"/website/content/en/preview/getting-started/getting-started-with-karpenter/cloudformation.yaml > cloudformation.yaml ``` diff --git a/website/content/en/v1.2/reference/threat-model.md b/website/content/en/v1.2/reference/threat-model.md index c898e01a89fd..e7cd3b8b3008 100644 --- a/website/content/en/v1.2/reference/threat-model.md +++ b/website/content/en/v1.2/reference/threat-model.md @@ -31,11 +31,11 @@ A Cluster Developer has the ability to create pods via `Deployments`, `ReplicaSe Karpenter has permissions to create and manage cloud instances. Karpenter has Kubernetes API permissions to create, update, and remove nodes, as well as evict pods. For a full list of the permissions, see the RBAC rules in the helm chart template. Karpenter also has AWS IAM permissions to create instances with IAM roles. 
-* [aggregate-clusterrole.yaml](https://github.com/aws/karpenter/blob/v1.2.0/charts/karpenter/templates/aggregate-clusterrole.yaml) -* [clusterrole-core.yaml](https://github.com/aws/karpenter/blob/v1.2.0/charts/karpenter/templates/clusterrole-core.yaml) -* [clusterrole.yaml](https://github.com/aws/karpenter/blob/v1.2.0/charts/karpenter/templates/clusterrole.yaml) -* [rolebinding.yaml](https://github.com/aws/karpenter/blob/v1.2.0/charts/karpenter/templates/rolebinding.yaml) -* [role.yaml](https://github.com/aws/karpenter/blob/v1.2.0/charts/karpenter/templates/role.yaml) +* [aggregate-clusterrole.yaml](https://github.com/aws/karpenter/blob/v1.2.1/charts/karpenter/templates/aggregate-clusterrole.yaml) +* [clusterrole-core.yaml](https://github.com/aws/karpenter/blob/v1.2.1/charts/karpenter/templates/clusterrole-core.yaml) +* [clusterrole.yaml](https://github.com/aws/karpenter/blob/v1.2.1/charts/karpenter/templates/clusterrole.yaml) +* [rolebinding.yaml](https://github.com/aws/karpenter/blob/v1.2.1/charts/karpenter/templates/rolebinding.yaml) +* [role.yaml](https://github.com/aws/karpenter/blob/v1.2.1/charts/karpenter/templates/role.yaml) ## Assumptions diff --git a/website/hugo.yaml b/website/hugo.yaml index 3c404e66334f..f602f6a31418 100644 --- a/website/hugo.yaml +++ b/website/hugo.yaml @@ -76,7 +76,7 @@ params: url: "https://slack.k8s.io/" icon: fab fa-slack desc: "Chat with us on Slack in the #aws-provider channel" - latest_release_version: "1.2.0" + latest_release_version: "1.2.1" latest_k8s_version: "1.31" versions: - v0.32 From 24accc5a32396bd913c4e4b032eaa29f9da71c0c Mon Sep 17 00:00:00 2001 From: Reed Schalo Date: Fri, 31 Jan 2025 12:21:41 -0800 Subject: [PATCH 02/34] docs: update nodes eligible dashboard name (#7662) --- .../karpenter-capacity-dashboard.json | 2 +- .../karpenter-capacity-dashboard.json | 2 +- .../karpenter-capacity-dashboard.json | 2 +- .../karpenter-capacity-dashboard.json | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/website/content/en/docs/getting-started/getting-started-with-karpenter/karpenter-capacity-dashboard.json b/website/content/en/docs/getting-started/getting-started-with-karpenter/karpenter-capacity-dashboard.json index 992f9e53459c..6f2dab91c7cf 100644 --- a/website/content/en/docs/getting-started/getting-started-with-karpenter/karpenter-capacity-dashboard.json +++ b/website/content/en/docs/getting-started/getting-started-with-karpenter/karpenter-capacity-dashboard.json @@ -379,7 +379,7 @@ "refId": "A" } ], - "title": "Nodes Eligibale for Disruptions by \"reason\"", + "title": "Nodes Eligible for Disruptions by \"reason\"", "type": "timeseries" }, { diff --git a/website/content/en/preview/getting-started/getting-started-with-karpenter/karpenter-capacity-dashboard.json b/website/content/en/preview/getting-started/getting-started-with-karpenter/karpenter-capacity-dashboard.json index 992f9e53459c..6f2dab91c7cf 100644 --- a/website/content/en/preview/getting-started/getting-started-with-karpenter/karpenter-capacity-dashboard.json +++ b/website/content/en/preview/getting-started/getting-started-with-karpenter/karpenter-capacity-dashboard.json @@ -379,7 +379,7 @@ "refId": "A" } ], - "title": "Nodes Eligibale for Disruptions by \"reason\"", + "title": "Nodes Eligible for Disruptions by \"reason\"", "type": "timeseries" }, { diff --git a/website/content/en/v1.0/getting-started/getting-started-with-karpenter/karpenter-capacity-dashboard.json 
b/website/content/en/v1.0/getting-started/getting-started-with-karpenter/karpenter-capacity-dashboard.json index 992f9e53459c..6f2dab91c7cf 100644 --- a/website/content/en/v1.0/getting-started/getting-started-with-karpenter/karpenter-capacity-dashboard.json +++ b/website/content/en/v1.0/getting-started/getting-started-with-karpenter/karpenter-capacity-dashboard.json @@ -379,7 +379,7 @@ "refId": "A" } ], - "title": "Nodes Eligibale for Disruptions by \"reason\"", + "title": "Nodes Eligible for Disruptions by \"reason\"", "type": "timeseries" }, { diff --git a/website/content/en/v1.1/getting-started/getting-started-with-karpenter/karpenter-capacity-dashboard.json b/website/content/en/v1.1/getting-started/getting-started-with-karpenter/karpenter-capacity-dashboard.json index 992f9e53459c..6f2dab91c7cf 100644 --- a/website/content/en/v1.1/getting-started/getting-started-with-karpenter/karpenter-capacity-dashboard.json +++ b/website/content/en/v1.1/getting-started/getting-started-with-karpenter/karpenter-capacity-dashboard.json @@ -379,7 +379,7 @@ "refId": "A" } ], - "title": "Nodes Eligibale for Disruptions by \"reason\"", + "title": "Nodes Eligible for Disruptions by \"reason\"", "type": "timeseries" }, { From ec51fef294e101fc73640a005e3186e6966a279a Mon Sep 17 00:00:00 2001 From: Jason Deal Date: Fri, 31 Jan 2025 12:28:29 -0800 Subject: [PATCH 03/34] ci: pin upgrade test to v1.2.0 (#7678) --- .github/workflows/e2e-matrix.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/e2e-matrix.yaml b/.github/workflows/e2e-matrix.yaml index 59e2644f24f4..ddba085979e3 100644 --- a/.github/workflows/e2e-matrix.yaml +++ b/.github/workflows/e2e-matrix.yaml @@ -104,7 +104,7 @@ jobs: statuses: write # ./.github/actions/commit-status/start uses: ./.github/workflows/e2e-upgrade.yaml with: - from_git_ref: 2fb10b6d330ac9662bd35dc81124c7666f66e453 + from_git_ref: 93da43860fdf966100fef3e6c76eef3508733521 to_git_ref: ${{ inputs.git_ref }} region: ${{ inputs.region }} k8s_version: ${{ inputs.k8s_version }} From db9913b8bac02388b262b95029543ff4ab023927 Mon Sep 17 00:00:00 2001 From: Amanuel Engeda <74629455+engedaam@users.noreply.github.com> Date: Fri, 31 Jan 2025 14:15:29 -0800 Subject: [PATCH 04/34] docs: Update the v1 migration doc for patches (#7682) --- .../content/en/v1.0/upgrading/v1-migration.md | 36 +++++++++---------- 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/website/content/en/v1.0/upgrading/v1-migration.md b/website/content/en/v1.0/upgrading/v1-migration.md index dd2b6b07b581..b3fbb54921e6 100644 --- a/website/content/en/v1.0/upgrading/v1-migration.md +++ b/website/content/en/v1.0/upgrading/v1-migration.md @@ -171,15 +171,15 @@ You should still review the upgrade procedure; the sequence of operations remain Set the `KARPENTER_VERSION` environment variable to the latest patch release for your current minor version. The following releases are the current latest: - * `0.37.6` - * `0.36.8` - * `0.35.11` - * `v0.34.12` - * `v0.33.11` + * `0.37.7` + * `0.36.9` + * `0.35.12` + * `v0.34.13` + * `v0.33.12` ```bash # Note: v0.33.x and v0.34.x include the v prefix, omit it for versions v0.35+ - export KARPENTER_VERSION="0.37.6" # Replace with your minor version + export KARPENTER_VERSION="0.37.7" # Replace with your minor version ``` 4. Upgrade Karpenter to the latest patch release for your current minor version. 
@@ -323,15 +323,15 @@ Once you upgrade to Karpenter `v1.0.x`, both `v1` and `v1beta1` resources may be Due to this, you may only rollback to a version of Karpenter with the conversion webhooks. The following releases should be used as rollback targets: -* `v0.37.6` -* `v0.36.8` -* `v0.35.11` -* `v0.34.12` -* `v0.33.11` +* `v0.37.7` +* `v0.36.9` +* `v0.35.12` +* `v0.34.13` +* `v0.33.12` {{% alert title="Warning" color="warning" %}} When rolling back from `v1`, Karpenter will not retain data that was only valid in the `v1` APIs. -For instance, if you upgraded from `v0.33.5` to `v1.0.x`, updated the `NodePool.Spec.Disruption.Budgets` field, and then rolled back to `v0.33.6`, Karpenter would not retain the `NodePool.Spec.Disruption.Budgets` field, as that was introduced in `v0.34.0`. +For instance, if you upgraded from `v0.33.5` to `v1.0.x`, updated the `NodePool.Spec.Disruption.Budgets` field, and then rolled back to `v0.33.12`, Karpenter would not retain the `NodePool.Spec.Disruption.Budgets` field, as that was introduced in `v0.34.0`. If you have configured the `kubelet` field on your `EC2NodeClass` and have removed the `compatibility.karpenter.sh/v1beta1-kubelet-conversion` annotation from your `NodePools`, you must re-add the annotation before downgrading. For more information, refer to [kubelet configuration migration]({{}}). @@ -357,15 +357,15 @@ For example: `kubectl get nodepool.v1beta1.karpenter.sh`. ``` 2. Configure your target Karpenter version. You should select one of the following versions: - * `0.37.6` - * `0.36.8` - * `0.35.11` - * `v0.34.12` - * `v0.33.11` + * `0.37.7` + * `0.36.9` + * `0.35.12` + * `v0.34.13` + * `v0.33.12` ```bash # Note: v0.33.x and v0.34.x include the v prefix, omit it for versions v0.35+ - export KARPENTER_VERSION="0.37.6" # Replace with your minor version + export KARPENTER_VERSION="0.37.7" # Replace with your minor version ``` 3. Attach the `v1beta1` policy from your target version to your existing NodeRole. 
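For the upgrade step referenced above (moving to the latest patch release of your current minor version before migrating to v1), a minimal sketch of the corresponding Helm command, reusing the OCI chart install pattern from charts/karpenter/README.md earlier in this series. This is an illustration, not part of the patch: KARPENTER_NAMESPACE, CLUSTER_NAME, KARPENTER_IAM_ROLE_ARN, and KARPENTER_VERSION are assumed to already be exported, and any other settings from your existing installation (for example settings.interruptionQueue) should be carried over unchanged.

```bash
# Sketch only: upgrade (or roll back) to the selected patch release using the same
# OCI chart shown in charts/karpenter/README.md. The environment variables below are
# assumed to be set beforehand; adjust values for your cluster before running.
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace "${KARPENTER_NAMESPACE}" --create-namespace \
  --version "${KARPENTER_VERSION}" \
  --set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=${KARPENTER_IAM_ROLE_ARN}" \
  --set settings.clusterName="${CLUSTER_NAME}" \
  --wait
```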
From e95ef172304915b6a9c211d50c7b9acf2966ed22 Mon Sep 17 00:00:00 2001 From: Jason Deal Date: Fri, 31 Jan 2025 17:40:54 -0800 Subject: [PATCH 05/34] fix: only select available AMIs (#7672) --- pkg/cloudprovider/suite_test.go | 2 + pkg/controllers/nodeclass/ami_test.go | 16 +++-- .../providers/ssm/invalidation/suite_test.go | 1 + pkg/fake/ec2api.go | 1 + pkg/fake/utils.go | 11 +++- pkg/providers/amifamily/suite_test.go | 62 +++++++++++++++++-- pkg/providers/amifamily/types.go | 6 +- pkg/providers/launchtemplate/suite_test.go | 9 +++ 8 files changed, 96 insertions(+), 12 deletions(-) diff --git a/pkg/cloudprovider/suite_test.go b/pkg/cloudprovider/suite_test.go index d08c26dcb0d5..080f308bb4f0 100644 --- a/pkg/cloudprovider/suite_test.go +++ b/pkg/cloudprovider/suite_test.go @@ -639,6 +639,7 @@ var _ = Describe("CloudProvider", func() { Value: aws.String("ami-value-1"), }, }, + State: ec2types.ImageStateAvailable, }, { Name: aws.String(coretest.RandomName()), @@ -651,6 +652,7 @@ var _ = Describe("CloudProvider", func() { Value: aws.String("ami-value-2"), }, }, + State: ec2types.ImageStateAvailable, }, }, }) diff --git a/pkg/controllers/nodeclass/ami_test.go b/pkg/controllers/nodeclass/ami_test.go index 966f13dabf41..521a012aac1e 100644 --- a/pkg/controllers/nodeclass/ami_test.go +++ b/pkg/controllers/nodeclass/ami_test.go @@ -68,6 +68,7 @@ var _ = Describe("NodeClass AMI Status Controller", func() { {Key: aws.String("Name"), Value: aws.String("amd64-standard")}, {Key: aws.String("foo"), Value: aws.String("bar")}, }, + State: ec2types.ImageStateAvailable, }, { Name: aws.String("amd64-standard-new"), @@ -78,6 +79,7 @@ var _ = Describe("NodeClass AMI Status Controller", func() { {Key: aws.String("Name"), Value: aws.String("amd64-standard")}, {Key: aws.String("foo"), Value: aws.String("bar")}, }, + State: ec2types.ImageStateAvailable, }, { Name: aws.String("amd64-nvidia"), @@ -88,6 +90,7 @@ var _ = Describe("NodeClass AMI Status Controller", func() { {Key: aws.String("Name"), Value: aws.String("amd64-nvidia")}, {Key: aws.String("foo"), Value: aws.String("bar")}, }, + State: ec2types.ImageStateAvailable, }, { Name: aws.String("amd64-neuron"), @@ -98,6 +101,7 @@ var _ = Describe("NodeClass AMI Status Controller", func() { {Key: aws.String("Name"), Value: aws.String("amd64-neuron")}, {Key: aws.String("foo"), Value: aws.String("bar")}, }, + State: ec2types.ImageStateAvailable, }, { Name: aws.String("arm64-standard"), @@ -108,6 +112,7 @@ var _ = Describe("NodeClass AMI Status Controller", func() { {Key: aws.String("Name"), Value: aws.String("arm64-standard")}, {Key: aws.String("foo"), Value: aws.String("bar")}, }, + State: ec2types.ImageStateAvailable, }, { Name: aws.String("arm64-nvidia"), @@ -118,6 +123,7 @@ var _ = Describe("NodeClass AMI Status Controller", func() { {Key: aws.String("Name"), Value: aws.String("arm64-nvidia")}, {Key: aws.String("foo"), Value: aws.String("bar")}, }, + State: ec2types.ImageStateAvailable, }, }, }) @@ -555,6 +561,7 @@ var _ = Describe("NodeClass AMI Status Controller", func() { {Key: aws.String("Name"), Value: aws.String("test-ami-3")}, {Key: aws.String("foo"), Value: aws.String("bar")}, }, + State: ec2types.ImageStateAvailable, }, { Name: aws.String("test-ami-2"), @@ -565,6 +572,7 @@ var _ = Describe("NodeClass AMI Status Controller", func() { {Key: aws.String("Name"), Value: aws.String("test-ami-2")}, {Key: aws.String("foo"), Value: aws.String("bar")}, }, + State: ec2types.ImageStateAvailable, }, }, }) @@ -659,8 +667,7 @@ var _ = Describe("NodeClass AMI Status 
Controller", func() { Key: corev1.LabelArchStable, Operator: corev1.NodeSelectorOpIn, Values: []string{karpv1.ArchitectureArm64}, - }, - }, + }}, }, { Name: "test-ami-3", @@ -671,8 +678,7 @@ var _ = Describe("NodeClass AMI Status Controller", func() { Key: corev1.LabelArchStable, Operator: corev1.NodeSelectorOpIn, Values: []string{karpv1.ArchitectureAmd64}, - }, - }, + }}, }, }, )) @@ -692,6 +698,7 @@ var _ = Describe("NodeClass AMI Status Controller", func() { {Key: aws.String("Name"), Value: aws.String("test-ami-4")}, {Key: aws.String("foo"), Value: aws.String("bar")}, }, + State: ec2types.ImageStateAvailable, }, { Name: aws.String("test-ami-2"), @@ -702,6 +709,7 @@ var _ = Describe("NodeClass AMI Status Controller", func() { {Key: aws.String("Name"), Value: aws.String("test-ami-2")}, {Key: aws.String("foo"), Value: aws.String("bar")}, }, + State: ec2types.ImageStateAvailable, }, }, }) diff --git a/pkg/controllers/providers/ssm/invalidation/suite_test.go b/pkg/controllers/providers/ssm/invalidation/suite_test.go index 62fff9768498..1e99a24a1d5c 100644 --- a/pkg/controllers/providers/ssm/invalidation/suite_test.go +++ b/pkg/controllers/providers/ssm/invalidation/suite_test.go @@ -158,6 +158,7 @@ func deprecateAMIs(amiIDs ...string) { CreationDate: lo.ToPtr(awsEnv.Clock.Now().Add(-24 * time.Hour).Format(time.RFC3339)), Architecture: "x86_64", DeprecationTime: lo.ToPtr(awsEnv.Clock.Now().Add(-12 * time.Hour).Format(time.RFC3339)), + State: ec2types.ImageStateAvailable, } }), }) diff --git a/pkg/fake/ec2api.go b/pkg/fake/ec2api.go index 623058db84b9..8c770f9039b0 100644 --- a/pkg/fake/ec2api.go +++ b/pkg/fake/ec2api.go @@ -338,6 +338,7 @@ func (e *EC2API) DescribeImages(_ context.Context, input *ec2.DescribeImagesInpu ImageId: aws.String(test.RandomName()), CreationDate: aws.String(time.Now().Format(time.UnixDate)), Architecture: "x86_64", + State: ec2types.ImageStateAvailable, }, }, }, nil diff --git a/pkg/fake/utils.go b/pkg/fake/utils.go index 79384a6952ae..7941b0c0b33e 100644 --- a/pkg/fake/utils.go +++ b/pkg/fake/utils.go @@ -104,7 +104,16 @@ func FilterDescribeSubnets(subnets []ec2types.Subnet, filters []ec2types.Filter) func FilterDescribeImages(images []ec2types.Image, filters []ec2types.Filter) []ec2types.Image { return lo.Filter(images, func(image ec2types.Image, _ int) bool { - return Filter(filters, *image.ImageId, *image.Name, image.Tags) + if stateFilter, ok := lo.Find(filters, func(f ec2types.Filter) bool { + return lo.FromPtr(f.Name) == "state" + }); ok { + if !lo.Contains(stateFilter.Values, string(image.State)) { + return false + } + } + return Filter(lo.Reject(filters, func(f ec2types.Filter, _ int) bool { + return lo.FromPtr(f.Name) == "state" + }), *image.ImageId, *image.Name, image.Tags) }) } diff --git a/pkg/providers/amifamily/suite_test.go b/pkg/providers/amifamily/suite_test.go index 77a5452918c1..1e4755c94998 100644 --- a/pkg/providers/amifamily/suite_test.go +++ b/pkg/providers/amifamily/suite_test.go @@ -85,6 +85,7 @@ var _ = BeforeEach(func() { {Key: aws.String("Name"), Value: aws.String(amd64AMI)}, {Key: aws.String("foo"), Value: aws.String("bar")}, }, + State: ec2types.ImageStateAvailable, }, { Name: aws.String(arm64AMI), @@ -95,6 +96,7 @@ var _ = BeforeEach(func() { {Key: aws.String("Name"), Value: aws.String(arm64AMI)}, {Key: aws.String("foo"), Value: aws.String("bar")}, }, + State: ec2types.ImageStateAvailable, }, { Name: aws.String(amd64NvidiaAMI), @@ -105,6 +107,7 @@ var _ = BeforeEach(func() { {Key: aws.String("Name"), Value: aws.String(amd64NvidiaAMI)}, 
{Key: aws.String("foo"), Value: aws.String("bar")}, }, + State: ec2types.ImageStateAvailable, }, { Name: aws.String(arm64NvidiaAMI), @@ -115,6 +118,7 @@ var _ = BeforeEach(func() { {Key: aws.String("Name"), Value: aws.String(arm64NvidiaAMI)}, {Key: aws.String("foo"), Value: aws.String("bar")}, }, + State: ec2types.ImageStateAvailable, }, }, }) @@ -228,6 +232,45 @@ var _ = Describe("AMIProvider", func() { } wg.Wait() }) + DescribeTable( + "should ignore images when image.state != available", + func(state ec2types.ImageState) { + awsEnv.EC2API.DescribeImagesOutput.Set(&ec2.DescribeImagesOutput{Images: []ec2types.Image{ + { + Name: aws.String(coretest.RandomName()), + ImageId: aws.String("ami-123"), + Architecture: "x86_64", + Tags: []ec2types.Tag{{Key: lo.ToPtr("test"), Value: lo.ToPtr("test")}}, + CreationDate: aws.String("2022-08-15T12:00:00Z"), + State: ec2types.ImageStateAvailable, + }, + { + Name: aws.String(coretest.RandomName()), + ImageId: aws.String("ami-456"), + Architecture: "arm64", + Tags: []ec2types.Tag{{Key: lo.ToPtr("test"), Value: lo.ToPtr("test")}}, + CreationDate: aws.String("2022-08-15T12:00:00Z"), + State: state, + }, + }}) + nodeClass.Spec.AMISelectorTerms = []v1.AMISelectorTerm{{ + Tags: map[string]string{ + "test": "test", + }, + }} + amis, err := awsEnv.AMIProvider.List(ctx, nodeClass) + Expect(err).ToNot(HaveOccurred()) + Expect(amis).To(HaveLen(1)) + Expect(amis[0].AmiID).To(Equal("ami-123")) + }, + lo.FilterMap(ec2types.ImageState("").Values(), func(state ec2types.ImageState, _ int) (TableEntry, bool) { + if state == ec2types.ImageStateAvailable { + return TableEntry{}, false + } + return Entry(string(state), state), true + }), + ) + Context("SSM Alias Missing", func() { It("should succeed to partially resolve AMIs if all SSM aliases don't exist (Al2)", func() { nodeClass.Spec.AMISelectorTerms = []v1.AMISelectorTerm{{Alias: "al2@latest"}} @@ -278,6 +321,7 @@ var _ = Describe("AMIProvider", func() { {Key: aws.String(corev1.LabelInstanceTypeStable), Value: aws.String("m5.large")}, {Key: aws.String(corev1.LabelTopologyZone), Value: aws.String("test-zone-1a")}, }, + State: ec2types.ImageStateAvailable, } awsEnv.EC2API.DescribeImagesOutput.Set(&ec2.DescribeImagesOutput{ Images: []ec2types.Image{ @@ -329,6 +373,7 @@ var _ = Describe("AMIProvider", func() { {Key: aws.String("Name"), Value: aws.String(amd64AMI)}, {Key: aws.String("foo"), Value: aws.String("bar")}, }, + State: ec2types.ImageStateAvailable, }, { Name: aws.String(amd64AMI), @@ -339,6 +384,7 @@ var _ = Describe("AMIProvider", func() { {Key: aws.String("Name"), Value: aws.String(amd64AMI)}, {Key: aws.String("foo"), Value: aws.String("bar")}, }, + State: ec2types.ImageStateAvailable, }, }, }) @@ -369,6 +415,7 @@ var _ = Describe("AMIProvider", func() { {Key: aws.String("Name"), Value: aws.String(amd64AMI)}, {Key: aws.String("foo"), Value: aws.String("bar")}, }, + State: ec2types.ImageStateAvailable, }, { Name: aws.String(amd64AMI), @@ -380,6 +427,7 @@ var _ = Describe("AMIProvider", func() { {Key: aws.String("Name"), Value: aws.String(amd64AMI)}, {Key: aws.String("foo"), Value: aws.String("bar")}, }, + State: ec2types.ImageStateAvailable, }, }, }) @@ -411,6 +459,7 @@ var _ = Describe("AMIProvider", func() { {Key: aws.String("Name"), Value: aws.String("test-ami-2")}, {Key: aws.String("foo"), Value: aws.String("bar")}, }, + State: ec2types.ImageStateAvailable, }, { Name: aws.String("test-ami-1"), @@ -422,6 +471,7 @@ var _ = Describe("AMIProvider", func() { {Key: aws.String("Name"), Value: 
aws.String("test-ami-1")}, {Key: aws.String("foo"), Value: aws.String("bar")}, }, + State: ec2types.ImageStateAvailable, }, }, }) @@ -452,6 +502,7 @@ var _ = Describe("AMIProvider", func() { {Key: aws.String("Name"), Value: aws.String(amd64AMI)}, {Key: aws.String("foo"), Value: aws.String("bar")}, }, + State: ec2types.ImageStateAvailable, }, { Name: aws.String(amd64AMI), @@ -463,6 +514,7 @@ var _ = Describe("AMIProvider", func() { {Key: aws.String("Name"), Value: aws.String(amd64AMI)}, {Key: aws.String("foo"), Value: aws.String("bar")}, }, + State: ec2types.ImageStateAvailable, }, }, }) @@ -498,7 +550,7 @@ var _ = Describe("AMIProvider", func() { { Filters: []ec2types.Filter{ { - Name: aws.String("tag:Name"), + Name: lo.ToPtr("tag:Name"), Values: []string{"my-ami"}, }, }, @@ -519,7 +571,7 @@ var _ = Describe("AMIProvider", func() { { Filters: []ec2types.Filter{ { - Name: aws.String("name"), + Name: lo.ToPtr("name"), Values: []string{"my-ami"}, }, }, @@ -548,7 +600,7 @@ var _ = Describe("AMIProvider", func() { { Filters: []ec2types.Filter{ { - Name: aws.String("image-id"), + Name: lo.ToPtr("image-id"), Values: []string{"ami-abcd1234", "ami-cafeaced"}, }, }, @@ -599,7 +651,7 @@ var _ = Describe("AMIProvider", func() { Owners: []string{"0123456789"}, Filters: []ec2types.Filter{ { - Name: aws.String("name"), + Name: lo.ToPtr("name"), Values: []string{"my-name"}, }, }, @@ -608,7 +660,7 @@ var _ = Describe("AMIProvider", func() { Owners: []string{"self"}, Filters: []ec2types.Filter{ { - Name: aws.String("name"), + Name: lo.ToPtr("name"), Values: []string{"my-name"}, }, }, diff --git a/pkg/providers/amifamily/types.go b/pkg/providers/amifamily/types.go index 469109350c31..f41f5e08723d 100644 --- a/pkg/providers/amifamily/types.go +++ b/pkg/providers/amifamily/types.go @@ -108,8 +108,10 @@ type DescribeImageQuery struct { func (q DescribeImageQuery) DescribeImagesInput() *ec2.DescribeImagesInput { return &ec2.DescribeImagesInput{ - // Don't include filters in the Describe Images call as EC2 API doesn't allow empty filters. 
- Filters: lo.Ternary(len(q.Filters) > 0, q.Filters, nil), + Filters: append(q.Filters, ec2types.Filter{ + Name: lo.ToPtr("state"), + Values: []string{string(ec2types.ImageStateAvailable)}, + }), Owners: lo.Ternary(len(q.Owners) > 0, q.Owners, nil), IncludeDeprecated: aws.Bool(true), MaxResults: aws.Int32(1000), diff --git a/pkg/providers/launchtemplate/suite_test.go b/pkg/providers/launchtemplate/suite_test.go index 1290371a3ac6..dad250257bca 100644 --- a/pkg/providers/launchtemplate/suite_test.go +++ b/pkg/providers/launchtemplate/suite_test.go @@ -1948,6 +1948,7 @@ essential = true Architecture: "x86_64", Tags: []ec2types.Tag{{Key: aws.String(corev1.LabelInstanceTypeStable), Value: aws.String("m5.large")}}, CreationDate: aws.String("2022-08-15T12:00:00Z"), + State: ec2types.ImageStateAvailable, }, { Name: aws.String(coretest.RandomName()), @@ -1955,6 +1956,7 @@ essential = true Architecture: "x86_64", Tags: []ec2types.Tag{{Key: aws.String(corev1.LabelInstanceTypeStable), Value: aws.String("m5.xlarge")}}, CreationDate: aws.String("2022-08-15T12:00:00Z"), + State: ec2types.ImageStateAvailable, }, }}) ExpectApplied(ctx, env.Client, nodeClass, nodePool) @@ -1970,6 +1972,10 @@ essential = true Name: aws.String("image-id"), Values: []string{"ami-123", "ami-456"}, }, + { + Name: aws.String("state"), + Values: []string{string(ec2types.ImageStateAvailable)}, + }, } Expect(actualFilter).To(Equal(expectedFilter)) }) @@ -2009,12 +2015,14 @@ essential = true ImageId: aws.String("ami-123"), Architecture: "x86_64", CreationDate: aws.String("2020-01-01T12:00:00Z"), + State: ec2types.ImageStateAvailable, }, { Name: aws.String(coretest.RandomName()), ImageId: aws.String("ami-456"), Architecture: "x86_64", CreationDate: aws.String("2021-01-01T12:00:00Z"), + State: ec2types.ImageStateAvailable, }, { // Incompatible because required ARM64 @@ -2022,6 +2030,7 @@ essential = true ImageId: aws.String("ami-789"), Architecture: "arm64", CreationDate: aws.String("2022-01-01T12:00:00Z"), + State: ec2types.ImageStateAvailable, }, }}) nodeClass.Spec.AMIFamily = lo.ToPtr(v1.AMIFamilyCustom) From ffd46332d27269dc4a8c4bc1879d722c35652cae Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 3 Feb 2025 13:44:06 -0800 Subject: [PATCH 06/34] chore(deps): bump the go-deps group with 12 updates (#7690) Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- go.mod | 39 ++++++++++++++------------- go.sum | 83 +++++++++++++++++++++++++++------------------------------- 2 files changed, 57 insertions(+), 65 deletions(-) diff --git a/go.mod b/go.mod index c77e504470b9..25b356a9fd93 100644 --- a/go.mod +++ b/go.mod @@ -7,18 +7,18 @@ require ( github.com/PuerkitoBio/goquery v1.10.1 github.com/avast/retry-go v3.0.0+incompatible github.com/aws/amazon-vpc-resource-controller-k8s v1.6.3 - github.com/aws/aws-sdk-go-v2 v1.34.0 - github.com/aws/aws-sdk-go-v2/config v1.29.2 - github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.25 - github.com/aws/aws-sdk-go-v2/service/ec2 v1.202.0 - github.com/aws/aws-sdk-go-v2/service/eks v1.57.0 - github.com/aws/aws-sdk-go-v2/service/fis v1.31.7 - github.com/aws/aws-sdk-go-v2/service/iam v1.38.8 - github.com/aws/aws-sdk-go-v2/service/pricing v1.32.12 - github.com/aws/aws-sdk-go-v2/service/sqs v1.37.10 - github.com/aws/aws-sdk-go-v2/service/ssm v1.56.8 - github.com/aws/aws-sdk-go-v2/service/sts v1.33.10 - github.com/aws/aws-sdk-go-v2/service/timestreamwrite v1.29.13 + 
github.com/aws/aws-sdk-go-v2 v1.36.0 + github.com/aws/aws-sdk-go-v2/config v1.29.4 + github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.27 + github.com/aws/aws-sdk-go-v2/service/ec2 v1.202.2 + github.com/aws/aws-sdk-go-v2/service/eks v1.57.2 + github.com/aws/aws-sdk-go-v2/service/fis v1.31.9 + github.com/aws/aws-sdk-go-v2/service/iam v1.38.10 + github.com/aws/aws-sdk-go-v2/service/pricing v1.32.14 + github.com/aws/aws-sdk-go-v2/service/sqs v1.37.12 + github.com/aws/aws-sdk-go-v2/service/ssm v1.56.10 + github.com/aws/aws-sdk-go-v2/service/sts v1.33.12 + github.com/aws/aws-sdk-go-v2/service/timestreamwrite v1.29.15 github.com/aws/karpenter-provider-aws/tools/kompat v0.0.0-20240410220356-6b868db24881 github.com/aws/smithy-go v1.22.2 github.com/awslabs/amazon-eks-ami/nodeadm v0.0.0-20240229193347-cfab22a10647 @@ -51,15 +51,15 @@ require ( require ( github.com/Masterminds/semver/v3 v3.2.1 // indirect github.com/andybalholm/cascadia v1.3.3 // indirect - github.com/aws/aws-sdk-go-v2/credentials v1.17.55 // indirect - github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.29 // indirect - github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.29 // indirect + github.com/aws/aws-sdk-go-v2/credentials v1.17.57 // indirect + github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.31 // indirect + github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.31 // indirect github.com/aws/aws-sdk-go-v2/internal/ini v1.8.2 // indirect github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.12.2 // indirect - github.com/aws/aws-sdk-go-v2/service/internal/endpoint-discovery v1.10.10 // indirect - github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.12.10 // indirect - github.com/aws/aws-sdk-go-v2/service/sso v1.24.12 // indirect - github.com/aws/aws-sdk-go-v2/service/ssooidc v1.28.11 // indirect + github.com/aws/aws-sdk-go-v2/service/internal/endpoint-discovery v1.10.12 // indirect + github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.12.12 // indirect + github.com/aws/aws-sdk-go-v2/service/sso v1.24.14 // indirect + github.com/aws/aws-sdk-go-v2/service/ssooidc v1.28.13 // indirect github.com/beorn7/perks v1.0.1 // indirect github.com/cespare/xxhash/v2 v2.3.0 // indirect github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect @@ -82,7 +82,6 @@ require ( github.com/google/pprof v0.0.0-20241210010833-40e02aabc2ad // indirect github.com/google/uuid v1.6.0 // indirect github.com/inconshreveable/mousetrap v1.1.0 // indirect - github.com/jmespath/go-jmespath v0.4.0 // indirect github.com/josharian/intern v1.0.0 // indirect github.com/json-iterator/go v1.1.12 // indirect github.com/klauspost/compress v1.17.9 // indirect diff --git a/go.sum b/go.sum index 7a054242c48f..8911c5ff34fd 100644 --- a/go.sum +++ b/go.sum @@ -10,48 +10,48 @@ github.com/avast/retry-go v3.0.0+incompatible h1:4SOWQ7Qs+oroOTQOYnAHqelpCO0biHS github.com/avast/retry-go v3.0.0+incompatible/go.mod h1:XtSnn+n/sHqQIpZ10K1qAevBhOOCWBLXXy3hyiqqBrY= github.com/aws/amazon-vpc-resource-controller-k8s v1.6.3 h1:B4o15iZP8CQoyDjoNAoQiyEPabLsgxXLY5tv3uvvCic= github.com/aws/amazon-vpc-resource-controller-k8s v1.6.3/go.mod h1:k4zcf2Dz/Mvrgo8NVzAEWP5HK4USqbJTD93pVVDxvc0= -github.com/aws/aws-sdk-go-v2 v1.34.0 h1:9iyL+cjifckRGEVpRKZP3eIxVlL06Qk1Tk13vreaVQU= -github.com/aws/aws-sdk-go-v2 v1.34.0/go.mod h1:JgstGg0JjWU1KpVJjD5H0y0yyAIpSdKEq556EI6yOOM= -github.com/aws/aws-sdk-go-v2/config v1.29.2 h1:JuIxOEPcSKpMB0J+khMjznG9LIhIBdmqNiEcPclnwqc= -github.com/aws/aws-sdk-go-v2/config v1.29.2/go.mod 
h1:HktTHregOZwNSM/e7WTfVSu9RCX+3eOv+6ij27PtaYs= -github.com/aws/aws-sdk-go-v2/credentials v1.17.55 h1:CDhKnDEaGkLA5ZszV/qw5uwN5M8rbv9Cl0JRN+PRsaM= -github.com/aws/aws-sdk-go-v2/credentials v1.17.55/go.mod h1:kPD/vj+RB5MREDUky376+zdnjZpR+WgdBBvwrmnlmKE= -github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.25 h1:kU7tmXNaJ07LsyN3BUgGqAmVmQtq0w6duVIHAKfp0/w= -github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.25/go.mod h1:OiC8+OiqrURb1wrwmr/UbOVLFSWEGxjinj5C299VQdo= -github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.29 h1:Ej0Rf3GMv50Qh4G4852j2djtoDb7AzQ7MuQeFHa3D70= -github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.29/go.mod h1:oeNTC7PwJNoM5AznVr23wxhLnuJv0ZDe5v7w0wqIs9M= -github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.29 h1:6e8a71X+9GfghragVevC5bZqvATtc3mAMgxpSNbgzF0= -github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.29/go.mod h1:c4jkZiQ+BWpNqq7VtrxjwISrLrt/VvPq3XiopkUIolI= +github.com/aws/aws-sdk-go-v2 v1.36.0 h1:b1wM5CcE65Ujwn565qcwgtOTT1aT4ADOHHgglKjG7fk= +github.com/aws/aws-sdk-go-v2 v1.36.0/go.mod h1:5PMILGVKiW32oDzjj6RU52yrNrDPUHcbZQYr1sM7qmM= +github.com/aws/aws-sdk-go-v2/config v1.29.4 h1:ObNqKsDYFGr2WxnoXKOhCvTlf3HhwtoGgc+KmZ4H5yg= +github.com/aws/aws-sdk-go-v2/config v1.29.4/go.mod h1:j2/AF7j/qxVmsNIChw1tWfsVKOayJoGRDjg1Tgq7NPk= +github.com/aws/aws-sdk-go-v2/credentials v1.17.57 h1:kFQDsbdBAR3GZsB8xA+51ptEnq9TIj3tS4MuP5b+TcQ= +github.com/aws/aws-sdk-go-v2/credentials v1.17.57/go.mod h1:2kerxPUUbTagAr/kkaHiqvj/bcYHzi2qiJS/ZinllU0= +github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.27 h1:7lOW8NUwE9UZekS1DYoiPdVAqZ6A+LheHWb+mHbNOq8= +github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.27/go.mod h1:w1BASFIPOPUae7AgaH4SbjNbfdkxuggLyGfNFTn8ITY= +github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.31 h1:lWm9ucLSRFiI4dQQafLrEOmEDGry3Swrz0BIRdiHJqQ= +github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.31/go.mod h1:Huu6GG0YTfbPphQkDSo4dEGmQRTKb9k9G7RdtyQWxuI= +github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.31 h1:ACxDklUKKXb48+eg5ROZXi1vDgfMyfIA/WyvqHcHI0o= +github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.31/go.mod h1:yadnfsDwqXeVaohbGc/RaD287PuyRw2wugkh5ZL2J6k= github.com/aws/aws-sdk-go-v2/internal/ini v1.8.2 h1:Pg9URiobXy85kgFev3og2CuOZ8JZUBENF+dcgWBaYNk= github.com/aws/aws-sdk-go-v2/internal/ini v1.8.2/go.mod h1:FbtygfRFze9usAadmnGJNc8KsP346kEe+y2/oyhGAGc= -github.com/aws/aws-sdk-go-v2/service/ec2 v1.202.0 h1:/kB9Uf7fgpYNLvwhAW0YiDSg7xQyxB6MbEYoC0yXtjs= -github.com/aws/aws-sdk-go-v2/service/ec2 v1.202.0/go.mod h1:cRD0Fhzj0YD+uAh16NChQAv9/BB0S9x3YK9hLx1jb/k= -github.com/aws/aws-sdk-go-v2/service/eks v1.57.0 h1:+g6K3PF6xeCqGr2MJT8CnwrluWQv0BlHO9RrwivHwWk= -github.com/aws/aws-sdk-go-v2/service/eks v1.57.0/go.mod h1:XXCcNup2LhXfIllxo6fCyHY31J8RLU3d3sM/lGGnO/s= -github.com/aws/aws-sdk-go-v2/service/fis v1.31.7 h1:cbQ5G+n9OJ+IZ4RLjoDjkVoFt7kiGxE8kdLxp3+HSik= -github.com/aws/aws-sdk-go-v2/service/fis v1.31.7/go.mod h1:K0pl22pYIH6bXKeAToBiHYZtzl1meLCkyCJRrk6Bi04= -github.com/aws/aws-sdk-go-v2/service/iam v1.38.8 h1:+PjS9gfr15U+MaUafN89dWxhbsvVrJg2D1umkc8R4uA= -github.com/aws/aws-sdk-go-v2/service/iam v1.38.8/go.mod h1:V7xF4f2fgf9GSVxTqeYQz7bNu8AITVsgqP6otlHzjPs= +github.com/aws/aws-sdk-go-v2/service/ec2 v1.202.2 h1:qas57zkkMX8OM+MVz+4sMaOaD9HRmeFJRb8nzMdYkx0= +github.com/aws/aws-sdk-go-v2/service/ec2 v1.202.2/go.mod h1:2omfxRebtpbbFqQGqeurDzlyB7Txa2e1xe9rCDFqlwA= +github.com/aws/aws-sdk-go-v2/service/eks v1.57.2 h1:Uxm6iUIEaRtyvcp8Gj45viJmM2KksMLNBRCd8DBxuJA= +github.com/aws/aws-sdk-go-v2/service/eks v1.57.2/go.mod 
h1:qpBx8an26dxeAoEMlHAjGkCzrYtFF1KsYycmvgSeIfU= +github.com/aws/aws-sdk-go-v2/service/fis v1.31.9 h1:Fsg7DBqm7WpC/w9MLqu9RikgsaEHv7JUe0Le99AZ3rA= +github.com/aws/aws-sdk-go-v2/service/fis v1.31.9/go.mod h1:ilhWDnlNDbCmkyVkfHasUwURSDZkPDFBsg0/BeIACvA= +github.com/aws/aws-sdk-go-v2/service/iam v1.38.10 h1:u/MwkFwRkKRDvy7D76/khJTk8HMp4mC5sZKErU53jos= +github.com/aws/aws-sdk-go-v2/service/iam v1.38.10/go.mod h1:Gid0WEVky3EWbkeXiS67kHhbiK+q3/wO/hvPh7plR0c= github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.12.2 h1:D4oz8/CzT9bAEYtVhSBmFj2dNOtaHOtMKc2vHBwYizA= github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.12.2/go.mod h1:Za3IHqTQ+yNcRHxu1OFucBh0ACZT4j4VQFF0BqpZcLY= -github.com/aws/aws-sdk-go-v2/service/internal/endpoint-discovery v1.10.10 h1:dx6ou28o859SdI4UkuH98Awkuwg4RdHawE5s6pYMQiA= -github.com/aws/aws-sdk-go-v2/service/internal/endpoint-discovery v1.10.10/go.mod h1:ilKRWYwq8gS8Wkltnph4MJUTInZefn1C1shAAZchlGg= -github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.12.10 h1:hN4yJBGswmFTOVYqmbz1GBs9ZMtQe8SrYxPwrkrlRv8= -github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.12.10/go.mod h1:TsxON4fEZXyrKY+D+3d2gSTyJkGORexIYab9PTf56DA= -github.com/aws/aws-sdk-go-v2/service/pricing v1.32.12 h1:AKLqSyeFRV1+DZJFFDDySGyGm8U+oMozb8/ZX1sRm2o= -github.com/aws/aws-sdk-go-v2/service/pricing v1.32.12/go.mod h1:9ntCvd1pERs3Fxc2ScrUvcZjGwzdnIjv35+5qkHoXlY= -github.com/aws/aws-sdk-go-v2/service/sqs v1.37.10 h1:j297R5mnr3LKYqr9xhsqDdFEL8OfHE0kGN1sTMFT00E= -github.com/aws/aws-sdk-go-v2/service/sqs v1.37.10/go.mod h1:F6guYEP0P7+rR/2zs10iNC5JPrWPmDdTV6VIYQsHnyE= -github.com/aws/aws-sdk-go-v2/service/ssm v1.56.8 h1:MBdLPDbhwvgIpjIVAo2K49b+mJgthRfq3pJ57OMF7Ro= -github.com/aws/aws-sdk-go-v2/service/ssm v1.56.8/go.mod h1:9XDwaJPbim0IsiHqC/jWwXviigOiQJC+drPPy6ZfIlE= -github.com/aws/aws-sdk-go-v2/service/sso v1.24.12 h1:kznaW4f81mNMlREkU9w3jUuJvU5g/KsqDV43ab7Rp6s= -github.com/aws/aws-sdk-go-v2/service/sso v1.24.12/go.mod h1:bZy9r8e0/s0P7BSDHgMLXK2KvdyRRBIQ2blKlvLt0IU= -github.com/aws/aws-sdk-go-v2/service/ssooidc v1.28.11 h1:mUwIpAvILeKFnRx4h1dEgGEFGuV8KJ3pEScZWVFYuZA= -github.com/aws/aws-sdk-go-v2/service/ssooidc v1.28.11/go.mod h1:JDJtD+b8HNVv71axz8+S5492KM8wTzHRFpMKQbPlYxw= -github.com/aws/aws-sdk-go-v2/service/sts v1.33.10 h1:g9d+TOsu3ac7SgmY2dUf1qMgu/uJVTlQ4VCbH6hRxSw= -github.com/aws/aws-sdk-go-v2/service/sts v1.33.10/go.mod h1:WZfNmntu92HO44MVZAubQaz3qCuIdeOdog2sADfU6hU= -github.com/aws/aws-sdk-go-v2/service/timestreamwrite v1.29.13 h1:Csa+nopgclKzWQ+FMNZiX1XYtj6KKAxYP1RBD9So8Jk= -github.com/aws/aws-sdk-go-v2/service/timestreamwrite v1.29.13/go.mod h1:qROO/r1qepNpyjfvxKG8dL9VSmZCe1yZICf27tEFMhM= +github.com/aws/aws-sdk-go-v2/service/internal/endpoint-discovery v1.10.12 h1:V1h3Cxmn0tN5EhL31uvqSLKsMlPlqiYxRwAEdwNeIJ8= +github.com/aws/aws-sdk-go-v2/service/internal/endpoint-discovery v1.10.12/go.mod h1:KzXJPn2wqsZJlNSx70gmDkRDVTmyF/RRXxTP2yMxUwc= +github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.12.12 h1:O+8vD2rGjfihBewr5bT+QUfYUHIxCVgG61LHoT59shM= +github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.12.12/go.mod h1:usVdWJaosa66NMvmCrr08NcWDBRv4E6+YFG2pUdw1Lk= +github.com/aws/aws-sdk-go-v2/service/pricing v1.32.14 h1:YajuqS3CsPEllD8NZbVzMFdmgLQfTPSTrs+H1nLRZks= +github.com/aws/aws-sdk-go-v2/service/pricing v1.32.14/go.mod h1:LfN59L0VQPjqwfeqiESbI0B4Vd3DYLFIcNUpcijGnkA= +github.com/aws/aws-sdk-go-v2/service/sqs v1.37.12 h1:8TMY/uvatjnLqllJhW0WOfAQSdLQl525yuaA0Uq1ejk= +github.com/aws/aws-sdk-go-v2/service/sqs v1.37.12/go.mod 
h1:LG6s2xJm3K9X9ee5EmYyOveXOgVK4jtunBJBXFJ2TqE= +github.com/aws/aws-sdk-go-v2/service/ssm v1.56.10 h1:GLRZnZtAxWIgROsRgVm8YPaAG0t9pUwaxrkda/g9JiU= +github.com/aws/aws-sdk-go-v2/service/ssm v1.56.10/go.mod h1:kh7898L3bN432TMBiRBe5Ua4IrUAaq1LwHhbqabeOOk= +github.com/aws/aws-sdk-go-v2/service/sso v1.24.14 h1:c5WJ3iHz7rLIgArznb3JCSQT3uUMiz9DLZhIX+1G8ok= +github.com/aws/aws-sdk-go-v2/service/sso v1.24.14/go.mod h1:+JJQTxB6N4niArC14YNtxcQtwEqzS3o9Z32n7q33Rfs= +github.com/aws/aws-sdk-go-v2/service/ssooidc v1.28.13 h1:f1L/JtUkVODD+k1+IiSJUUv8A++2qVr+Xvb3xWXETMU= +github.com/aws/aws-sdk-go-v2/service/ssooidc v1.28.13/go.mod h1:tvqlFoja8/s0o+UruA1Nrezo/df0PzdunMDDurUfg6U= +github.com/aws/aws-sdk-go-v2/service/sts v1.33.12 h1:fqg6c1KVrc3SYWma/egWue5rKI4G2+M4wMQN2JosNAA= +github.com/aws/aws-sdk-go-v2/service/sts v1.33.12/go.mod h1:7Yn+p66q/jt38qMoVfNvjbm3D89mGBnkwDcijgtih8w= +github.com/aws/aws-sdk-go-v2/service/timestreamwrite v1.29.15 h1:2oGJG96TsCmt8d5/2B62sxzwbxTj5UpXztPWOA2Nki4= +github.com/aws/aws-sdk-go-v2/service/timestreamwrite v1.29.15/go.mod h1:GdO5LNWmaQaT0drv+xf4omi53vy4GrzjME0X7TgRMJk= github.com/aws/karpenter-provider-aws/tools/kompat v0.0.0-20240410220356-6b868db24881 h1:m9rhsGhdepdQV96tZgfy68oU75AWAjOH8u65OefTjwA= github.com/aws/karpenter-provider-aws/tools/kompat v0.0.0-20240410220356-6b868db24881/go.mod h1:+Mk5k0b6HpKobxNq+B56DOhZ+I/NiPhd5MIBhQMSTSs= github.com/aws/smithy-go v1.22.2 h1:6D9hW43xKFrRx/tXXfAlIZc4JI+yQe6snnWcQyxSyLQ= @@ -116,10 +116,6 @@ github.com/imdario/mergo v0.3.16 h1:wwQJbIsHYGMUyLSPrEq1CT16AhnhNJQ51+4fdHUnCl4= github.com/imdario/mergo v0.3.16/go.mod h1:WBLT9ZmE3lPoWsEzCh9LPo3TiwVN+ZKEjmz+hD27ysY= github.com/inconshreveable/mousetrap v1.1.0 h1:wN+x4NVGpMsO7ErUn/mUI3vEoE6Jt13X2s0bqwp9tc8= github.com/inconshreveable/mousetrap v1.1.0/go.mod h1:vpF70FUmC8bwa3OWnCshd2FqLfsEA9PFc4w1p2J65bw= -github.com/jmespath/go-jmespath v0.4.0 h1:BEgLn5cpjn8UN1mAw4NjwDrS35OdebyEtFe+9YPoQUg= -github.com/jmespath/go-jmespath v0.4.0/go.mod h1:T8mJZnbsbmF+m6zOOFylbeCJqk5+pHWvzYPziyZiYoo= -github.com/jmespath/go-jmespath/internal/testify v1.5.1 h1:shLQSRRSCCPj3f2gpwzGwWFoC7ycTf1rcQZHOlsJ6N8= -github.com/jmespath/go-jmespath/internal/testify v1.5.1/go.mod h1:L3OGu8Wl2/fWfCI6z80xFu9LTZmf1ZRjMHUOPmWr69U= github.com/jonathan-innis/aws-sdk-go-prometheus v0.1.1 h1:gmpuckrozJ3lfKqSIia9YMGh0caoQmEY7mQP5MsnbTM= github.com/jonathan-innis/aws-sdk-go-prometheus v0.1.1/go.mod h1:168XvZFghCqo32ISSWnTXwdlMKzEq+x9TqdfswCjkrQ= github.com/josharian/intern v1.0.0 h1:vlS4z54oSdjm0bgjRigI+G1HpF+tI+9rE5LLzOg8HmY= @@ -315,9 +311,6 @@ gopkg.in/evanphx/json-patch.v4 v4.12.0 h1:n6jtcsulIzXPJaxegRbvFNNrZDjbij7ny3gmSP gopkg.in/evanphx/json-patch.v4 v4.12.0/go.mod h1:p8EYWUEYMpynmqDbY58zCKCFZw8pRWMG4EsWvDvM72M= gopkg.in/inf.v0 v0.9.1 h1:73M5CoZyi3ZLMOyDlQh031Cx6N9NDJ2Vvfl76EDAgDc= gopkg.in/inf.v0 v0.9.1/go.mod h1:cWUDdTG/fYaXco+Dcufb5Vnc6Gp2YChqWtbxRZE0mXw= -gopkg.in/yaml.v2 v2.2.8/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= -gopkg.in/yaml.v2 v2.4.0 h1:D8xgwECY7CYvx+Y2n4sBz93Jn9JRvxdiyyo8CTfuKaY= -gopkg.in/yaml.v2 v2.4.0/go.mod h1:RDklbk79AGWmwhnvt/jBztapEOGDOx6ZbXqjP6csGnQ= gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA= gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= From ffa2e3d1a1ef31c33f4f620c4cea4260b443cf68 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 3 Feb 2025 13:44:22 
-0800 Subject: [PATCH 07/34] chore(deps): bump aws-actions/configure-aws-credentials from 4.0.2 to 4.0.3 in /.github/actions/e2e/dump-logs in the action-deps group (#7689) Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- .github/actions/e2e/dump-logs/action.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/actions/e2e/dump-logs/action.yaml b/.github/actions/e2e/dump-logs/action.yaml index c893323d56e4..1b116b8f06f8 100644 --- a/.github/actions/e2e/dump-logs/action.yaml +++ b/.github/actions/e2e/dump-logs/action.yaml @@ -17,7 +17,7 @@ runs: using: "composite" steps: - name: configure aws credentials - uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2 + uses: aws-actions/configure-aws-credentials@4fc4975a852c8cd99761e2de1f4ba73402e44dd9 # v4.0.3 with: role-to-assume: arn:aws:iam::${{ inputs.account_id }}:role/${{ inputs.role }} aws-region: ${{ inputs.region }} From d8d2bb31712e5acfc6e80a356ef298c70c8d5247 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 3 Feb 2025 13:45:47 -0800 Subject: [PATCH 08/34] chore(deps): bump aws-actions/configure-aws-credentials from 4.0.2 to 4.0.3 in the actions-deps group (#7688) Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- .github/workflows/codegen.yaml | 2 +- .github/workflows/dryrun-gen.yaml | 2 +- .github/workflows/e2e-cleanup.yaml | 2 +- .github/workflows/e2e-soak-trigger.yaml | 2 +- .github/workflows/e2e-upgrade.yaml | 2 +- .github/workflows/e2e.yaml | 2 +- .github/workflows/image-canary.yaml | 2 +- .github/workflows/release.yaml | 4 ++-- .github/workflows/resource-count.yaml | 2 +- .github/workflows/snapshot-pr.yaml | 2 +- .github/workflows/snapshot.yaml | 2 +- .github/workflows/sweeper.yaml | 2 +- 12 files changed, 13 insertions(+), 13 deletions(-) diff --git a/.github/workflows/codegen.yaml b/.github/workflows/codegen.yaml index 4786e7b3542c..bb1cca428ece 100644 --- a/.github/workflows/codegen.yaml +++ b/.github/workflows/codegen.yaml @@ -19,7 +19,7 @@ jobs: git config user.email "APICodeGen@users.noreply.github.com" git remote set-url origin https://x-access-token:${{ secrets.GITHUB_TOKEN }}@github.com/${{ github.repository }} git config pull.rebase false - - uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2 + - uses: aws-actions/configure-aws-credentials@4fc4975a852c8cd99761e2de1f4ba73402e44dd9 # v4.0.3 with: role-to-assume: 'arn:aws:iam::${{ vars.READONLY_ACCOUNT_ID }}:role/${{ vars.READONLY_ROLE_NAME }}' aws-region: ${{ vars.READONLY_REGION }} diff --git a/.github/workflows/dryrun-gen.yaml b/.github/workflows/dryrun-gen.yaml index 4501ea4d79cf..e74c6acf8f41 100644 --- a/.github/workflows/dryrun-gen.yaml +++ b/.github/workflows/dryrun-gen.yaml @@ -14,7 +14,7 @@ jobs: steps: - uses: actions/checkout@9bb56186c3b09b4f86b1c65136769dd318469633 # v4.1.2 - uses: ./.github/actions/install-deps - - uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2 + - uses: aws-actions/configure-aws-credentials@4fc4975a852c8cd99761e2de1f4ba73402e44dd9 # v4.0.3 with: role-to-assume: 'arn:aws:iam::${{ vars.READONLY_ACCOUNT_ID }}:role/${{ vars.READONLY_ROLE_NAME }}' aws-region: ${{ vars.READONLY_REGION }} diff --git a/.github/workflows/e2e-cleanup.yaml b/.github/workflows/e2e-cleanup.yaml index 
e2301ac739c9..87454ad3705b 100644 --- a/.github/workflows/e2e-cleanup.yaml +++ b/.github/workflows/e2e-cleanup.yaml @@ -25,7 +25,7 @@ jobs: with: ref: ${{ inputs.git_ref }} - name: configure aws credentials - uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2 + uses: aws-actions/configure-aws-credentials@4fc4975a852c8cd99761e2de1f4ba73402e44dd9 # v4.0.3 with: role-to-assume: arn:aws:iam::${{ vars.CI_ACCOUNT_ID }}:role/${{ vars.CI_ROLE_NAME }} aws-region: ${{ inputs.region }} diff --git a/.github/workflows/e2e-soak-trigger.yaml b/.github/workflows/e2e-soak-trigger.yaml index eda142035ca9..2ad27cf53dc4 100644 --- a/.github/workflows/e2e-soak-trigger.yaml +++ b/.github/workflows/e2e-soak-trigger.yaml @@ -13,7 +13,7 @@ jobs: steps: - uses: actions/checkout@9bb56186c3b09b4f86b1c65136769dd318469633 # v4.1.2 - name: configure aws credentials - uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 + uses: aws-actions/configure-aws-credentials@4fc4975a852c8cd99761e2de1f4ba73402e44dd9 with: role-to-assume: arn:aws:iam::${{ vars.CI_ACCOUNT_ID }}:role/${{ vars.CI_ROLE_NAME }} aws-region: eu-north-1 diff --git a/.github/workflows/e2e-upgrade.yaml b/.github/workflows/e2e-upgrade.yaml index ef7e7629e381..4a27fa05688a 100644 --- a/.github/workflows/e2e-upgrade.yaml +++ b/.github/workflows/e2e-upgrade.yaml @@ -72,7 +72,7 @@ jobs: with: ref: ${{ inputs.from_git_ref }} - name: configure aws credentials - uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2 + uses: aws-actions/configure-aws-credentials@4fc4975a852c8cd99761e2de1f4ba73402e44dd9 # v4.0.3 with: role-to-assume: arn:aws:iam::${{ vars.CI_ACCOUNT_ID }}:role/${{ vars.CI_ROLE_NAME }} aws-region: ${{ inputs.region }} diff --git a/.github/workflows/e2e.yaml b/.github/workflows/e2e.yaml index 5db7edd2789e..95eecb0aaa32 100644 --- a/.github/workflows/e2e.yaml +++ b/.github/workflows/e2e.yaml @@ -99,7 +99,7 @@ jobs: git_ref: ${{ inputs.git_ref }} - uses: ./.github/actions/install-deps - name: configure aws credentials - uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2 + uses: aws-actions/configure-aws-credentials@4fc4975a852c8cd99761e2de1f4ba73402e44dd9 # v4.0.3 with: role-to-assume: arn:aws:iam::${{ vars.CI_ACCOUNT_ID }}:role/${{ vars.CI_ROLE_NAME }} aws-region: ${{ inputs.region }} diff --git a/.github/workflows/image-canary.yaml b/.github/workflows/image-canary.yaml index c3669c539b46..8b4c7d62b719 100644 --- a/.github/workflows/image-canary.yaml +++ b/.github/workflows/image-canary.yaml @@ -12,7 +12,7 @@ jobs: steps: - uses: actions/checkout@9bb56186c3b09b4f86b1c65136769dd318469633 # v4.1.2 - name: Configure AWS credentials - uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2 + uses: aws-actions/configure-aws-credentials@4fc4975a852c8cd99761e2de1f4ba73402e44dd9 # v4.0.3 with: role-to-assume: arn:aws:iam::${{ vars.READONLY_ACCOUNT_ID }}:role/${{ vars.READONLY_ROLE_NAME }} aws-region: ${{ vars.READONLY_REGION }} diff --git a/.github/workflows/release.yaml b/.github/workflows/release.yaml index 7b8495483349..1ae371aa5d2a 100644 --- a/.github/workflows/release.yaml +++ b/.github/workflows/release.yaml @@ -28,12 +28,12 @@ jobs: - uses: ./.github/actions/e2e/install-helm with: version: v3.12.3 # Pinned to this version since v3.13.0 has issues with pushing to public ECR: https://github.com/helm/helm/issues/12442 - - uses: 
aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2 + - uses: aws-actions/configure-aws-credentials@4fc4975a852c8cd99761e2de1f4ba73402e44dd9 # v4.0.3 with: role-to-assume: 'arn:aws:iam::${{ vars.RELEASE_ACCOUNT_ID }}:role/${{ vars.RELEASE_ROLE_NAME }}' aws-region: ${{ vars.RELEASE_REGION }} - run: make release - - uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2 + - uses: aws-actions/configure-aws-credentials@4fc4975a852c8cd99761e2de1f4ba73402e44dd9 # v4.0.3 with: role-to-assume: 'arn:aws:iam::${{ vars.READONLY_ACCOUNT_ID }}:role/${{ vars.READONLY_ROLE_NAME }}' aws-region: ${{ vars.READONLY_REGION }} diff --git a/.github/workflows/resource-count.yaml b/.github/workflows/resource-count.yaml index 5475506041bd..9f6fc18bd327 100644 --- a/.github/workflows/resource-count.yaml +++ b/.github/workflows/resource-count.yaml @@ -16,7 +16,7 @@ jobs: steps: - uses: actions/checkout@9bb56186c3b09b4f86b1c65136769dd318469633 # v4.1.2 - name: configure aws credentials - uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2 + uses: aws-actions/configure-aws-credentials@4fc4975a852c8cd99761e2de1f4ba73402e44dd9 # v4.0.3 with: role-to-assume: arn:aws:iam::${{ vars.CI_ACCOUNT_ID }}:role/${{ vars.CI_ROLE_NAME }} aws-region: ${{ matrix.region }} diff --git a/.github/workflows/snapshot-pr.yaml b/.github/workflows/snapshot-pr.yaml index 76d3c4729b4e..fe7c27fc61b1 100644 --- a/.github/workflows/snapshot-pr.yaml +++ b/.github/workflows/snapshot-pr.yaml @@ -30,7 +30,7 @@ jobs: name: "${{ github.workflow }} / ${{ github.job }} (pull_request_review)" git_ref: ${{ env.PR_COMMIT }} - uses: ./.github/actions/install-deps - - uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2 + - uses: aws-actions/configure-aws-credentials@4fc4975a852c8cd99761e2de1f4ba73402e44dd9 # v4.0.3 with: role-to-assume: 'arn:aws:iam::${{ vars.SNAPSHOT_ACCOUNT_ID }}:role/${{ vars.SNAPSHOT_ROLE_NAME }}' aws-region: ${{ vars.SNAPSHOT_REGION }} diff --git a/.github/workflows/snapshot.yaml b/.github/workflows/snapshot.yaml index d1969b579c65..3345c88e6716 100644 --- a/.github/workflows/snapshot.yaml +++ b/.github/workflows/snapshot.yaml @@ -16,7 +16,7 @@ jobs: with: fetch-depth: 0 - uses: ./.github/actions/install-deps - - uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2 + - uses: aws-actions/configure-aws-credentials@4fc4975a852c8cd99761e2de1f4ba73402e44dd9 # v4.0.3 with: role-to-assume: 'arn:aws:iam::${{ vars.SNAPSHOT_ACCOUNT_ID }}:role/${{ vars.SNAPSHOT_ROLE_NAME }}' aws-region: ${{ vars.SNAPSHOT_REGION }} diff --git a/.github/workflows/sweeper.yaml b/.github/workflows/sweeper.yaml index 0d4562ea4f31..e3ae658bca0e 100644 --- a/.github/workflows/sweeper.yaml +++ b/.github/workflows/sweeper.yaml @@ -17,7 +17,7 @@ jobs: steps: - uses: actions/checkout@9bb56186c3b09b4f86b1c65136769dd318469633 # v4.1.2 - name: configure aws credentials - uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2 + uses: aws-actions/configure-aws-credentials@4fc4975a852c8cd99761e2de1f4ba73402e44dd9 # v4.0.3 with: role-to-assume: arn:aws:iam::${{ vars.CI_ACCOUNT_ID }}:role/${{ vars.CI_ROLE_NAME }} aws-region: ${{ matrix.region }} From 48e5ad5a29c907931d5bb21173bdbbc9f955a68f Mon Sep 17 00:00:00 2001 From: Eng Zer Jun Date: Tue, 4 Feb 2025 05:55:24 +0800 Subject: [PATCH 09/34] chore: replace `golang.org/x/exp/slices` with slices (#7686) 
Signed-off-by: Eng Zer Jun --- go.mod | 1 - go.sum | 2 -- hack/docs/metrics_gen/main.go | 3 +-- test/hack/resource/clean/main.go | 2 +- test/hack/resource/go.mod | 3 +-- test/hack/resource/go.sum | 6 ++---- test/hack/resource/pkg/resourcetypes/eni.go | 2 +- test/hack/resource/pkg/resourcetypes/instance.go | 2 +- test/hack/resource/pkg/resourcetypes/instanceprofile.go | 2 +- test/hack/resource/pkg/resourcetypes/launchtemplate.go | 2 +- test/hack/resource/pkg/resourcetypes/oidc.go | 2 +- test/hack/resource/pkg/resourcetypes/securitygroup.go | 2 +- test/hack/resource/pkg/resourcetypes/stack.go | 2 +- test/hack/resource/pkg/resourcetypes/vpc_endpoint.go | 2 +- .../resource/pkg/resourcetypes/vpc_peering_connection.go | 2 +- 15 files changed, 14 insertions(+), 21 deletions(-) diff --git a/go.mod b/go.mod index 25b356a9fd93..fbf9c7165d13 100644 --- a/go.mod +++ b/go.mod @@ -35,7 +35,6 @@ require ( github.com/samber/lo v1.49.1 go.uber.org/multierr v1.11.0 go.uber.org/zap v1.27.0 - golang.org/x/exp v0.0.0-20240719175910-8a7402abbf56 golang.org/x/sync v0.10.0 k8s.io/api v0.32.1 k8s.io/apiextensions-apiserver v0.32.1 diff --git a/go.sum b/go.sum index 8911c5ff34fd..e574dfd77c78 100644 --- a/go.sum +++ b/go.sum @@ -216,8 +216,6 @@ golang.org/x/crypto v0.13.0/go.mod h1:y6Z2r+Rw4iayiXXAIxJIDAJ1zMW4yaTpebo8fPOliY golang.org/x/crypto v0.19.0/go.mod h1:Iy9bg/ha4yyC70EfRS8jz+B6ybOBKMaSxLj6P6oBDfU= golang.org/x/crypto v0.23.0/go.mod h1:CKFgDieR+mRhux2Lsu27y0fO304Db0wZe70UKqHu0v8= golang.org/x/crypto v0.31.0/go.mod h1:kDsLvtWBEx7MV9tJOj9bnXsPbxwJQ6csT/x4KIN4Ssk= -golang.org/x/exp v0.0.0-20240719175910-8a7402abbf56 h1:2dVuKD2vS7b0QIHQbpyTISPd0LeHDbnYEryqj5Q1ug8= -golang.org/x/exp v0.0.0-20240719175910-8a7402abbf56/go.mod h1:M4RDyNAINzryxdtnbRXRL/OHtkFuWGRjvuhBJpk2IlY= golang.org/x/mod v0.2.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA= golang.org/x/mod v0.3.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA= golang.org/x/mod v0.6.0-dev.0.20220419223038-86c51ed26bb4/go.mod h1:jJ57K6gSWd91VN4djpZkiMVwK6gcyfeH4XE8wZrZaV4= diff --git a/hack/docs/metrics_gen/main.go b/hack/docs/metrics_gen/main.go index 018dc5c24fc6..fa6c06bbd26d 100644 --- a/hack/docs/metrics_gen/main.go +++ b/hack/docs/metrics_gen/main.go @@ -24,11 +24,10 @@ import ( "log" "os" "path/filepath" + "slices" "sort" "strings" - "golang.org/x/exp/slices" - "github.com/samber/lo" "sigs.k8s.io/karpenter/pkg/metrics" diff --git a/test/hack/resource/clean/main.go b/test/hack/resource/clean/main.go index 2cb51b27dcf5..ea8dab632db8 100644 --- a/test/hack/resource/clean/main.go +++ b/test/hack/resource/clean/main.go @@ -18,6 +18,7 @@ import ( "context" "flag" "fmt" + "slices" "time" "github.com/aws/aws-sdk-go-v2/config" @@ -26,7 +27,6 @@ import ( "github.com/aws/aws-sdk-go-v2/service/iam" "github.com/samber/lo" "go.uber.org/zap" - "golang.org/x/exp/slices" "github.com/aws/karpenter-provider-aws/test/hack/resource/pkg/metrics" "github.com/aws/karpenter-provider-aws/test/hack/resource/pkg/resourcetypes" diff --git a/test/hack/resource/go.mod b/test/hack/resource/go.mod index 7303056fdc6e..ba1d41b8dce8 100644 --- a/test/hack/resource/go.mod +++ b/test/hack/resource/go.mod @@ -9,10 +9,9 @@ require ( github.com/aws/aws-sdk-go-v2/service/ec2 v1.160.0 github.com/aws/aws-sdk-go-v2/service/iam v1.32.0 github.com/aws/aws-sdk-go-v2/service/timestreamwrite v1.25.5 - github.com/samber/lo v1.39.0 + github.com/samber/lo v1.47.0 go.uber.org/multierr v1.11.0 go.uber.org/zap v1.27.0 - golang.org/x/exp v0.0.0-20240416160154-fe59bbe5cc7f k8s.io/api v0.30.0 ) diff 
--git a/test/hack/resource/go.sum b/test/hack/resource/go.sum index 97daed084386..aa830b134606 100644 --- a/test/hack/resource/go.sum +++ b/test/hack/resource/go.sum @@ -68,8 +68,8 @@ github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZb github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= github.com/rogpeppe/go-internal v1.10.0 h1:TMyTOH3F/DB16zRVcYyreMH6GnZZrwQVAoYjRBZyWFQ= github.com/rogpeppe/go-internal v1.10.0/go.mod h1:UQnix2H7Ngw/k4C5ijL5+65zddjncjaFoBhdsK/akog= -github.com/samber/lo v1.39.0 h1:4gTz1wUhNYLhFSKl6O+8peW0v2F4BCY034GRpU9WnuA= -github.com/samber/lo v1.39.0/go.mod h1:+m/ZKRl6ClXCE2Lgf3MsQlWfh4bn1bz6CXEOxnEXnEA= +github.com/samber/lo v1.47.0 h1:z7RynLwP5nbyRscyvcD043DWYoOcYRv3mV8lBeqOCLc= +github.com/samber/lo v1.47.0/go.mod h1:RmDH9Ct32Qy3gduHQuKJ3gW1fMHAnE/fAzQuf6He5cU= github.com/spf13/pflag v1.0.5 h1:iy+VFUOCP1a+8yFto/drg2CJ5u0yRoB7fZw3DKv/JXA= github.com/spf13/pflag v1.0.5/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg= github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME= @@ -87,8 +87,6 @@ go.uber.org/zap v1.27.0/go.mod h1:GB2qFLM7cTU87MWRP2mPIjqfIDnGu+VIO4V/SdhGo2E= golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w= golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI= golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9/go.mod h1:LzIPMQfyMNhhGPhUkYOs5KpL4U8rLKemX1yGLhDgUto= -golang.org/x/exp v0.0.0-20240416160154-fe59bbe5cc7f h1:99ci1mjWVBWwJiEKYY6jWa4d2nTQVIEhZIptnrVb1XY= -golang.org/x/exp v0.0.0-20240416160154-fe59bbe5cc7f/go.mod h1:/lliqkxwWAhPjf5oSOIJup2XcqJaw8RGS6k3TGEc7GI= golang.org/x/mod v0.2.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA= golang.org/x/mod v0.3.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA= golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg= diff --git a/test/hack/resource/pkg/resourcetypes/eni.go b/test/hack/resource/pkg/resourcetypes/eni.go index 0f424a483b50..8c151052cf33 100644 --- a/test/hack/resource/pkg/resourcetypes/eni.go +++ b/test/hack/resource/pkg/resourcetypes/eni.go @@ -16,6 +16,7 @@ package resourcetypes import ( "context" + "slices" "time" "github.com/aws/aws-sdk-go-v2/aws" @@ -23,7 +24,6 @@ import ( ec2types "github.com/aws/aws-sdk-go-v2/service/ec2/types" "github.com/samber/lo" "go.uber.org/multierr" - "golang.org/x/exp/slices" ) type ENI struct { diff --git a/test/hack/resource/pkg/resourcetypes/instance.go b/test/hack/resource/pkg/resourcetypes/instance.go index 6c24ffa2ff10..f085d352be68 100644 --- a/test/hack/resource/pkg/resourcetypes/instance.go +++ b/test/hack/resource/pkg/resourcetypes/instance.go @@ -16,13 +16,13 @@ package resourcetypes import ( "context" + "slices" "time" "github.com/aws/aws-sdk-go-v2/service/ec2" ec2types "github.com/aws/aws-sdk-go-v2/service/ec2/types" "github.com/samber/lo" "go.uber.org/multierr" - "golang.org/x/exp/slices" ) type Instance struct { diff --git a/test/hack/resource/pkg/resourcetypes/instanceprofile.go b/test/hack/resource/pkg/resourcetypes/instanceprofile.go index 4b6b6edb96f1..3bdb3952d9c5 100644 --- a/test/hack/resource/pkg/resourcetypes/instanceprofile.go +++ b/test/hack/resource/pkg/resourcetypes/instanceprofile.go @@ -16,6 +16,7 @@ package resourcetypes import ( "context" + "slices" "time" "github.com/aws/aws-sdk-go-v2/config" @@ -23,7 +24,6 @@ import ( iamtypes 
"github.com/aws/aws-sdk-go-v2/service/iam/types" "github.com/samber/lo" "go.uber.org/multierr" - "golang.org/x/exp/slices" v1 "k8s.io/api/core/v1" ) diff --git a/test/hack/resource/pkg/resourcetypes/launchtemplate.go b/test/hack/resource/pkg/resourcetypes/launchtemplate.go index 9f4c0797abdd..d86c10861026 100644 --- a/test/hack/resource/pkg/resourcetypes/launchtemplate.go +++ b/test/hack/resource/pkg/resourcetypes/launchtemplate.go @@ -16,13 +16,13 @@ package resourcetypes import ( "context" + "slices" "time" "github.com/aws/aws-sdk-go-v2/service/ec2" ec2types "github.com/aws/aws-sdk-go-v2/service/ec2/types" "github.com/samber/lo" "go.uber.org/multierr" - "golang.org/x/exp/slices" ) type LaunchTemplate struct { diff --git a/test/hack/resource/pkg/resourcetypes/oidc.go b/test/hack/resource/pkg/resourcetypes/oidc.go index 51fc42b16a8f..91fa760e65ff 100644 --- a/test/hack/resource/pkg/resourcetypes/oidc.go +++ b/test/hack/resource/pkg/resourcetypes/oidc.go @@ -16,6 +16,7 @@ package resourcetypes import ( "context" + "slices" "strings" "time" @@ -24,7 +25,6 @@ import ( iamtypes "github.com/aws/aws-sdk-go-v2/service/iam/types" "github.com/samber/lo" "go.uber.org/multierr" - "golang.org/x/exp/slices" ) type OIDC struct { diff --git a/test/hack/resource/pkg/resourcetypes/securitygroup.go b/test/hack/resource/pkg/resourcetypes/securitygroup.go index d0a4db53ca4f..175ea74bf5c0 100644 --- a/test/hack/resource/pkg/resourcetypes/securitygroup.go +++ b/test/hack/resource/pkg/resourcetypes/securitygroup.go @@ -16,6 +16,7 @@ package resourcetypes import ( "context" + "slices" "time" "github.com/aws/aws-sdk-go-v2/aws" @@ -23,7 +24,6 @@ import ( ec2types "github.com/aws/aws-sdk-go-v2/service/ec2/types" "github.com/samber/lo" "go.uber.org/multierr" - "golang.org/x/exp/slices" ) type SecurityGroup struct { diff --git a/test/hack/resource/pkg/resourcetypes/stack.go b/test/hack/resource/pkg/resourcetypes/stack.go index ac73618e3436..11bb47612546 100644 --- a/test/hack/resource/pkg/resourcetypes/stack.go +++ b/test/hack/resource/pkg/resourcetypes/stack.go @@ -16,13 +16,13 @@ package resourcetypes import ( "context" + "slices" "time" "github.com/aws/aws-sdk-go-v2/service/cloudformation" cloudformationtypes "github.com/aws/aws-sdk-go-v2/service/cloudformation/types" "github.com/samber/lo" "go.uber.org/multierr" - "golang.org/x/exp/slices" ) type Stack struct { diff --git a/test/hack/resource/pkg/resourcetypes/vpc_endpoint.go b/test/hack/resource/pkg/resourcetypes/vpc_endpoint.go index b12ea02784bc..d60061caac04 100644 --- a/test/hack/resource/pkg/resourcetypes/vpc_endpoint.go +++ b/test/hack/resource/pkg/resourcetypes/vpc_endpoint.go @@ -16,12 +16,12 @@ package resourcetypes import ( "context" + "slices" "time" "github.com/aws/aws-sdk-go-v2/service/ec2" ec2types "github.com/aws/aws-sdk-go-v2/service/ec2/types" "github.com/samber/lo" - "golang.org/x/exp/slices" ) type VPCEndpoint struct { diff --git a/test/hack/resource/pkg/resourcetypes/vpc_peering_connection.go b/test/hack/resource/pkg/resourcetypes/vpc_peering_connection.go index 85e87a09a024..6a72e7d7a577 100644 --- a/test/hack/resource/pkg/resourcetypes/vpc_peering_connection.go +++ b/test/hack/resource/pkg/resourcetypes/vpc_peering_connection.go @@ -16,12 +16,12 @@ package resourcetypes import ( "context" + "slices" "time" "github.com/aws/aws-sdk-go-v2/service/ec2" ec2types "github.com/aws/aws-sdk-go-v2/service/ec2/types" "github.com/samber/lo" - "golang.org/x/exp/slices" ) type VPCPeeringConnection struct { From 9412ec261df33f07b366194d7effe0c712398988 Mon 
Sep 17 00:00:00 2001 From: Vacant2333 Date: Tue, 4 Feb 2025 15:34:39 +0800 Subject: [PATCH 10/34] docs: add the link to karpenter-provider-alibabacloud (#7644) Signed-off-by: Vacant2333 --- website/content/en/docs/getting-started/_index.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/website/content/en/docs/getting-started/_index.md b/website/content/en/docs/getting-started/_index.md index 27ecde8faf92..113b32331f83 100644 --- a/website/content/en/docs/getting-started/_index.md +++ b/website/content/en/docs/getting-started/_index.md @@ -11,6 +11,8 @@ To get started with Karpenter, the [Getting Started with Karpenter]({{< relref " See the [AKS Node autoprovisioning article](https://learn.microsoft.com/azure/aks/node-autoprovision) on how to use Karpenter on Azure's AKS or go to the [Karpenter provider for Azure open source repository](https://github.com/Azure/karpenter-provider-azure) for self-hosting on Azure and additional information. +See the [Deploy Karpenter on Alibabacloud's ACK](https://docs.cloudpilot.ai/karpenter/alibabacloud/preview/getting-started/set-up-a-cluster-and-add-karpenter/) to know how to use Karpenter on Alibabacloud's ACK. Or you can go to the [Karpenter-provider-alibabacloud](https://github.com/cloudpilot-ai/karpenter-provider-alibabacloud) for more details. + If you prefer, the following instructions use Terraform to create a cluster and add Karpenter: * [Amazon EKS Blueprints for Terraform](https://aws-ia.github.io/terraform-aws-eks-blueprints): Follow a basic [Getting Started](https://aws-ia.github.io/terraform-aws-eks-blueprints/getting-started/) guide and also add modules and add-ons. This includes a [Karpenter](https://aws-ia.github.io/terraform-aws-eks-blueprints/patterns/karpenter/) add-on that lets you bypass the instructions in this guide for setting up Karpenter. From ac014120eaa4851fd4ebc2be2b3d1b2a5751cd9c Mon Sep 17 00:00:00 2001 From: "David B." <5034531+dabcoder@users.noreply.github.com> Date: Tue, 4 Feb 2025 18:05:09 +0100 Subject: [PATCH 11/34] docs: update link to GOOS and GOARCH values (#7691) --- website/content/en/preview/concepts/scheduling.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/website/content/en/preview/concepts/scheduling.md b/website/content/en/preview/concepts/scheduling.md index 1d7b2daf9927..e7d0ff74459c 100755 --- a/website/content/en/preview/concepts/scheduling.md +++ b/website/content/en/preview/concepts/scheduling.md @@ -152,8 +152,8 @@ Take care to ensure the label domains are correct. A well known label like `karp | topology.kubernetes.io/zone | us-east-2a | Zones are defined by your cloud provider ([aws](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html)) | | node.kubernetes.io/instance-type | g4dn.8xlarge| Instance types are defined by your cloud provider ([aws](https://aws.amazon.com/ec2/instance-types/)) | | node.kubernetes.io/windows-build | 10.0.17763 | Windows OS build in the format "MajorVersion.MinorVersion.BuildNumber". Can be `10.0.17763` for WS2019, or `10.0.20348` for WS2022. 
([k8s](https://kubernetes.io/docs/reference/labels-annotations-taints/#nodekubernetesiowindows-build)) | -| kubernetes.io/os | linux | Operating systems are defined by [GOOS values](https://github.com/golang/go/blob/master/src/go/build/syslist.go#L10) on the instance | -| kubernetes.io/arch | amd64 | Architectures are defined by [GOARCH values](https://github.com/golang/go/blob/master/src/go/build/syslist.go#L50) on the instance | +| kubernetes.io/os | linux | Operating systems are defined by [GOOS values](https://github.com/golang/go/blob/master/src/internal/syslist/syslist.go) (`KnownOS`) on the instance | +| kubernetes.io/arch | amd64 | Architectures are defined by [GOARCH values](https://github.com/golang/go/blob/master/src/internal/syslist/syslist.go) (`KnownArch`) on the instance | | karpenter.sh/capacity-type | spot | Capacity types include `spot`, `on-demand` | | karpenter.k8s.aws/instance-hypervisor | nitro | [AWS Specific] Instance types that use a specific hypervisor | | karpenter.k8s.aws/instance-encryption-in-transit-supported | true | [AWS Specific] Instance types that support (or not) in-transit encryption | From c5fada2c133cd2d4f98b497170ca2af7dbac44f9 Mon Sep 17 00:00:00 2001 From: Ben Bodenmiller Date: Tue, 4 Feb 2025 09:06:03 -0800 Subject: [PATCH 12/34] docs: Use code format for env variables in upgrade guide (#7683) --- website/content/en/v1.0/upgrading/v1-migration.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/website/content/en/v1.0/upgrading/v1-migration.md b/website/content/en/v1.0/upgrading/v1-migration.md index b3fbb54921e6..9032b9afa0b2 100644 --- a/website/content/en/v1.0/upgrading/v1-migration.md +++ b/website/content/en/v1.0/upgrading/v1-migration.md @@ -662,7 +662,7 @@ Revisit step 9 of the [upgrade procedure]({{< ref "#upgrading" >}}) and ensure t * AMI Selector Terms has a new Alias field which can only be set by itself in `EC2NodeClass.Spec.AMISelectorTerms` * Disruption Budgets by Reason was added to `NodePool.Spec.Disruption.Budgets` * TerminationGracePeriod was added to `NodePool.Spec.Template.Spec`. - * LOG_OUTPUT_PATHS and LOG_ERROR_OUTPUT_PATHS environment variables added + * `LOG_OUTPUT_PATHS` and `LOG_ERROR_OUTPUT_PATHS` environment variables added * API Rename: NodePool’s ConsolidationPolicy `WhenUnderutilized` is now renamed to `WhenEmptyOrUnderutilized` * Behavior Changes: * Expiration is now forceful and begins draining as soon as it’s expired. Karpenter does not wait for replacement capacity to be available before draining, but will start provisioning a replacement as soon as the node is expired and begins draining. @@ -683,8 +683,8 @@ Revisit step 9 of the [upgrade procedure]({{< ref "#upgrading" >}}) and ensure t * The taint used to mark nodes for disruption and termination changed from `karpenter.sh/disruption=disrupting:NoSchedule` to `karpenter.sh/disrupted:NoSchedule`. It is not recommended to tolerate this taint, however, if you were tolerating it in your applications, you'll need to adjust your taints to reflect this. * Environment Variable Changes: * Environment Variable Changes - * LOGGING_CONFIG, ASSUME_ROLE_ARN, ASSUME_ROLE_DURATION Dropped - * LEADER_ELECT renamed to DISABLE_LEADER_ELECTION + * `LOGGING_CONFIG, `ASSUME_ROLE_ARN`, `ASSUME_ROLE_DURATION` Dropped + * `LEADER_ELECT` renamed to `DISABLE_LEADER_ELECTION` * `FEATURE_GATES.DRIFT=true` was dropped and promoted to Stable, and cannot be disabled. 
* Users currently opting out of drift, disabling the drift feature flag will no longer be able to do so. * Defaults changed: From 1b2fb34ece6b71b05cb5dc1a1850517d1627a61e Mon Sep 17 00:00:00 2001 From: Jason Deal Date: Tue, 4 Feb 2025 09:52:54 -0800 Subject: [PATCH 13/34] docs: fix version order and remaining 1.32 bumps (#7693) --- website/content/en/docs/concepts/nodeclasses.md | 6 +++--- website/content/en/docs/faq.md | 4 ++-- .../getting-started-with-karpenter/_index.md | 4 ++-- website/content/en/v1.2/concepts/nodeclasses.md | 6 +++--- website/content/en/v1.2/faq.md | 4 ++-- .../getting-started-with-karpenter/_index.md | 4 ++-- website/hugo.yaml | 8 ++++---- 7 files changed, 18 insertions(+), 18 deletions(-) diff --git a/website/content/en/docs/concepts/nodeclasses.md b/website/content/en/docs/concepts/nodeclasses.md index f63668d4bb9a..10eead02d18d 100644 --- a/website/content/en/docs/concepts/nodeclasses.md +++ b/website/content/en/docs/concepts/nodeclasses.md @@ -740,19 +740,19 @@ The following commands can be used to determine the versions availble for an ali {{< tabpane text=true right=false >}} {{% tab "AL2023" %}} ```bash - export K8S_VERSION="1.31" + export K8S_VERSION="1.32" aws ssm get-parameters-by-path --path "/aws/service/eks/optimized-ami/$K8S_VERSION/amazon-linux-2023/" --recursive | jq -cr '.Parameters[].Name' | grep -v "recommended" | awk -F '/' '{print $10}' | sed -r 's/.*(v[[:digit:]]+)$/\1/' | sort | uniq ``` {{% /tab %}} {{% tab "AL2" %}} ```bash - export K8S_VERSION="1.31" + export K8S_VERSION="1.32" aws ssm get-parameters-by-path --path "/aws/service/eks/optimized-ami/$K8S_VERSION/amazon-linux-2/" --recursive | jq -cr '.Parameters[].Name' | grep -v "recommended" | awk -F '/' '{print $8}' | sed -r 's/.*(v[[:digit:]]+)$/\1/' | sort | uniq ``` {{% /tab %}} {{% tab "Bottlerocket" %}} ```bash - export K8S_VERSION="1.31" + export K8S_VERSION="1.32" aws ssm get-parameters-by-path --path "/aws/service/bottlerocket/aws-k8s-$K8S_VERSION" --recursive | jq -cr '.Parameters[].Name' | grep -v "latest" | awk -F '/' '{print $7}' | sort | uniq ``` {{% /tab %}} diff --git a/website/content/en/docs/faq.md b/website/content/en/docs/faq.md index 977bc34fd391..eec263f39106 100644 --- a/website/content/en/docs/faq.md +++ b/website/content/en/docs/faq.md @@ -199,10 +199,10 @@ Yes, see the [KubeletConfiguration Section in the NodePool docs]({{= v0.191.0) - [the CLI for AWS EKS](https://eksctl.io/installation) +3. `eksctl` (>= v0.202.0) - [the CLI for AWS EKS](https://eksctl.io/installation) 4. 
`helm` - [the package manager for Kubernetes](https://helm.sh/docs/intro/install/) [Configure the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html) @@ -49,7 +49,7 @@ After setting up the tools, set the Karpenter and Kubernetes version: ```bash export KARPENTER_NAMESPACE="kube-system" export KARPENTER_VERSION="1.2.1" -export K8S_VERSION="1.31" +export K8S_VERSION="1.32" ``` Then set the following environment variable: diff --git a/website/content/en/v1.2/concepts/nodeclasses.md b/website/content/en/v1.2/concepts/nodeclasses.md index f63668d4bb9a..10eead02d18d 100644 --- a/website/content/en/v1.2/concepts/nodeclasses.md +++ b/website/content/en/v1.2/concepts/nodeclasses.md @@ -740,19 +740,19 @@ The following commands can be used to determine the versions availble for an ali {{< tabpane text=true right=false >}} {{% tab "AL2023" %}} ```bash - export K8S_VERSION="1.31" + export K8S_VERSION="1.32" aws ssm get-parameters-by-path --path "/aws/service/eks/optimized-ami/$K8S_VERSION/amazon-linux-2023/" --recursive | jq -cr '.Parameters[].Name' | grep -v "recommended" | awk -F '/' '{print $10}' | sed -r 's/.*(v[[:digit:]]+)$/\1/' | sort | uniq ``` {{% /tab %}} {{% tab "AL2" %}} ```bash - export K8S_VERSION="1.31" + export K8S_VERSION="1.32" aws ssm get-parameters-by-path --path "/aws/service/eks/optimized-ami/$K8S_VERSION/amazon-linux-2/" --recursive | jq -cr '.Parameters[].Name' | grep -v "recommended" | awk -F '/' '{print $8}' | sed -r 's/.*(v[[:digit:]]+)$/\1/' | sort | uniq ``` {{% /tab %}} {{% tab "Bottlerocket" %}} ```bash - export K8S_VERSION="1.31" + export K8S_VERSION="1.32" aws ssm get-parameters-by-path --path "/aws/service/bottlerocket/aws-k8s-$K8S_VERSION" --recursive | jq -cr '.Parameters[].Name' | grep -v "latest" | awk -F '/' '{print $7}' | sort | uniq ``` {{% /tab %}} diff --git a/website/content/en/v1.2/faq.md b/website/content/en/v1.2/faq.md index 0a14cc731e43..dbf12939b71d 100644 --- a/website/content/en/v1.2/faq.md +++ b/website/content/en/v1.2/faq.md @@ -199,10 +199,10 @@ Yes, see the [KubeletConfiguration Section in the NodePool docs]({{= v0.191.0) - [the CLI for AWS EKS](https://eksctl.io/installation) +3. `eksctl` (>= v0.202.0) - [the CLI for AWS EKS](https://eksctl.io/installation) 4. 
`helm` - [the package manager for Kubernetes](https://helm.sh/docs/intro/install/) [Configure the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html) @@ -49,7 +49,7 @@ After setting up the tools, set the Karpenter and Kubernetes version: ```bash export KARPENTER_NAMESPACE="kube-system" export KARPENTER_VERSION="1.2.1" -export K8S_VERSION="1.31" +export K8S_VERSION="1.32" ``` Then set the following environment variable: diff --git a/website/hugo.yaml b/website/hugo.yaml index f602f6a31418..774a7dc5288d 100644 --- a/website/hugo.yaml +++ b/website/hugo.yaml @@ -77,12 +77,12 @@ params: icon: fab fa-slack desc: "Chat with us on Slack in the #aws-provider channel" latest_release_version: "1.2.1" - latest_k8s_version: "1.31" + latest_k8s_version: "1.32" versions: - - v0.32 - - v1.0 - - v1.1 - v1.2 + - v1.1 + - v1.0 + - v0.32 - preview menu: main: From cf530be3573ae9865183ae204cf33bfa3d0fdcae Mon Sep 17 00:00:00 2001 From: Jason Deal Date: Wed, 5 Feb 2025 11:19:10 -0800 Subject: [PATCH 14/34] fix: spurious logging from the ssm invalidation controller (#7698) --- pkg/controllers/nodeclass/ami.go | 16 ++++++++++++++++ pkg/controllers/nodeclass/controller.go | 2 +- pkg/providers/amifamily/ami.go | 9 --------- 3 files changed, 17 insertions(+), 10 deletions(-) diff --git a/pkg/controllers/nodeclass/ami.go b/pkg/controllers/nodeclass/ami.go index 26a5a7462909..dd7e5bd7e2a0 100644 --- a/pkg/controllers/nodeclass/ami.go +++ b/pkg/controllers/nodeclass/ami.go @@ -22,9 +22,11 @@ import ( "github.com/samber/lo" corev1 "k8s.io/api/core/v1" + "sigs.k8s.io/controller-runtime/pkg/log" "sigs.k8s.io/controller-runtime/pkg/reconcile" karpv1 "sigs.k8s.io/karpenter/pkg/apis/v1" + "sigs.k8s.io/karpenter/pkg/utils/pretty" v1 "github.com/aws/karpenter-provider-aws/pkg/apis/v1" "github.com/aws/karpenter-provider-aws/pkg/providers/amifamily" @@ -32,6 +34,14 @@ import ( type AMI struct { amiProvider amifamily.Provider + cm *pretty.ChangeMonitor +} + +func NewAMIReconciler(provider amifamily.Provider) *AMI { + return &AMI{ + amiProvider: provider, + cm: pretty.NewChangeMonitor(), + } } func (a *AMI) Reconcile(ctx context.Context, nodeClass *v1.EC2NodeClass) (reconcile.Result, error) { @@ -46,6 +56,12 @@ func (a *AMI) Reconcile(ctx context.Context, nodeClass *v1.EC2NodeClass) (reconc // Returning 'ok' in this case means that the nodeclass will remain in an unready state until the component is restarted. 
return reconcile.Result{RequeueAfter: time.Minute}, nil } + if uniqueAMIs := lo.Uniq(lo.Map(amis, func(a amifamily.AMI, _ int) string { + return a.AmiID + })); a.cm.HasChanged(fmt.Sprintf("amis/%s", nodeClass.Name), uniqueAMIs) { + log.FromContext(ctx).WithValues("ids", uniqueAMIs).V(1).Info("discovered amis") + } + nodeClass.Status.AMIs = lo.Map(amis, func(ami amifamily.AMI, _ int) v1.AMI { reqs := lo.Map(ami.Requirements.NodeSelectorRequirements(), func(item karpv1.NodeSelectorRequirementWithMinValues, _ int) corev1.NodeSelectorRequirement { return item.NodeSelectorRequirement diff --git a/pkg/controllers/nodeclass/controller.go b/pkg/controllers/nodeclass/controller.go index 9a05ea8cbdbb..48bd6b5f96c7 100644 --- a/pkg/controllers/nodeclass/controller.go +++ b/pkg/controllers/nodeclass/controller.go @@ -75,7 +75,7 @@ func NewController(kubeClient client.Client, recorder events.Recorder, subnetPro kubeClient: kubeClient, recorder: recorder, launchTemplateProvider: launchTemplateProvider, - ami: &AMI{amiProvider: amiProvider}, + ami: NewAMIReconciler(amiProvider), subnet: &Subnet{subnetProvider: subnetProvider}, securityGroup: &SecurityGroup{securityGroupProvider: securityGroupProvider}, instanceProfile: &InstanceProfile{instanceProfileProvider: instanceProfileProvider}, diff --git a/pkg/providers/amifamily/ami.go b/pkg/providers/amifamily/ami.go index ee6fdfc7257d..8c105d001177 100644 --- a/pkg/providers/amifamily/ami.go +++ b/pkg/providers/amifamily/ami.go @@ -26,7 +26,6 @@ import ( "github.com/patrickmn/go-cache" "github.com/samber/lo" "k8s.io/utils/clock" - "sigs.k8s.io/controller-runtime/pkg/log" v1 "github.com/aws/karpenter-provider-aws/pkg/apis/v1" sdk "github.com/aws/karpenter-provider-aws/pkg/aws" @@ -34,7 +33,6 @@ import ( "sigs.k8s.io/karpenter/pkg/cloudprovider" "sigs.k8s.io/karpenter/pkg/scheduling" - "sigs.k8s.io/karpenter/pkg/utils/pretty" "github.com/aws/karpenter-provider-aws/pkg/providers/ssm" ) @@ -49,7 +47,6 @@ type DefaultProvider struct { clk clock.Clock cache *cache.Cache ec2api sdk.EC2API - cm *pretty.ChangeMonitor versionProvider version.Provider ssmProvider ssm.Provider } @@ -59,7 +56,6 @@ func NewDefaultProvider(clk clock.Clock, versionProvider version.Provider, ssmPr clk: clk, cache: cache, ec2api: ec2api, - cm: pretty.NewChangeMonitor(), versionProvider: versionProvider, ssmProvider: ssmProvider, } @@ -78,11 +74,6 @@ func (p *DefaultProvider) List(ctx context.Context, nodeClass *v1.EC2NodeClass) return nil, err } amis.Sort() - uniqueAMIs := lo.Uniq(lo.Map(amis, func(a AMI, _ int) string { return a.AmiID })) - if p.cm.HasChanged(fmt.Sprintf("amis/%s", nodeClass.Name), uniqueAMIs) { - log.FromContext(ctx).WithValues( - "ids", uniqueAMIs).V(1).Info("discovered amis") - } return amis, nil } From 4693be501a9e26b1f7922315b5fb5b37b7f8b51d Mon Sep 17 00:00:00 2001 From: Jigisha Patil <89548848+jigisha620@users.noreply.github.com> Date: Thu, 6 Feb 2025 11:13:12 -0800 Subject: [PATCH 15/34] chore: Bump go minor version to 1.23.6 (#7706) --- go.mod | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/go.mod b/go.mod index fbf9c7165d13..e80d020ab339 100644 --- a/go.mod +++ b/go.mod @@ -1,6 +1,6 @@ module github.com/aws/karpenter-provider-aws -go 1.23.5 +go 1.23.6 require ( github.com/Pallinder/go-randomdata v1.2.0 From a5f6c8da78699596e09812d04afd723fb512a48c Mon Sep 17 00:00:00 2001 From: jigisha620 Date: Thu, 6 Feb 2025 17:22:59 -0800 Subject: [PATCH 16/34] chore: Update instance provider delete to check if the instance is terminated --- go.mod | 2 +- go.sum 
| 4 ++-- pkg/providers/instance/instance.go | 26 ++++++++++++++------------ 3 files changed, 17 insertions(+), 15 deletions(-) diff --git a/go.mod b/go.mod index e80d020ab339..032991aa0d47 100644 --- a/go.mod +++ b/go.mod @@ -43,7 +43,7 @@ require ( k8s.io/klog/v2 v2.130.1 k8s.io/utils v0.0.0-20241104100929-3ea5e8cea738 sigs.k8s.io/controller-runtime v0.20.1 - sigs.k8s.io/karpenter v1.2.1-0.20250128194523-2a09110a1cb6 + sigs.k8s.io/karpenter v1.2.1-0.20250207011955-403034a0cbd9 sigs.k8s.io/yaml v1.4.0 ) diff --git a/go.sum b/go.sum index e574dfd77c78..762b0d75c328 100644 --- a/go.sum +++ b/go.sum @@ -336,8 +336,8 @@ sigs.k8s.io/controller-runtime v0.20.1 h1:JbGMAG/X94NeM3xvjenVUaBjy6Ui4Ogd/J5Ztj sigs.k8s.io/controller-runtime v0.20.1/go.mod h1:BrP3w158MwvB3ZbNpaAcIKkHQ7YGpYnzpoSTZ8E14WU= sigs.k8s.io/json v0.0.0-20241010143419-9aa6b5e7a4b3 h1:/Rv+M11QRah1itp8VhT6HoVx1Ray9eB4DBr+K+/sCJ8= sigs.k8s.io/json v0.0.0-20241010143419-9aa6b5e7a4b3/go.mod h1:18nIHnGi6636UCz6m8i4DhaJ65T6EruyzmoQqI2BVDo= -sigs.k8s.io/karpenter v1.2.1-0.20250128194523-2a09110a1cb6 h1:AWuTX1D+2+q9sZT2IkMHauj3ZivwVzixZftlO7lJ7ZQ= -sigs.k8s.io/karpenter v1.2.1-0.20250128194523-2a09110a1cb6/go.mod h1:0PV2k6Ua1Sc04M6NIOfVXLNGyFnvdwDxaIJriic2L5o= +sigs.k8s.io/karpenter v1.2.1-0.20250207011955-403034a0cbd9 h1:/phqkLkjx+iIPoUpFzZQBzGAEYlDmFvgXrFjeH/Cw1M= +sigs.k8s.io/karpenter v1.2.1-0.20250207011955-403034a0cbd9/go.mod h1:S+qNY3XwugJTu+UvgAdeNUxWuwQP/gS0uefdrV5wFLE= sigs.k8s.io/structured-merge-diff/v4 v4.4.2 h1:MdmvkGuXi/8io6ixD5wud3vOLwc1rj0aNqRlpuvjmwA= sigs.k8s.io/structured-merge-diff/v4 v4.4.2/go.mod h1:N8f93tFZh9U6vpxwRArLiikrE5/2tiu1w1AGfACIGE4= sigs.k8s.io/yaml v1.4.0 h1:Mk1wCc2gy/F0THH0TAp1QYyJNzRm2KCLy3o5ASXVI5E= diff --git a/pkg/providers/instance/instance.go b/pkg/providers/instance/instance.go index a069d09c7d7e..34f53dc28210 100644 --- a/pkg/providers/instance/instance.go +++ b/pkg/providers/instance/instance.go @@ -174,19 +174,21 @@ func (p *DefaultProvider) List(ctx context.Context) ([]*Instance, error) { } func (p *DefaultProvider) Delete(ctx context.Context, id string) error { - if _, err := p.ec2Batcher.TerminateInstances(ctx, &ec2.TerminateInstancesInput{ - InstanceIds: []string{id}, - }); err != nil { - if awserrors.IsNotFound(err) { - return cloudprovider.NewNodeClaimNotFoundError(fmt.Errorf("instance already terminated")) - } - if _, e := p.Get(ctx, id); e != nil { - if cloudprovider.IsNodeClaimNotFoundError(e) { - return e - } - err = multierr.Append(err, e) + out, err := p.Get(ctx, id) + if err != nil { + return err + } + // Check if the instance is already shutting-down to reduce the number of terminate-instance calls we make thereby + // reducing our overall QPS. Due to EC2's eventual consistency model, the result of the terminate-instance or + // describe-instance call may return a not found error even when the instance is not terminated - + // https://docs.aws.amazon.com/ec2/latest/devguide/eventual-consistency.html. In this case, the instance will get + // picked up by the garbage collection controller and will be cleaned up eventually. 
+ if out.State != ec2types.InstanceStateNameShuttingDown { + if _, err := p.ec2Batcher.TerminateInstances(ctx, &ec2.TerminateInstancesInput{ + InstanceIds: []string{id}, + }); err != nil { + return err } - return fmt.Errorf("terminating instance, %w", err) } return nil } From 52df823486b4cd420b02275be873264fb6a7d4e0 Mon Sep 17 00:00:00 2001 From: Julius Hinze Date: Fri, 7 Feb 2025 07:36:36 +0100 Subject: [PATCH 17/34] docs: make AMISelectorTerm.Owner string (#7646) --- website/content/en/docs/concepts/nodeclasses.md | 2 +- website/content/en/preview/concepts/nodeclasses.md | 2 +- website/content/en/v0.32/concepts/nodeclasses.md | 2 +- website/content/en/v0.32/upgrading/v1beta1-migration.md | 4 ++-- website/content/en/v1.0/concepts/nodeclasses.md | 2 +- website/content/en/v1.1/concepts/nodeclasses.md | 2 +- website/content/en/v1.2/concepts/nodeclasses.md | 2 +- 7 files changed, 8 insertions(+), 8 deletions(-) diff --git a/website/content/en/docs/concepts/nodeclasses.md b/website/content/en/docs/concepts/nodeclasses.md index 10eead02d18d..097d5b685590 100644 --- a/website/content/en/docs/concepts/nodeclasses.md +++ b/website/content/en/docs/concepts/nodeclasses.md @@ -817,7 +817,7 @@ Select by name and owner: - name: my-ami owner: self - name: my-ami - owner: 0123456789 + owner: "0123456789" ``` Select by name using a wildcard: diff --git a/website/content/en/preview/concepts/nodeclasses.md b/website/content/en/preview/concepts/nodeclasses.md index 9388d487e265..3f570a10dc6e 100644 --- a/website/content/en/preview/concepts/nodeclasses.md +++ b/website/content/en/preview/concepts/nodeclasses.md @@ -817,7 +817,7 @@ Select by name and owner: - name: my-ami owner: self - name: my-ami - owner: 0123456789 + owner: "0123456789" ``` Select by name using a wildcard: diff --git a/website/content/en/v0.32/concepts/nodeclasses.md b/website/content/en/v0.32/concepts/nodeclasses.md index d8e6a528787a..d7c837ea0fc5 100644 --- a/website/content/en/v0.32/concepts/nodeclasses.md +++ b/website/content/en/v0.32/concepts/nodeclasses.md @@ -459,7 +459,7 @@ Select by name and owner: - name: my-ami owner: self - name: my-ami - owner: 0123456789 + owner: "0123456789" ``` Select by name using a wildcard: diff --git a/website/content/en/v0.32/upgrading/v1beta1-migration.md b/website/content/en/v0.32/upgrading/v1beta1-migration.md index d048935b6547..320486660e4a 100644 --- a/website/content/en/v0.32/upgrading/v1beta1-migration.md +++ b/website/content/en/v0.32/upgrading/v1beta1-migration.md @@ -703,9 +703,9 @@ kind: EC2NodeClass spec: amiSelectorTerms: - name: my-name1 - owner: 123456789 + owner: "123456789" - name: my-name2 - owner: 123456789 + owner: "123456789" - name: my-name1 owner: amazon - name: my-name2 diff --git a/website/content/en/v1.0/concepts/nodeclasses.md b/website/content/en/v1.0/concepts/nodeclasses.md index ba95d287c8d7..f4e94a15a701 100644 --- a/website/content/en/v1.0/concepts/nodeclasses.md +++ b/website/content/en/v1.0/concepts/nodeclasses.md @@ -818,7 +818,7 @@ Select by name and owner: - name: my-ami owner: self - name: my-ami - owner: 0123456789 + owner: "0123456789" ``` Select by name using a wildcard: diff --git a/website/content/en/v1.1/concepts/nodeclasses.md b/website/content/en/v1.1/concepts/nodeclasses.md index 2c7b5d9b048a..032f95891bae 100644 --- a/website/content/en/v1.1/concepts/nodeclasses.md +++ b/website/content/en/v1.1/concepts/nodeclasses.md @@ -817,7 +817,7 @@ Select by name and owner: - name: my-ami owner: self - name: my-ami - owner: 0123456789 + owner: "0123456789" ``` 
Select by name using a wildcard: diff --git a/website/content/en/v1.2/concepts/nodeclasses.md b/website/content/en/v1.2/concepts/nodeclasses.md index 10eead02d18d..097d5b685590 100644 --- a/website/content/en/v1.2/concepts/nodeclasses.md +++ b/website/content/en/v1.2/concepts/nodeclasses.md @@ -817,7 +817,7 @@ Select by name and owner: - name: my-ami owner: self - name: my-ami - owner: 0123456789 + owner: "0123456789" ``` Select by name using a wildcard: From 84da3db9a06b2191edcc75b5eaca8fc9a7a5d037 Mon Sep 17 00:00:00 2001 From: Oleksiy Tsyban <110432475+oleksiytsyban@users.noreply.github.com> Date: Thu, 6 Feb 2025 23:37:20 -0700 Subject: [PATCH 18/34] chore: Helm: Add sidecarVolumeMounts and extraVolumeMounts volumes to all sidecars (#7608) Co-authored-by: Oleksiy Tsyban Co-authored-by: Jonathan Innis --- charts/karpenter/templates/deployment.yaml | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/charts/karpenter/templates/deployment.yaml b/charts/karpenter/templates/deployment.yaml index 990ce486292e..6b75ffe239fb 100644 --- a/charts/karpenter/templates/deployment.yaml +++ b/charts/karpenter/templates/deployment.yaml @@ -183,16 +183,16 @@ spec: {{- toYaml . | nindent 12 }} {{- end }} {{- end }} - {{- with .Values.controller.sidecarContainer }} - {{- toYaml . | nindent 8 }} - {{- end }} - {{- if and (.Values.controller.sidecarContainer) (or .Values.controller.extraVolumeMounts .Values.controller.sidecarVolumeMounts) }} + {{- range .Values.controller.sidecarContainer }} + - {{- toYaml . | nindent 10 }} + {{- if or $.Values.controller.extraVolumeMounts $.Values.controller.sidecarVolumeMounts }} volumeMounts: - {{- with .Values.controller.extraVolumeMounts }} + {{- with $.Values.controller.extraVolumeMounts }} {{- toYaml . | nindent 12 }} - {{- end }} - {{- with .Values.controller.sidecarVolumeMounts }} + {{- end }} + {{- with $.Values.controller.sidecarVolumeMounts }} {{- toYaml . | nindent 12 }} + {{- end }} {{- end }} {{- end }} {{- with .Values.nodeSelector }} From ed5f4611f2451c646fa0e55a66164f809b7da86a Mon Sep 17 00:00:00 2001 From: Hansuk Hong Date: Fri, 7 Feb 2025 16:04:26 +0900 Subject: [PATCH 19/34] fix(helm): MEMORY_LIMIT env for custom controller container name (#7700) Signed-off-by: flavono123 --- charts/karpenter/templates/_helpers.tpl | 6 ++++++ charts/karpenter/templates/deployment.yaml | 4 ++-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/charts/karpenter/templates/_helpers.tpl b/charts/karpenter/templates/_helpers.tpl index 212f599b12dc..23569a6893bc 100644 --- a/charts/karpenter/templates/_helpers.tpl +++ b/charts/karpenter/templates/_helpers.tpl @@ -75,6 +75,12 @@ Karpenter image to use {{- end }} {{- end }} +{{/* +Karpenter controller container name +*/}} +{{- define "karpenter.controller.containerName" -}} +{{- .Values.controller.containerName | default "controller" -}} +{{- end -}} {{/* Get PodDisruptionBudget API Version */}} {{- define "karpenter.pdb.apiVersion" -}} diff --git a/charts/karpenter/templates/deployment.yaml b/charts/karpenter/templates/deployment.yaml index 6b75ffe239fb..0c0e69f75e42 100644 --- a/charts/karpenter/templates/deployment.yaml +++ b/charts/karpenter/templates/deployment.yaml @@ -60,7 +60,7 @@ spec: schedulerName: {{ . | quote }} {{- end }} containers: - - name: {{ .Values.controller.containerName | default "controller" }} + - name: {{ include "karpenter.controller.containerName" . 
}} securityContext: runAsUser: 65532 runAsGroup: 65532 @@ -102,7 +102,7 @@ spec: - name: MEMORY_LIMIT valueFrom: resourceFieldRef: - containerName: controller + containerName: {{ include "karpenter.controller.containerName" . }} divisor: "0" resource: limits.memory - name: FEATURE_GATES From eeac9df4e25a9db710d3dc2fcecb078dee49a569 Mon Sep 17 00:00:00 2001 From: Michael McCune Date: Fri, 7 Feb 2025 02:05:02 -0500 Subject: [PATCH 20/34] chore: add a binary makefile target (#7704) --- .gitignore | 3 +++ Makefile | 7 +++++++ 2 files changed, 10 insertions(+) diff --git a/.gitignore b/.gitignore index 459ff0ba0a8b..6f94d4d0ef5b 100644 --- a/.gitignore +++ b/.gitignore @@ -14,3 +14,6 @@ go.work.sum # Project Specific *.csv + +# Binary output +karpenter-provider-aws-* diff --git a/Makefile b/Makefile index 6bf08768fb54..2dc6e5e14009 100644 --- a/Makefile +++ b/Makefile @@ -34,6 +34,10 @@ KARPENTER_CORE_DIR = $(shell go list -m -f '{{ .Dir }}' sigs.k8s.io/karpenter) # TEST_SUITE enables you to select a specific test suite directory to run "make e2etests" against TEST_SUITE ?= "..." +# Filename when building the binary controller only +GOARCH ?= $(shell go env GOARCH) +BINARY_FILENAME = karpenter-provider-aws-$(GOARCH) + help: ## Display help @awk 'BEGIN {FS = ":.*##"; printf "Usage:\n make \033[36m\033[0m\n"} /^[a-zA-Z_0-9-]+:.*?##/ { printf " \033[36m%-15s\033[0m %s\n", $$1, $$2 } /^##@/ { printf "\n\033[1m%s\033[0m\n", substr($$0, 5) } ' $(MAKEFILE_LIST) @@ -132,6 +136,9 @@ image: ## Build the Karpenter controller images using ko build $(eval IMG_TAG=$(shell echo $(CONTROLLER_IMG) | cut -d "@" -f 1 | cut -d ":" -f 2 -s)) $(eval IMG_DIGEST=$(shell echo $(CONTROLLER_IMG) | cut -d "@" -f 2)) +binary: ## Build the Karpenter controller binary using go build + go build $(GOFLAGS) -o $(BINARY_FILENAME) ./cmd/controller/... 
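Referring back to the Helm template changes above: sidecar containers are now rendered with a `range`, each one receiving the shared `extraVolumeMounts`/`sidecarVolumeMounts`, and the controller container name is resolved through the new `karpenter.controller.containerName` helper. A hypothetical values snippet exercising both changes might look like the following sketch (the container name, sidecar image, and mount paths are made up, and the referenced volumes would still need to be defined separately):

```yaml
controller:
  containerName: karpenter-controller      # resolved by the karpenter.controller.containerName helper
  sidecarContainer:
    - name: metrics-proxy                  # hypothetical sidecar; receives the volumeMounts below
      image: example.com/metrics-proxy:latest
  sidecarVolumeMounts:
    - name: shared-config
      mountPath: /etc/shared-config
  extraVolumeMounts:
    - name: extra-certs
      mountPath: /etc/extra-certs
```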
+ apply: verify image ## Deploy the controller from the current state of your git repository into your ~/.kube/config cluster kubectl apply -f ./pkg/apis/crds/ helm upgrade --install karpenter charts/karpenter --namespace ${KARPENTER_NAMESPACE} \ From 78cb8aedbfff9bb63a87ac98ccd5f0da80e13dfe Mon Sep 17 00:00:00 2001 From: Jonathan Innis Date: Fri, 7 Feb 2025 15:37:25 -0800 Subject: [PATCH 21/34] perf: Remove calling List on NodeClaims and Nodes in interruption controller (#7707) --- pkg/controllers/interruption/controller.go | 87 ++++++---------------- pkg/controllers/interruption/suite_test.go | 2 +- pkg/operator/operator.go | 31 +++++++- pkg/test/environment.go | 33 ++++++++ 4 files changed, 88 insertions(+), 65 deletions(-) diff --git a/pkg/controllers/interruption/controller.go b/pkg/controllers/interruption/controller.go index 3316b22a7e88..c96d42409c54 100644 --- a/pkg/controllers/interruption/controller.go +++ b/pkg/controllers/interruption/controller.go @@ -38,16 +38,14 @@ import ( "sigs.k8s.io/karpenter/pkg/operator/injection" karpv1 "sigs.k8s.io/karpenter/pkg/apis/v1" - nodeclaimutils "sigs.k8s.io/karpenter/pkg/utils/nodeclaim" "sigs.k8s.io/karpenter/pkg/utils/pretty" + "sigs.k8s.io/karpenter/pkg/events" + "github.com/aws/karpenter-provider-aws/pkg/cache" interruptionevents "github.com/aws/karpenter-provider-aws/pkg/controllers/interruption/events" "github.com/aws/karpenter-provider-aws/pkg/controllers/interruption/messages" "github.com/aws/karpenter-provider-aws/pkg/providers/sqs" - "github.com/aws/karpenter-provider-aws/pkg/utils" - - "sigs.k8s.io/karpenter/pkg/events" ) type Action string @@ -104,14 +102,7 @@ func (c *Controller) Reconcile(ctx context.Context) (reconcile.Result, error) { if len(sqsMessages) == 0 { return reconcile.Result{RequeueAfter: singleton.RequeueImmediately}, nil } - nodeClaimInstanceIDMap, err := c.makeNodeClaimInstanceIDMap(ctx) - if err != nil { - return reconcile.Result{}, fmt.Errorf("making nodeclaim instance id map, %w", err) - } - nodeInstanceIDMap, err := c.makeNodeInstanceIDMap(ctx) - if err != nil { - return reconcile.Result{}, fmt.Errorf("making node instance id map, %w", err) - } + errs := make([]error, len(sqsMessages)) workqueue.ParallelizeUntil(ctx, 10, len(sqsMessages), func(i int) { msg, e := c.parseMessage(sqsMessages[i]) @@ -121,7 +112,7 @@ func (c *Controller) Reconcile(ctx context.Context) (reconcile.Result, error) { errs[i] = c.deleteMessage(ctx, sqsMessages[i]) return } - if e = c.handleMessage(ctx, nodeClaimInstanceIDMap, nodeInstanceIDMap, msg); e != nil { + if e = c.handleMessage(ctx, msg); e != nil { errs[i] = fmt.Errorf("handling message, %w", e) return } @@ -154,9 +145,7 @@ func (c *Controller) parseMessage(raw *sqstypes.Message) (messages.Message, erro } // handleMessage takes an action against every node involved in the message that is owned by a NodePool -func (c *Controller) handleMessage(ctx context.Context, nodeClaimInstanceIDMap map[string]*karpv1.NodeClaim, - nodeInstanceIDMap map[string]*corev1.Node, msg messages.Message) (err error) { - +func (c *Controller) handleMessage(ctx context.Context, msg messages.Message) (err error) { ctx = log.IntoContext(ctx, log.FromContext(ctx).WithValues("messageKind", msg.Kind())) ReceivedMessages.Inc(map[string]string{messageTypeLabel: string(msg.Kind())}) @@ -164,13 +153,27 @@ func (c *Controller) handleMessage(ctx context.Context, nodeClaimInstanceIDMap m return nil } for _, instanceID := range msg.EC2InstanceIDs() { - nodeClaim, ok := nodeClaimInstanceIDMap[instanceID] - if !ok { 
+ nodeClaimList := &karpv1.NodeClaimList{} + if e := c.kubeClient.List(ctx, nodeClaimList, client.MatchingFields{"status.instanceID": instanceID}); e != nil { + err = multierr.Append(err, e) continue } - node := nodeInstanceIDMap[instanceID] - if e := c.handleNodeClaim(ctx, msg, nodeClaim, node); e != nil { - err = multierr.Append(err, e) + if len(nodeClaimList.Items) == 0 { + continue + } + for _, nodeClaim := range nodeClaimList.Items { + nodeList := &corev1.NodeList{} + if e := c.kubeClient.List(ctx, nodeList, client.MatchingFields{"spec.instanceID": instanceID}); e != nil { + err = multierr.Append(err, e) + continue + } + var node *corev1.Node + if len(nodeList.Items) > 0 { + node = &nodeList.Items[0] + } + if e := c.handleNodeClaim(ctx, msg, &nodeClaim, node); e != nil { + err = multierr.Append(err, e) + } } } MessageLatency.Observe(time.Since(msg.StartTime()).Seconds(), nil) @@ -254,48 +257,6 @@ func (c *Controller) notifyForMessage(msg messages.Message, nodeClaim *karpv1.No } } -// makeNodeClaimInstanceIDMap builds a map between the instance id that is stored in the -// NodeClaim .status.providerID and the NodeClaim -func (c *Controller) makeNodeClaimInstanceIDMap(ctx context.Context) (map[string]*karpv1.NodeClaim, error) { - m := map[string]*karpv1.NodeClaim{} - nodeClaims, err := nodeclaimutils.ListManaged(ctx, c.kubeClient, c.cloudProvider) - if err != nil { - return nil, err - } - for _, nc := range nodeClaims { - if nc.Status.ProviderID == "" { - continue - } - id, err := utils.ParseInstanceID(nc.Status.ProviderID) - if err != nil || id == "" { - continue - } - m[id] = nc - } - return m, nil -} - -// makeNodeInstanceIDMap builds a map between the instance id that is stored in the -// node .spec.providerID and the node -func (c *Controller) makeNodeInstanceIDMap(ctx context.Context) (map[string]*corev1.Node, error) { - m := map[string]*corev1.Node{} - nodeList := &corev1.NodeList{} - if err := c.kubeClient.List(ctx, nodeList); err != nil { - return nil, fmt.Errorf("listing nodes, %w", err) - } - for i := range nodeList.Items { - if nodeList.Items[i].Spec.ProviderID == "" { - continue - } - id, err := utils.ParseInstanceID(nodeList.Items[i].Spec.ProviderID) - if err != nil || id == "" { - continue - } - m[id] = &nodeList.Items[i] - } - return m, nil -} - func actionForMessage(msg messages.Message) Action { switch msg.Kind() { case messages.ScheduledChangeKind, messages.SpotInterruptionKind, messages.InstanceStoppedKind, messages.InstanceTerminatedKind: diff --git a/pkg/controllers/interruption/suite_test.go b/pkg/controllers/interruption/suite_test.go index d66be882b6fc..042131d03164 100644 --- a/pkg/controllers/interruption/suite_test.go +++ b/pkg/controllers/interruption/suite_test.go @@ -84,7 +84,7 @@ func TestAPIs(t *testing.T) { var _ = BeforeSuite(func() { ctx = options.ToContext(ctx, test.Options()) - env = coretest.NewEnvironment(coretest.WithCRDs(apis.CRDs...), coretest.WithCRDs(v1alpha1.CRDs...)) + env = coretest.NewEnvironment(coretest.WithCRDs(apis.CRDs...), coretest.WithCRDs(v1alpha1.CRDs...), coretest.WithFieldIndexers(test.NodeInstanceIDFieldIndexer(ctx), test.NodeClaimInstanceIDFieldIndexer(ctx))) awsEnv = test.NewEnvironment(ctx, env) fakeClock = &clock.FakeClock{} unavailableOfferingsCache = awscache.NewUnavailableOfferings() diff --git a/pkg/operator/operator.go b/pkg/operator/operator.go index bef580542873..b86d73cb2dda 100644 --- a/pkg/operator/operator.go +++ b/pkg/operator/operator.go @@ -26,12 +26,13 @@ import ( "github.com/aws/aws-sdk-go-v2/aws" 
"github.com/aws/aws-sdk-go-v2/aws/middleware" - config "github.com/aws/aws-sdk-go-v2/config" + "github.com/aws/aws-sdk-go-v2/config" "github.com/aws/aws-sdk-go-v2/feature/ec2/imds" "github.com/aws/aws-sdk-go-v2/service/ec2" "github.com/aws/aws-sdk-go-v2/service/eks" "github.com/aws/aws-sdk-go-v2/service/iam" "github.com/aws/aws-sdk-go-v2/service/ssm" + "sigs.k8s.io/controller-runtime/pkg/manager" "github.com/aws/smithy-go" "github.com/patrickmn/go-cache" @@ -66,6 +67,7 @@ import ( ssmp "github.com/aws/karpenter-provider-aws/pkg/providers/ssm" "github.com/aws/karpenter-provider-aws/pkg/providers/subnet" "github.com/aws/karpenter-provider-aws/pkg/providers/version" + "github.com/aws/karpenter-provider-aws/pkg/utils" ) func init() { @@ -185,6 +187,10 @@ func NewOperator(ctx context.Context, operator *operator.Operator) (context.Cont launchTemplateProvider, ) + // Setup field indexers on instanceID -- specifically for the interruption controller + if options.FromContext(ctx).InterruptionQueue != "" { + SetupIndexers(ctx, operator.Manager) + } return ctx, &Operator{ Operator: operator, Config: cfg, @@ -273,3 +279,26 @@ func KubeDNSIP(ctx context.Context, kubernetesInterface kubernetes.Interface) (n } return kubeDNSIP, nil } + +func SetupIndexers(ctx context.Context, mgr manager.Manager) { + lo.Must0(mgr.GetFieldIndexer().IndexField(ctx, &karpv1.NodeClaim{}, "status.instanceID", func(o client.Object) []string { + if o.(*karpv1.NodeClaim).Status.ProviderID == "" { + return nil + } + id, e := utils.ParseInstanceID(o.(*karpv1.NodeClaim).Status.ProviderID) + if e != nil || id == "" { + return nil + } + return []string{id} + }), "failed to setup nodeclaim instanceID indexer") + lo.Must0(mgr.GetFieldIndexer().IndexField(ctx, &corev1.Node{}, "spec.instanceID", func(o client.Object) []string { + if o.(*corev1.Node).Spec.ProviderID == "" { + return nil + } + id, e := utils.ParseInstanceID(o.(*corev1.Node).Spec.ProviderID) + if e != nil || id == "" { + return nil + } + return []string{id} + }), "failed to setup node instanceID indexer") +} diff --git a/pkg/test/environment.go b/pkg/test/environment.go index 2d9a9d243083..a03b6081fb33 100644 --- a/pkg/test/environment.go +++ b/pkg/test/environment.go @@ -23,6 +23,8 @@ import ( "github.com/samber/lo" corev1 "k8s.io/api/core/v1" clock "k8s.io/utils/clock/testing" + ctrlcache "sigs.k8s.io/controller-runtime/pkg/cache" + "sigs.k8s.io/controller-runtime/pkg/client" karpv1 "sigs.k8s.io/karpenter/pkg/apis/v1" @@ -39,6 +41,7 @@ import ( ssmp "github.com/aws/karpenter-provider-aws/pkg/providers/ssm" "github.com/aws/karpenter-provider-aws/pkg/providers/subnet" "github.com/aws/karpenter-provider-aws/pkg/providers/version" + "github.com/aws/karpenter-provider-aws/pkg/utils" coretest "sigs.k8s.io/karpenter/pkg/test" @@ -216,3 +219,33 @@ func (env *Environment) Reset() { } } } + +func NodeInstanceIDFieldIndexer(ctx context.Context) func(ctrlcache.Cache) error { + return func(c ctrlcache.Cache) error { + return c.IndexField(ctx, &corev1.Node{}, "spec.instanceID", func(obj client.Object) []string { + if obj.(*corev1.Node).Spec.ProviderID == "" { + return nil + } + id, e := utils.ParseInstanceID(obj.(*corev1.Node).Spec.ProviderID) + if e != nil || id == "" { + return nil + } + return []string{id} + }) + } +} + +func NodeClaimInstanceIDFieldIndexer(ctx context.Context) func(ctrlcache.Cache) error { + return func(c ctrlcache.Cache) error { + return c.IndexField(ctx, &karpv1.NodeClaim{}, "status.instanceID", func(obj client.Object) []string { + if 
obj.(*karpv1.NodeClaim).Status.ProviderID == "" { + return nil + } + id, e := utils.ParseInstanceID(obj.(*karpv1.NodeClaim).Status.ProviderID) + if e != nil || id == "" { + return nil + } + return []string{id} + }) + } +} From 2cb43bc468d04f2263d43109e0ccc7cbe616e226 Mon Sep 17 00:00:00 2001 From: Jonathan Innis Date: Fri, 7 Feb 2025 23:22:36 -0800 Subject: [PATCH 22/34] chore: Add CreateError when launch template isn't found (#7711) --- go.mod | 2 +- go.sum | 4 ++-- pkg/providers/instance/instance.go | 4 ++-- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/go.mod b/go.mod index 032991aa0d47..154f342052f0 100644 --- a/go.mod +++ b/go.mod @@ -43,7 +43,7 @@ require ( k8s.io/klog/v2 v2.130.1 k8s.io/utils v0.0.0-20241104100929-3ea5e8cea738 sigs.k8s.io/controller-runtime v0.20.1 - sigs.k8s.io/karpenter v1.2.1-0.20250207011955-403034a0cbd9 + sigs.k8s.io/karpenter v1.2.1-0.20250208015555-8e8b99d6bfa2 sigs.k8s.io/yaml v1.4.0 ) diff --git a/go.sum b/go.sum index 762b0d75c328..125ae461e52a 100644 --- a/go.sum +++ b/go.sum @@ -336,8 +336,8 @@ sigs.k8s.io/controller-runtime v0.20.1 h1:JbGMAG/X94NeM3xvjenVUaBjy6Ui4Ogd/J5Ztj sigs.k8s.io/controller-runtime v0.20.1/go.mod h1:BrP3w158MwvB3ZbNpaAcIKkHQ7YGpYnzpoSTZ8E14WU= sigs.k8s.io/json v0.0.0-20241010143419-9aa6b5e7a4b3 h1:/Rv+M11QRah1itp8VhT6HoVx1Ray9eB4DBr+K+/sCJ8= sigs.k8s.io/json v0.0.0-20241010143419-9aa6b5e7a4b3/go.mod h1:18nIHnGi6636UCz6m8i4DhaJ65T6EruyzmoQqI2BVDo= -sigs.k8s.io/karpenter v1.2.1-0.20250207011955-403034a0cbd9 h1:/phqkLkjx+iIPoUpFzZQBzGAEYlDmFvgXrFjeH/Cw1M= -sigs.k8s.io/karpenter v1.2.1-0.20250207011955-403034a0cbd9/go.mod h1:S+qNY3XwugJTu+UvgAdeNUxWuwQP/gS0uefdrV5wFLE= +sigs.k8s.io/karpenter v1.2.1-0.20250208015555-8e8b99d6bfa2 h1:E8ZbRdDrRfAaNgLgOl3qkBGMyKOoDTb7grYEwV6+FBQ= +sigs.k8s.io/karpenter v1.2.1-0.20250208015555-8e8b99d6bfa2/go.mod h1:S+qNY3XwugJTu+UvgAdeNUxWuwQP/gS0uefdrV5wFLE= sigs.k8s.io/structured-merge-diff/v4 v4.4.2 h1:MdmvkGuXi/8io6ixD5wud3vOLwc1rj0aNqRlpuvjmwA= sigs.k8s.io/structured-merge-diff/v4 v4.4.2/go.mod h1:N8f93tFZh9U6vpxwRArLiikrE5/2tiu1w1AGfACIGE4= sigs.k8s.io/yaml v1.4.0 h1:Mk1wCc2gy/F0THH0TAp1QYyJNzRm2KCLy3o5ASXVI5E= diff --git a/pkg/providers/instance/instance.go b/pkg/providers/instance/instance.go index 34f53dc28210..69f8c8e8fa16 100644 --- a/pkg/providers/instance/instance.go +++ b/pkg/providers/instance/instance.go @@ -249,13 +249,13 @@ func (p *DefaultProvider) launchInstance(ctx context.Context, nodeClass *v1.EC2N createFleetOutput, err := p.ec2Batcher.CreateFleet(ctx, createFleetInput) p.subnetProvider.UpdateInflightIPs(createFleetInput, createFleetOutput, instanceTypes, lo.Values(zonalSubnets), capacityType) if err != nil { + reason, message := awserrors.ToReasonMessage(err) if awserrors.IsLaunchTemplateNotFound(err) { for _, lt := range launchTemplateConfigs { p.launchTemplateProvider.InvalidateCache(ctx, aws.ToString(lt.LaunchTemplateSpecification.LaunchTemplateName), aws.ToString(lt.LaunchTemplateSpecification.LaunchTemplateId)) } - return ec2types.CreateFleetInstance{}, fmt.Errorf("creating fleet %w", err) + return ec2types.CreateFleetInstance{}, cloudprovider.NewCreateError(fmt.Errorf("launch templates not found when creating fleet request, %w", err), reason, fmt.Sprintf("Launch templates not found when creating fleet request: %s", message)) } - reason, message := awserrors.ToReasonMessage(err) var reqErr *awshttp.ResponseError if errors.As(err, &reqErr) { return ec2types.CreateFleetInstance{}, cloudprovider.NewCreateError(fmt.Errorf("creating fleet request, %w (%v)", err, 
reqErr.ServiceRequestID()), reason, fmt.Sprintf("Error creating fleet request: %s", message)) From 25aa6b28c479d3d183b20de58c7af1d33530b6db Mon Sep 17 00:00:00 2001 From: Jason Deal Date: Mon, 10 Feb 2025 12:02:36 -0800 Subject: [PATCH 23/34] docs: update pod level controls for TGP (#7710) --- .../content/en/docs/concepts/disruption.md | 111 ++++++++++++------ .../content/en/preview/concepts/disruption.md | 111 ++++++++++++------ .../content/en/v1.0/concepts/disruption.md | 92 ++++++++++----- .../content/en/v1.1/concepts/disruption.md | 88 +++++++++----- .../content/en/v1.2/concepts/disruption.md | 111 ++++++++++++------ 5 files changed, 340 insertions(+), 173 deletions(-) diff --git a/website/content/en/docs/concepts/disruption.md b/website/content/en/docs/concepts/disruption.md index df281154c9b9..85e173bec923 100644 --- a/website/content/en/docs/concepts/disruption.md +++ b/website/content/en/docs/concepts/disruption.md @@ -70,18 +70,14 @@ Automated graceful methods, can be rate limited through [NodePool Disruption Bud * Nodes can be removed as their workloads will run on other nodes in the cluster. * Nodes can be replaced with lower priced variants due to a change in the workloads. * [**Drift**]({{}}): Karpenter will mark nodes as drifted and disrupt nodes that have drifted from their desired specification. See [Drift]({{}}) to see which fields are considered. -* [**Interruption**]({{}}): Karpenter will watch for upcoming interruption events that could affect your nodes (health events, spot interruption, etc.) and will taint, drain, and terminate the node(s) ahead of the event to reduce workload disruption. {{% alert title="Defaults" color="secondary" %}} -Disruption is configured through the NodePool's disruption block by the `consolidationPolicy`, and `consolidateAfter` fields. `expireAfter` can also be used to control disruption. Karpenter will configure these fields with the following values by default if they are not set: +Disruption is configured through the NodePool's disruption block by the `consolidationPolicy`, and `consolidateAfter` fields. Karpenter will configure these fields with the following values by default if they are not set: ```yaml spec: disruption: consolidationPolicy: WhenEmptyOrUnderutilized - template: - spec: - expireAfter: 720h ``` {{% /alert %}} @@ -169,10 +165,22 @@ Karpenter will add the `Drifted` status condition on NodeClaims if the NodeClaim ## Automated Forceful Methods -Automated forceful methods will begin draining nodes as soon as the condition is met. Note that these methods blow past NodePool Disruption Budgets, and do not wait for a pre-spin replacement node to be healthy for the pods to reschedule, unlike the graceful methods mentioned above. Use Pod Disruption Budgets and `do-not-disrupt` on your nodes to rate-limit the speed at which your applications are disrupted. +Automated forceful methods will begin draining nodes as soon as the condition is met. +Unlike the graceful methods mentioned above, these methods can not be rate-limited using [NodePool Disruption Budgets](#nodepool-disruption-budgets), and do not wait for a pre-spin replacement node to be healthy for the pods to reschedule. +Pod disruption budgets may be used to rate-limit application disruption. ### Expiration -Karpenter will disrupt nodes as soon as they're expired after they've lived for the duration of the NodePool's `spec.template.spec.expireAfter`. You can use expiration to periodically recycle nodes due to security concern. 
+ +A node is expired once it's lifetime exceeds the duration set on the owning NodeClaim's `spec.expireAfter` field. +Changes to `spec.template.spec.expireAfter` on the owning NodePool will not update the field for existing NodeClaims - it will induce NodeClaim drift and the replacements will have the updated value. +Expiration can be used, in conjunction with [`terminationGracePeriod`](#termination-grace-period), to enforce a maximum Node lifetime. +By default, `expireAfter` is set to `720h` (30 days). + +{{% alert title="Warning" color="warning" %}} +Misconfigured PDBs and pods with the `karpenter.sh/do-not-disrupt` annotation may block draining indefinitely. +For this reason, it is not recommended to set `expireAfter` without also setting `terminationGracePeriod` **if** your cluster has pods with the `karpenter.sh/do-not-disrupt` annotation. +Doing so can result in partially drained nodes stuck in the cluster, driving up cluster cost and potentially requiring manual intervention to resolve. +{{% /alert %}} ### Interruption @@ -197,13 +205,13 @@ Karpenter enables this feature by watching an SQS queue which receives critical To enable interruption handling, configure the `--interruption-queue` CLI argument with the name of the interruption queue provisioned to handle interruption events. -### Node Auto Repair +### Node Auto Repair Feature State: Karpenter v1.1.0 [alpha]({{}}) Node Auto Repair is a feature that automatically identifies and replaces unhealthy nodes in your cluster, helping to maintain overall cluster health. Nodes can experience various types of failures affecting their hardware, file systems, or container environments. These failures may be surfaced through node conditions such as network unavailability, disk pressure, memory pressure, or other conditions reported by node diagnostic agents. When Karpenter detects these unhealthy conditions, it automatically replaces the affected nodes based on cloud provider-defined repair policies. Once a node has been in an unhealthy state beyond its configured toleration duration, Karpenter will forcefully terminate the node and its corresponding NodeClaim, bypassing the standard drain and grace period procedures to ensure swift replacement of problematic nodes. To prevent cascading failures, Karpenter includes safety mechanisms: it will not perform repairs if more than 20% of nodes in a NodePool are unhealthy, and for standalone NodeClaims, it evaluates this threshold against all nodes in the cluster. This ensures your cluster remains in a healthy state with minimal manual intervention, even in scenarios where normal node termination procedures might be impacted by the node's unhealthy state. -To enable Node Auto Repair: +To enable Node Auto Repair: 1. Ensure you have a [Node Monitoring Agent](https://docs.aws.amazon.com/en_us/eks/latest/userguide/node-health.html) deployed or any agent that will add status conditions to nodes that are supported (e.g., Node Problem Detector) 2. Enable the feature flag: `NodeRepair=true` 3. 
Node AutoRepair will automatically terminate nodes when they have unhealthy status conditions based on your cloud provider's repair policies @@ -214,36 +222,58 @@ Karpenter monitors nodes for the following node status conditions when initiatin #### Kubelet Node Conditions -| Type | Status | Toleration Duration | +| Type | Status | Toleration Duration | | ------ | ------------- | ------------------- | | Ready | False | 30 minutes | -| Ready | Unknown | 30 minutes | +| Ready | Unknown | 30 minutes | #### Node Monitoring Agent Conditions -| Type | Status | Toleration Duration | +| Type | Status | Toleration Duration | | ------------------------ | ------------| --------------------- | | AcceleratedHardwareReady | False | 10 minutes | -| StorageReady | False | 30 minutes | -| NetworkingReady | False | 30 minutes | -| KernelReady | False | 30 minutes | -| ContainerRuntimeReady | False | 30 minutes | +| StorageReady | False | 30 minutes | +| NetworkingReady | False | 30 minutes | +| KernelReady | False | 30 minutes | +| ContainerRuntimeReady | False | 30 minutes | To enable the drift feature flag, refer to the [Feature Gates]({{}}). ## Controls -### TerminationGracePeriod +### TerminationGracePeriod -You can set a NodePool's `terminationGracePeriod` through the `spec.template.spec.terminationGracePeriod` field. This field defines the duration of time that a node can be draining before it's forcibly deleted. A node begins draining when it's deleted. Pods will be deleted preemptively based on its TerminationGracePeriodSeconds before this terminationGracePeriod ends to give as much time to cleanup as possible. Note that if your pod's terminationGracePeriodSeconds is larger than this terminationGracePeriod, Karpenter may forcibly delete the pod before it has its full terminationGracePeriod to cleanup. +To configure a maximum termination duration, `terminationGracePeriod` should be used. +It is configured through a NodePool's [`spec.template.spec.terminationGracePeriod`]({{}}) field, and is persisted to created NodeClaims (`spec.terminationGracePeriod`). +Changes to the [`spec.template.spec.terminationGracePeriod`]({{}}) field on the NodePool will not result in a change for existing NodeClaims - it will induce NodeClaim drift and the replacements will have the updated `terminationGracePeriod`. -This is especially useful in combination with `nodepool.spec.template.spec.expireAfter` to define an absolute maximum on the lifetime of a node, where a node is deleted at `expireAfter` and finishes draining within the `terminationGracePeriod` thereafter. Pods blocking eviction like PDBs and do-not-disrupt will block full draining until the `terminationGracePeriod` is reached. +Once a node is disrupted, via either a [graceful](#automated-graceful-methods) or [forceful](#automated-forceful-methods) disruption method, Karpenter will being draining the node. +At this point, the countdown for `terminationGracePeriod` begins. +Once the `terminationGracePeriod` elapses, remaining pods will be forcibly deleted and the unerlying instance will be terminated. +A node may be terminated before the `terminationGracePeriod` has elapsed if all disruptable pods have been drained. + +In conjunction with `expireAfter`, `terminationGracePeriod` can be used to enforce an absolute maximum node lifetime. +The node will begin to drain once its `expireAfter` has elapsed, and it will be forcibly terminated once its `terminationGracePeriod` has elapsed, making the maximum node lifetime the sum of the two fields. 
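As a concrete illustration of the two fields working together, a minimal NodePool sketch (durations are illustrative):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      expireAfter: 720h            # draining begins once the node is 30 days old
      terminationGracePeriod: 24h  # remaining pods are force-deleted 24h after draining begins
```

With these values the effective maximum node lifetime is roughly 744h: up to 720h of normal operation plus at most 24h of draining.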
+ +Additionally, configuring `terminationGracePeriod` changes the eligibility criteria for disruption via `Drift`. +When configured, a node may be disrupted via drift even if there are pods with blocking PDBs or the `karpenter.sh/do-not-disrupt` annotation scheduled to it. +This enables cluster administrators to ensure crucial updates (e.g. AMI updates addressing CVEs) can't be blocked by misconfigured applications. + +{{% alert title="Warning" color="warning" %}} +To ensure that the `terminationGracePeriodSeconds` value for draining pods is respected, pods will be preemptively deleted before the Node's `terminationGracePeriod` has elapsed. +This includes pods with blocking [pod disruption budgets](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) or the [`karpenter.sh/do-not-disrupt` annotation]({{}}). -For instance, a NodeClaim with `terminationGracePeriod` set to `1h` and an `expireAfter` set to `23h` will begin draining after it's lived for `23h`. Let's say a `do-not-disrupt` pod has `TerminationGracePeriodSeconds` set to `300` seconds. If the node hasn't been fully drained after `55m`, Karpenter will delete the pod to allow it's full `terminationGracePeriodSeconds` to cleanup. If no pods are blocking draining, Karpenter will cleanup the node as soon as the node is fully drained, rather than waiting for the NodeClaim's `terminationGracePeriod` to finish. +Consider the following example: a Node with a 1 hour `terminationGracePeriod` has been disrupted and begins to drain. +A pod with the `karpenter.sh/do-not-disrupt` annotation and a 300 second (5 minute) `terminationGracePeriodsSeconds` is scheduled to it. +If the pod is still running 55 minutes after the Node begins to drain, the pod will be deleted to ensure its `terminationGracePeriodSeconds` value is respected. + +If a pod's `terminationGracePeriodSeconds` value exceeds that of the Node it is scheduled to, Karpenter will prioritize the Node's `terminationGracePeriod`. +The pod will be deleted as soon as the Node begins to drain, and it will not receive it's full `terminationGracePeriodSeconds`. +{{% /alert %}} ### NodePool Disruption Budgets -You can rate limit Karpenter's disruption through the NodePool's `spec.disruption.budgets`. If undefined, Karpenter will default to one budget with `nodes: 10%`. Budgets will consider nodes that are actively being deleted for any reason, and will only block Karpenter from disrupting nodes voluntarily through drift, emptiness, and consolidation. Note that NodePool Disruption Budgets do not prevent Karpenter from terminating expired nodes. +You can rate limit Karpenter's disruption through the NodePool's `spec.disruption.budgets`. If undefined, Karpenter will default to one budget with `nodes: 10%`. Budgets will consider nodes that are actively being deleted for any reason, and will only block Karpenter from disrupting nodes voluntarily through drift, emptiness, and consolidation. Note that NodePool Disruption Budgets do not prevent Karpenter from terminating expired nodes. #### Reasons Karpenter allows specifying if a budget applies to any of `Drifted`, `Underutilized`, or `Empty`. When a budget has no reasons, it's assumed that it applies to all reasons. When calculating allowed disruptions for a given reason, Karpenter will take the minimum of the budgets that have listed the reason or have left reasons undefined. 
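To make the "minimum across budgets" rule concrete, a small sketch with overlapping reasons (node counts are illustrative):

```yaml
disruption:
  budgets:
    - nodes: "10"
      reasons:
        - "Empty"
        - "Drifted"
    - nodes: "2"
      reasons:
        - "Drifted"
```

Here at most 2 nodes may be disrupted for `Drifted` (the minimum of 10 and 2), while `Empty` disruptions are bounded only by the first budget's 10.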
@@ -256,7 +286,7 @@ If the budget is configured with a percentage value, such as `20%`, Karpenter wi For example, the following NodePool with three budgets defines the following requirements: - The first budget will only allow 20% of nodes owned by that NodePool to be disrupted if it's empty or drifted. For instance, if there were 19 nodes owned by the NodePool, 4 empty or drifted nodes could be disrupted, rounding up from `19 * .2 = 3.8`. - The second budget acts as a ceiling to the previous budget, only allowing 5 disruptions when there are more than 25 nodes. -- The last budget only blocks disruptions during the first 10 minutes of the day, where 0 disruptions are allowed, only applying to underutilized nodes. +- The last budget only blocks disruptions during the first 10 minutes of the day, where 0 disruptions are allowed, only applying to underutilized nodes. ```yaml apiVersion: karpenter.sh/v1 @@ -264,21 +294,18 @@ kind: NodePool metadata: name: default spec: - template: - spec: - expireAfter: 720h # 30 * 24h = 720h disruption: consolidationPolicy: WhenEmptyOrUnderutilized budgets: - nodes: "20%" - reasons: + reasons: - "Empty" - "Drifted" - nodes: "5" - nodes: "0" schedule: "@daily" duration: 10m - reasons: + reasons: - "Underutilized" ``` @@ -307,8 +334,18 @@ Duration and Schedule must be defined together. When omitted, the budget is alwa ### Pod-Level Controls -You can block Karpenter from voluntarily choosing to disrupt certain pods by setting the `karpenter.sh/do-not-disrupt: "true"` annotation on the pod. This is useful for pods that you want to run from start to finish without disruption. By opting pods out of this disruption, you are telling Karpenter that it should not voluntarily remove a node containing this pod. - +You can block Karpenter from voluntarily disrupting and draining pods by adding the `karpenter.sh/do-not-disrupt: "true"` annotation to the pod. +You can treat this annotation as a single-pod, permanently blocking PDB. +This has the following consequences: +- Nodes with `karpenter.sh/do-not-disrupt` pods will be excluded from [Consolidation]({{}}), and conditionally excluded from [Drift]({{}}). + - If the Node's owning NodeClaim has a [`terminationGracePeriod`]({{}}) configured, it will still be eligible for disruption via drift. +- Like pods with a blocking PDB, pods with the `karpenter.sh/do-not-disrupt` annotation will **not** be gracefully evicted by the [Termination Controller]({{ref "#terminationcontroller"}}). + Karpenter will not be able to complete termination of the node until one of the following conditions is met: + - All pods with the `karpenter.sh/do-not-disrupt` annotation are removed. + - All pods with the `karpenter.sh/do-not-disrupt` annotation have entered a [terminal phase](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase) (`Succeeded` or `Failed`). + - The owning NodeClaim's [`terminationGracePeriod`]({{}}) has elapsed. + +This is useful for pods that you want to run from start to finish without disruption. Examples of pods that you might want to opt-out of disruption include an interactive game that you don't want to interrupt or a long batch job (such as you might have with machine learning) that would need to start over if it were interrupted. 
```yaml @@ -322,20 +359,16 @@ spec: ``` {{% alert title="Note" color="primary" %}} -This annotation will be ignored for [terminating pods](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase) and [terminal pods](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase) (Failed/Succeeded). -{{% /alert %}} - -Examples of voluntary node removal that will be prevented by this annotation include: -- [Consolidation]({{}}) -- [Drift]({{}}) - -{{% alert title="Note" color="primary" %}} -Voluntary node removal does not include [Interruption]({{}}) or manual deletion initiated through `kubectl delete node`. Both of these are considered involuntary events, since node removal cannot be delayed. +The `karpenter.sh/do-not-disrupt` annotation does **not** exclude nodes from the forceful disruption methods: [Expiration]({{}}), [Interruption]({{}}), [Node Repair](), and manual deletion (e.g. `kubectl delete node ...`). +While both interruption and node repair have implicit upper-bounds on termination time, expiration and manual termination do not. +Manual intervention may be required to unblock node termination, by removing pods with the `karpenter.sh/do-not-disrupt` annotation. +For this reason, it is not recommended to use the `karpenter.sh/do-not-disrupt` annotation with `expireAfter` **if** you have not also configured `terminationGracePeriod`. {{% /alert %}} ### Node-Level Controls -You can block Karpenter from voluntarily choosing to disrupt certain nodes by setting the `karpenter.sh/do-not-disrupt: "true"` annotation on the node. This will prevent disruption actions on the node. +You can block Karpenter from voluntarily choosing to disrupt certain nodes by setting the `karpenter.sh/do-not-disrupt: "true"` annotation on the node. +This will prevent voluntary disruption actions against the node. ```yaml apiVersion: v1 diff --git a/website/content/en/preview/concepts/disruption.md b/website/content/en/preview/concepts/disruption.md index df281154c9b9..85e173bec923 100644 --- a/website/content/en/preview/concepts/disruption.md +++ b/website/content/en/preview/concepts/disruption.md @@ -70,18 +70,14 @@ Automated graceful methods, can be rate limited through [NodePool Disruption Bud * Nodes can be removed as their workloads will run on other nodes in the cluster. * Nodes can be replaced with lower priced variants due to a change in the workloads. * [**Drift**]({{}}): Karpenter will mark nodes as drifted and disrupt nodes that have drifted from their desired specification. See [Drift]({{}}) to see which fields are considered. -* [**Interruption**]({{}}): Karpenter will watch for upcoming interruption events that could affect your nodes (health events, spot interruption, etc.) and will taint, drain, and terminate the node(s) ahead of the event to reduce workload disruption. {{% alert title="Defaults" color="secondary" %}} -Disruption is configured through the NodePool's disruption block by the `consolidationPolicy`, and `consolidateAfter` fields. `expireAfter` can also be used to control disruption. Karpenter will configure these fields with the following values by default if they are not set: +Disruption is configured through the NodePool's disruption block by the `consolidationPolicy`, and `consolidateAfter` fields. 
Karpenter will configure these fields with the following values by default if they are not set: ```yaml spec: disruption: consolidationPolicy: WhenEmptyOrUnderutilized - template: - spec: - expireAfter: 720h ``` {{% /alert %}} @@ -169,10 +165,22 @@ Karpenter will add the `Drifted` status condition on NodeClaims if the NodeClaim ## Automated Forceful Methods -Automated forceful methods will begin draining nodes as soon as the condition is met. Note that these methods blow past NodePool Disruption Budgets, and do not wait for a pre-spin replacement node to be healthy for the pods to reschedule, unlike the graceful methods mentioned above. Use Pod Disruption Budgets and `do-not-disrupt` on your nodes to rate-limit the speed at which your applications are disrupted. +Automated forceful methods will begin draining nodes as soon as the condition is met. +Unlike the graceful methods mentioned above, these methods can not be rate-limited using [NodePool Disruption Budgets](#nodepool-disruption-budgets), and do not wait for a pre-spin replacement node to be healthy for the pods to reschedule. +Pod disruption budgets may be used to rate-limit application disruption. ### Expiration -Karpenter will disrupt nodes as soon as they're expired after they've lived for the duration of the NodePool's `spec.template.spec.expireAfter`. You can use expiration to periodically recycle nodes due to security concern. + +A node is expired once it's lifetime exceeds the duration set on the owning NodeClaim's `spec.expireAfter` field. +Changes to `spec.template.spec.expireAfter` on the owning NodePool will not update the field for existing NodeClaims - it will induce NodeClaim drift and the replacements will have the updated value. +Expiration can be used, in conjunction with [`terminationGracePeriod`](#termination-grace-period), to enforce a maximum Node lifetime. +By default, `expireAfter` is set to `720h` (30 days). + +{{% alert title="Warning" color="warning" %}} +Misconfigured PDBs and pods with the `karpenter.sh/do-not-disrupt` annotation may block draining indefinitely. +For this reason, it is not recommended to set `expireAfter` without also setting `terminationGracePeriod` **if** your cluster has pods with the `karpenter.sh/do-not-disrupt` annotation. +Doing so can result in partially drained nodes stuck in the cluster, driving up cluster cost and potentially requiring manual intervention to resolve. +{{% /alert %}} ### Interruption @@ -197,13 +205,13 @@ Karpenter enables this feature by watching an SQS queue which receives critical To enable interruption handling, configure the `--interruption-queue` CLI argument with the name of the interruption queue provisioned to handle interruption events. -### Node Auto Repair +### Node Auto Repair Feature State: Karpenter v1.1.0 [alpha]({{}}) Node Auto Repair is a feature that automatically identifies and replaces unhealthy nodes in your cluster, helping to maintain overall cluster health. Nodes can experience various types of failures affecting their hardware, file systems, or container environments. These failures may be surfaced through node conditions such as network unavailability, disk pressure, memory pressure, or other conditions reported by node diagnostic agents. When Karpenter detects these unhealthy conditions, it automatically replaces the affected nodes based on cloud provider-defined repair policies. 
Once a node has been in an unhealthy state beyond its configured toleration duration, Karpenter will forcefully terminate the node and its corresponding NodeClaim, bypassing the standard drain and grace period procedures to ensure swift replacement of problematic nodes. To prevent cascading failures, Karpenter includes safety mechanisms: it will not perform repairs if more than 20% of nodes in a NodePool are unhealthy, and for standalone NodeClaims, it evaluates this threshold against all nodes in the cluster. This ensures your cluster remains in a healthy state with minimal manual intervention, even in scenarios where normal node termination procedures might be impacted by the node's unhealthy state. -To enable Node Auto Repair: +To enable Node Auto Repair: 1. Ensure you have a [Node Monitoring Agent](https://docs.aws.amazon.com/en_us/eks/latest/userguide/node-health.html) deployed or any agent that will add status conditions to nodes that are supported (e.g., Node Problem Detector) 2. Enable the feature flag: `NodeRepair=true` 3. Node AutoRepair will automatically terminate nodes when they have unhealthy status conditions based on your cloud provider's repair policies @@ -214,36 +222,58 @@ Karpenter monitors nodes for the following node status conditions when initiatin #### Kubelet Node Conditions -| Type | Status | Toleration Duration | +| Type | Status | Toleration Duration | | ------ | ------------- | ------------------- | | Ready | False | 30 minutes | -| Ready | Unknown | 30 minutes | +| Ready | Unknown | 30 minutes | #### Node Monitoring Agent Conditions -| Type | Status | Toleration Duration | +| Type | Status | Toleration Duration | | ------------------------ | ------------| --------------------- | | AcceleratedHardwareReady | False | 10 minutes | -| StorageReady | False | 30 minutes | -| NetworkingReady | False | 30 minutes | -| KernelReady | False | 30 minutes | -| ContainerRuntimeReady | False | 30 minutes | +| StorageReady | False | 30 minutes | +| NetworkingReady | False | 30 minutes | +| KernelReady | False | 30 minutes | +| ContainerRuntimeReady | False | 30 minutes | To enable the drift feature flag, refer to the [Feature Gates]({{}}). ## Controls -### TerminationGracePeriod +### TerminationGracePeriod -You can set a NodePool's `terminationGracePeriod` through the `spec.template.spec.terminationGracePeriod` field. This field defines the duration of time that a node can be draining before it's forcibly deleted. A node begins draining when it's deleted. Pods will be deleted preemptively based on its TerminationGracePeriodSeconds before this terminationGracePeriod ends to give as much time to cleanup as possible. Note that if your pod's terminationGracePeriodSeconds is larger than this terminationGracePeriod, Karpenter may forcibly delete the pod before it has its full terminationGracePeriod to cleanup. +To configure a maximum termination duration, `terminationGracePeriod` should be used. +It is configured through a NodePool's [`spec.template.spec.terminationGracePeriod`]({{}}) field, and is persisted to created NodeClaims (`spec.terminationGracePeriod`). +Changes to the [`spec.template.spec.terminationGracePeriod`]({{}}) field on the NodePool will not result in a change for existing NodeClaims - it will induce NodeClaim drift and the replacements will have the updated `terminationGracePeriod`. 
-This is especially useful in combination with `nodepool.spec.template.spec.expireAfter` to define an absolute maximum on the lifetime of a node, where a node is deleted at `expireAfter` and finishes draining within the `terminationGracePeriod` thereafter. Pods blocking eviction like PDBs and do-not-disrupt will block full draining until the `terminationGracePeriod` is reached. +Once a node is disrupted, via either a [graceful](#automated-graceful-methods) or [forceful](#automated-forceful-methods) disruption method, Karpenter will being draining the node. +At this point, the countdown for `terminationGracePeriod` begins. +Once the `terminationGracePeriod` elapses, remaining pods will be forcibly deleted and the unerlying instance will be terminated. +A node may be terminated before the `terminationGracePeriod` has elapsed if all disruptable pods have been drained. + +In conjunction with `expireAfter`, `terminationGracePeriod` can be used to enforce an absolute maximum node lifetime. +The node will begin to drain once its `expireAfter` has elapsed, and it will be forcibly terminated once its `terminationGracePeriod` has elapsed, making the maximum node lifetime the sum of the two fields. + +Additionally, configuring `terminationGracePeriod` changes the eligibility criteria for disruption via `Drift`. +When configured, a node may be disrupted via drift even if there are pods with blocking PDBs or the `karpenter.sh/do-not-disrupt` annotation scheduled to it. +This enables cluster administrators to ensure crucial updates (e.g. AMI updates addressing CVEs) can't be blocked by misconfigured applications. + +{{% alert title="Warning" color="warning" %}} +To ensure that the `terminationGracePeriodSeconds` value for draining pods is respected, pods will be preemptively deleted before the Node's `terminationGracePeriod` has elapsed. +This includes pods with blocking [pod disruption budgets](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) or the [`karpenter.sh/do-not-disrupt` annotation]({{}}). -For instance, a NodeClaim with `terminationGracePeriod` set to `1h` and an `expireAfter` set to `23h` will begin draining after it's lived for `23h`. Let's say a `do-not-disrupt` pod has `TerminationGracePeriodSeconds` set to `300` seconds. If the node hasn't been fully drained after `55m`, Karpenter will delete the pod to allow it's full `terminationGracePeriodSeconds` to cleanup. If no pods are blocking draining, Karpenter will cleanup the node as soon as the node is fully drained, rather than waiting for the NodeClaim's `terminationGracePeriod` to finish. +Consider the following example: a Node with a 1 hour `terminationGracePeriod` has been disrupted and begins to drain. +A pod with the `karpenter.sh/do-not-disrupt` annotation and a 300 second (5 minute) `terminationGracePeriodsSeconds` is scheduled to it. +If the pod is still running 55 minutes after the Node begins to drain, the pod will be deleted to ensure its `terminationGracePeriodSeconds` value is respected. + +If a pod's `terminationGracePeriodSeconds` value exceeds that of the Node it is scheduled to, Karpenter will prioritize the Node's `terminationGracePeriod`. +The pod will be deleted as soon as the Node begins to drain, and it will not receive it's full `terminationGracePeriodSeconds`. +{{% /alert %}} ### NodePool Disruption Budgets -You can rate limit Karpenter's disruption through the NodePool's `spec.disruption.budgets`. If undefined, Karpenter will default to one budget with `nodes: 10%`. 
Budgets will consider nodes that are actively being deleted for any reason, and will only block Karpenter from disrupting nodes voluntarily through drift, emptiness, and consolidation. Note that NodePool Disruption Budgets do not prevent Karpenter from terminating expired nodes. +You can rate limit Karpenter's disruption through the NodePool's `spec.disruption.budgets`. If undefined, Karpenter will default to one budget with `nodes: 10%`. Budgets will consider nodes that are actively being deleted for any reason, and will only block Karpenter from disrupting nodes voluntarily through drift, emptiness, and consolidation. Note that NodePool Disruption Budgets do not prevent Karpenter from terminating expired nodes. #### Reasons Karpenter allows specifying if a budget applies to any of `Drifted`, `Underutilized`, or `Empty`. When a budget has no reasons, it's assumed that it applies to all reasons. When calculating allowed disruptions for a given reason, Karpenter will take the minimum of the budgets that have listed the reason or have left reasons undefined. @@ -256,7 +286,7 @@ If the budget is configured with a percentage value, such as `20%`, Karpenter wi For example, the following NodePool with three budgets defines the following requirements: - The first budget will only allow 20% of nodes owned by that NodePool to be disrupted if it's empty or drifted. For instance, if there were 19 nodes owned by the NodePool, 4 empty or drifted nodes could be disrupted, rounding up from `19 * .2 = 3.8`. - The second budget acts as a ceiling to the previous budget, only allowing 5 disruptions when there are more than 25 nodes. -- The last budget only blocks disruptions during the first 10 minutes of the day, where 0 disruptions are allowed, only applying to underutilized nodes. +- The last budget only blocks disruptions during the first 10 minutes of the day, where 0 disruptions are allowed, only applying to underutilized nodes. ```yaml apiVersion: karpenter.sh/v1 @@ -264,21 +294,18 @@ kind: NodePool metadata: name: default spec: - template: - spec: - expireAfter: 720h # 30 * 24h = 720h disruption: consolidationPolicy: WhenEmptyOrUnderutilized budgets: - nodes: "20%" - reasons: + reasons: - "Empty" - "Drifted" - nodes: "5" - nodes: "0" schedule: "@daily" duration: 10m - reasons: + reasons: - "Underutilized" ``` @@ -307,8 +334,18 @@ Duration and Schedule must be defined together. When omitted, the budget is alwa ### Pod-Level Controls -You can block Karpenter from voluntarily choosing to disrupt certain pods by setting the `karpenter.sh/do-not-disrupt: "true"` annotation on the pod. This is useful for pods that you want to run from start to finish without disruption. By opting pods out of this disruption, you are telling Karpenter that it should not voluntarily remove a node containing this pod. - +You can block Karpenter from voluntarily disrupting and draining pods by adding the `karpenter.sh/do-not-disrupt: "true"` annotation to the pod. +You can treat this annotation as a single-pod, permanently blocking PDB. +This has the following consequences: +- Nodes with `karpenter.sh/do-not-disrupt` pods will be excluded from [Consolidation]({{}}), and conditionally excluded from [Drift]({{}}). + - If the Node's owning NodeClaim has a [`terminationGracePeriod`]({{}}) configured, it will still be eligible for disruption via drift. 
+- Like pods with a blocking PDB, pods with the `karpenter.sh/do-not-disrupt` annotation will **not** be gracefully evicted by the [Termination Controller]({{ref "#terminationcontroller"}}). + Karpenter will not be able to complete termination of the node until one of the following conditions is met: + - All pods with the `karpenter.sh/do-not-disrupt` annotation are removed. + - All pods with the `karpenter.sh/do-not-disrupt` annotation have entered a [terminal phase](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase) (`Succeeded` or `Failed`). + - The owning NodeClaim's [`terminationGracePeriod`]({{}}) has elapsed. + +This is useful for pods that you want to run from start to finish without disruption. Examples of pods that you might want to opt-out of disruption include an interactive game that you don't want to interrupt or a long batch job (such as you might have with machine learning) that would need to start over if it were interrupted. ```yaml @@ -322,20 +359,16 @@ spec: ``` {{% alert title="Note" color="primary" %}} -This annotation will be ignored for [terminating pods](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase) and [terminal pods](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase) (Failed/Succeeded). -{{% /alert %}} - -Examples of voluntary node removal that will be prevented by this annotation include: -- [Consolidation]({{}}) -- [Drift]({{}}) - -{{% alert title="Note" color="primary" %}} -Voluntary node removal does not include [Interruption]({{}}) or manual deletion initiated through `kubectl delete node`. Both of these are considered involuntary events, since node removal cannot be delayed. +The `karpenter.sh/do-not-disrupt` annotation does **not** exclude nodes from the forceful disruption methods: [Expiration]({{}}), [Interruption]({{}}), [Node Repair](), and manual deletion (e.g. `kubectl delete node ...`). +While both interruption and node repair have implicit upper-bounds on termination time, expiration and manual termination do not. +Manual intervention may be required to unblock node termination, by removing pods with the `karpenter.sh/do-not-disrupt` annotation. +For this reason, it is not recommended to use the `karpenter.sh/do-not-disrupt` annotation with `expireAfter` **if** you have not also configured `terminationGracePeriod`. {{% /alert %}} ### Node-Level Controls -You can block Karpenter from voluntarily choosing to disrupt certain nodes by setting the `karpenter.sh/do-not-disrupt: "true"` annotation on the node. This will prevent disruption actions on the node. +You can block Karpenter from voluntarily choosing to disrupt certain nodes by setting the `karpenter.sh/do-not-disrupt: "true"` annotation on the node. +This will prevent voluntary disruption actions against the node. ```yaml apiVersion: v1 diff --git a/website/content/en/v1.0/concepts/disruption.md b/website/content/en/v1.0/concepts/disruption.md index c7396db6f8a1..4a2fb2b0b00f 100644 --- a/website/content/en/v1.0/concepts/disruption.md +++ b/website/content/en/v1.0/concepts/disruption.md @@ -13,9 +13,9 @@ The finalizer blocks deletion of the node object while the Termination Controlle ### Disruption Controller -Karpenter automatically discovers disruptable nodes and spins up replacements when needed. Karpenter disrupts nodes by executing one [automated method](#automated-methods) at a time, first doing Drift then Consolidation. Each method varies slightly, but they all follow the standard disruption process. 
Karpenter uses [disruption budgets]({{}}) to control the speed at which these disruptions begin. +Karpenter automatically discovers disruptable nodes and spins up replacements when needed. Karpenter disrupts nodes by executing one [automated method](#automated-graceful-methods) at a time, first doing Drift then Consolidation. Each method varies slightly, but they all follow the standard disruption process. Karpenter uses [disruption budgets]({{}}) to control the speed at which these disruptions begin. 1. Identify a list of prioritized candidates for the disruption method. - * If there are [pods that cannot be evicted](#pod-eviction) on the node, Karpenter will ignore the node and try disrupting it later. + * If there are [pods that cannot be evicted](#pod-level-controls) on the node, Karpenter will ignore the node and try disrupting it later. * If there are no disruptable nodes, continue to the next disruption method. 2. For each disruptable node: 1. Check if disrupting it would violate its NodePool's disruption budget. @@ -70,18 +70,14 @@ Automated graceful methods, can be rate limited through [NodePool Disruption Bud * Nodes can be removed as their workloads will run on other nodes in the cluster. * Nodes can be replaced with lower priced variants due to a change in the workloads. * [**Drift**]({{}}): Karpenter will mark nodes as drifted and disrupt nodes that have drifted from their desired specification. See [Drift]({{}}) to see which fields are considered. -* [**Interruption**]({{}}): Karpenter will watch for upcoming interruption events that could affect your nodes (health events, spot interruption, etc.) and will taint, drain, and terminate the node(s) ahead of the event to reduce workload disruption. {{% alert title="Defaults" color="secondary" %}} -Disruption is configured through the NodePool's disruption block by the `consolidationPolicy`, and `consolidateAfter` fields. `expireAfter` can also be used to control disruption. Karpenter will configure these fields with the following values by default if they are not set: +Disruption is configured through the NodePool's disruption block by the `consolidationPolicy`, and `consolidateAfter` fields. Karpenter will configure these fields with the following values by default if they are not set: ```yaml spec: disruption: consolidationPolicy: WhenEmptyOrUnderutilized - template: - spec: - expireAfter: 720h ``` {{% /alert %}} @@ -170,10 +166,21 @@ Karpenter will add the `Drifted` status condition on NodeClaims if the NodeClaim ## Automated Forceful Methods -Automated forceful methods will begin draining nodes as soon as the condition is met. Note that these methods blow past NodePool Disruption Budgets, and do not wait for a pre-spin replacement node to be healthy for the pods to reschedule, unlike the graceful methods mentioned above. Use Pod Disruption Budgets and `do-not-disrupt` on your nodes to rate-limit the speed at which your applications are disrupted. +Automated forceful methods will begin draining nodes as soon as the condition is met. +Unlike the graceful methods mentioned above, these methods can not be rate-limited using [NodePool Disruption Budgets](#nodepool-disruption-budgets), and do not wait for a pre-spin replacement node to be healthy for the pods to reschedule. +Pod disruption budgets may be used to rate-limit application disruption. ### Expiration -Karpenter will disrupt nodes as soon as they're expired after they've lived for the duration of the NodePool's `spec.template.spec.expireAfter`. 
You can use expiration to periodically recycle nodes due to security concern. +A node is expired once it's lifetime exceeds the duration set on the owning NodeClaim's `spec.expireAfter` field. +Changes to `spec.template.spec.expireAfter` on the owning NodePool will not update the field for existing NodeClaims - it will induce NodeClaim drift and the replacements will have the updated value. +Expiration can be used, in conjunction with [`terminationGracePeriod`](#termination-grace-period), to enforce a maximum Node lifetime. +By default, `expireAfter` is set to `720h` (30 days). + +{{% alert title="Warning" color="warning" %}} +Misconfigured PDBs and pods with the `karpenter.sh/do-not-disrupt` annotation may block draining indefinitely. +For this reason, it is not recommended to set `expireAfter` without also setting `terminationGracePeriod` **if** your cluster has pods with the `karpenter.sh/do-not-disrupt` annotation. +Doing so can result in partially drained nodes stuck in the cluster, driving up cluster cost and potentially requiring manual intervention to resolve. +{{% /alert %}} ### Interruption @@ -200,17 +207,39 @@ To enable interruption handling, configure the `--interruption-queue` CLI argume ## Controls -### TerminationGracePeriod +### TerminationGracePeriod -You can set a NodePool's `terminationGracePeriod` through the `spec.template.spec.terminationGracePeriod` field. This field defines the duration of time that a node can be draining before it's forcibly deleted. A node begins draining when it's deleted. Pods will be deleted preemptively based on its TerminationGracePeriodSeconds before this terminationGracePeriod ends to give as much time to cleanup as possible. Note that if your pod's terminationGracePeriodSeconds is larger than this terminationGracePeriod, Karpenter may forcibly delete the pod before it has its full terminationGracePeriod to cleanup. +To configure a maximum termination duration, `terminationGracePeriod` should be used. +It is configured through a NodePool's [`spec.template.spec.terminationGracePeriod`]({{}}) field, and is persisted to created NodeClaims (`spec.terminationGracePeriod`). +Changes to the [`spec.template.spec.terminationGracePeriod`]({{}}) field on the NodePool will not result in a change for existing NodeClaims - it will induce NodeClaim drift and the replacements will have the updated `terminationGracePeriod`. -This is especially useful in combination with `nodepool.spec.template.spec.expireAfter` to define an absolute maximum on the lifetime of a node, where a node is deleted at `expireAfter` and finishes draining within the `terminationGracePeriod` thereafter. Pods blocking eviction like PDBs and do-not-disrupt will block full draining until the `terminationGracePeriod` is reached. +Once a node is disrupted, via either a [graceful](#automated-graceful-methods) or [forceful](#automated-forceful-methods) disruption method, Karpenter will being draining the node. +At this point, the countdown for `terminationGracePeriod` begins. +Once the `terminationGracePeriod` elapses, remaining pods will be forcibly deleted and the unerlying instance will be terminated. +A node may be terminated before the `terminationGracePeriod` has elapsed if all disruptable pods have been drained. + +In conjunction with `expireAfter`, `terminationGracePeriod` can be used to enforce an absolute maximum node lifetime. 
+The node will begin to drain once its `expireAfter` has elapsed, and it will be forcibly terminated once its `terminationGracePeriod` has elapsed, making the maximum node lifetime the sum of the two fields. + +Additionally, configuring `terminationGracePeriod` changes the eligibility criteria for disruption via `Drift`. +When configured, a node may be disrupted via drift even if there are pods with blocking PDBs or the `karpenter.sh/do-not-disrupt` annotation scheduled to it. +This enables cluster administrators to ensure crucial updates (e.g. AMI updates addressing CVEs) can't be blocked by misconfigured applications. + +{{% alert title="Warning" color="warning" %}} +To ensure that the `terminationGracePeriodSeconds` value for draining pods is respected, pods will be preemptively deleted before the Node's `terminationGracePeriod` has elapsed. +This includes pods with blocking [pod disruption budgets](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) or the [`karpenter.sh/do-not-disrupt` annotation]({{}}). -For instance, a NodeClaim with `terminationGracePeriod` set to `1h` and an `expireAfter` set to `23h` will begin draining after it's lived for `23h`. Let's say a `do-not-disrupt` pod has `TerminationGracePeriodSeconds` set to `300` seconds. If the node hasn't been fully drained after `55m`, Karpenter will delete the pod to allow it's full `terminationGracePeriodSeconds` to cleanup. If no pods are blocking draining, Karpenter will cleanup the node as soon as the node is fully drained, rather than waiting for the NodeClaim's `terminationGracePeriod` to finish. +Consider the following example: a Node with a 1 hour `terminationGracePeriod` has been disrupted and begins to drain. +A pod with the `karpenter.sh/do-not-disrupt` annotation and a 300 second (5 minute) `terminationGracePeriodSeconds` is scheduled to it. + +If the pod is still running 55 minutes after the Node begins to drain, the pod will be deleted to ensure its `terminationGracePeriodSeconds` value is respected. +If a pod's `terminationGracePeriodSeconds` value exceeds that of the Node it is scheduled to, Karpenter will prioritize the Node's `terminationGracePeriod`. +The pod will be deleted as soon as the Node begins to drain, and it will not receive its full `terminationGracePeriodSeconds`. +{{% /alert %}} ### NodePool Disruption Budgets -You can rate limit Karpenter's disruption through the NodePool's `spec.disruption.budgets`. If undefined, Karpenter will default to one budget with `nodes: 10%`. Budgets will consider nodes that are actively being deleted for any reason, and will only block Karpenter from disrupting nodes voluntarily through drift, emptiness, and consolidation. Note that NodePool Disruption Budgets do not prevent Karpenter from cleaning up expired or drifted nodes. +You can rate limit Karpenter's disruption through the NodePool's `spec.disruption.budgets`. If undefined, Karpenter will default to one budget with `nodes: 10%`. Budgets will consider nodes that are actively being deleted for any reason, and will only block Karpenter from disrupting nodes voluntarily through drift, emptiness, and consolidation. Note that NodePool Disruption Budgets do not prevent Karpenter from terminating expired nodes. #### Reasons Karpenter allows specifying if a budget applies to any of `Drifted`, `Underutilized`, or `Empty`. When a budget has no reasons, it's assumed that it applies to all reasons. 
When calculating allowed disruptions for a given reason, Karpenter will take the minimum of the budgets that have listed the reason or have left reasons undefined. @@ -223,7 +252,7 @@ If the budget is configured with a percentage value, such as `20%`, Karpenter wi For example, the following NodePool with three budgets defines the following requirements: - The first budget will only allow 20% of nodes owned by that NodePool to be disrupted if it's empty or drifted. For instance, if there were 19 nodes owned by the NodePool, 4 empty or drifted nodes could be disrupted, rounding up from `19 * .2 = 3.8`. - The second budget acts as a ceiling to the previous budget, only allowing 5 disruptions when there are more than 25 nodes. -- The last budget only blocks disruptions during the first 10 minutes of the day, where 0 disruptions are allowed, only applying to underutilized nodes. +- The last budget only blocks disruptions during the first 10 minutes of the day, where 0 disruptions are allowed, only applying to underutilized nodes. ```yaml apiVersion: karpenter.sh/v1 @@ -232,20 +261,20 @@ metadata: name: default spec: template: - spec: + spec: expireAfter: 720h # 30 * 24h = 720h disruption: consolidationPolicy: WhenEmptyOrUnderutilized budgets: - nodes: "20%" - reasons: + reasons: - "Empty" - "Drifted" - nodes: "5" - nodes: "0" schedule: "@daily" duration: 10m - reasons: + reasons: - "Underutilized" ``` @@ -274,8 +303,18 @@ Duration and Schedule must be defined together. When omitted, the budget is alwa ### Pod-Level Controls -You can block Karpenter from voluntarily choosing to disrupt certain pods by setting the `karpenter.sh/do-not-disrupt: "true"` annotation on the pod. This is useful for pods that you want to run from start to finish without disruption. By opting pods out of this disruption, you are telling Karpenter that it should not voluntarily remove a node containing this pod. - +You can block Karpenter from voluntarily disrupting and draining pods by adding the `karpenter.sh/do-not-disrupt: "true"` annotation to the pod. +You can treat this annotation as a single-pod, permanently blocking PDB. +This has the following consequences: +- Nodes with `karpenter.sh/do-not-disrupt` pods will be excluded from [Consolidation]({{}}), and conditionally excluded from [Drift]({{}}). + - If the Node's owning NodeClaim has a [`terminationGracePeriod`]({{}}) configured, it will still be eligible for disruption via drift. +- Like pods with a blocking PDB, pods with the `karpenter.sh/do-not-disrupt` annotation will **not** be gracefully evicted by the [Termination Controller]({{ref "#terminationcontroller"}}). + Karpenter will not be able to complete termination of the node until one of the following conditions is met: + - All pods with the `karpenter.sh/do-not-disrupt` annotation are removed. + - All pods with the `karpenter.sh/do-not-disrupt` annotation have entered a [terminal phase](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase) (`Succeeded` or `Failed`). + - The owning NodeClaim's [`terminationGracePeriod`]({{}}) has elapsed. + +This is useful for pods that you want to run from start to finish without disruption. Examples of pods that you might want to opt-out of disruption include an interactive game that you don't want to interrupt or a long batch job (such as you might have with machine learning) that would need to start over if it were interrupted. 
```yaml @@ -289,15 +328,10 @@ spec: ``` {{% alert title="Note" color="primary" %}} -This annotation will be ignored for [terminating pods](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase) and [terminal pods](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase) (Failed/Succeeded). -{{% /alert %}} - -Examples of voluntary node removal that will be prevented by this annotation include: -- [Consolidation]({{}}) -- [Drift]({{}}) - -{{% alert title="Note" color="primary" %}} -Voluntary node removal does not include [Interruption]({{}}) or manual deletion initiated through `kubectl delete node`. Both of these are considered involuntary events, since node removal cannot be delayed. +The `karpenter.sh/do-not-disrupt` annotation does **not** exclude nodes from the forceful disruption methods: [Expiration]({{}}), [Interruption]({{}}), [Node Repair](), and manual deletion (e.g. `kubectl delete node ...`). +While both interruption and node repair have implicit upper-bounds on termination time, expiration and manual termination do not. +Manual intervention may be required to unblock node termination, by removing pods with the `karpenter.sh/do-not-disrupt` annotation. +For this reason, it is not recommended to use the `karpenter.sh/do-not-disrupt` annotation with `expireAfter` **if** you have not also configured `terminationGracePeriod`. {{% /alert %}} ### Node-Level Controls diff --git a/website/content/en/v1.1/concepts/disruption.md b/website/content/en/v1.1/concepts/disruption.md index 6077be7ba9e8..4a2fb2b0b00f 100644 --- a/website/content/en/v1.1/concepts/disruption.md +++ b/website/content/en/v1.1/concepts/disruption.md @@ -70,18 +70,14 @@ Automated graceful methods, can be rate limited through [NodePool Disruption Bud * Nodes can be removed as their workloads will run on other nodes in the cluster. * Nodes can be replaced with lower priced variants due to a change in the workloads. * [**Drift**]({{}}): Karpenter will mark nodes as drifted and disrupt nodes that have drifted from their desired specification. See [Drift]({{}}) to see which fields are considered. -* [**Interruption**]({{}}): Karpenter will watch for upcoming interruption events that could affect your nodes (health events, spot interruption, etc.) and will taint, drain, and terminate the node(s) ahead of the event to reduce workload disruption. {{% alert title="Defaults" color="secondary" %}} -Disruption is configured through the NodePool's disruption block by the `consolidationPolicy`, and `consolidateAfter` fields. `expireAfter` can also be used to control disruption. Karpenter will configure these fields with the following values by default if they are not set: +Disruption is configured through the NodePool's disruption block by the `consolidationPolicy`, and `consolidateAfter` fields. Karpenter will configure these fields with the following values by default if they are not set: ```yaml spec: disruption: consolidationPolicy: WhenEmptyOrUnderutilized - template: - spec: - expireAfter: 720h ``` {{% /alert %}} @@ -170,10 +166,21 @@ Karpenter will add the `Drifted` status condition on NodeClaims if the NodeClaim ## Automated Forceful Methods -Automated forceful methods will begin draining nodes as soon as the condition is met. Note that these methods blow past NodePool Disruption Budgets, and do not wait for a pre-spin replacement node to be healthy for the pods to reschedule, unlike the graceful methods mentioned above. 
Use Pod Disruption Budgets and `do-not-disrupt` on your nodes to rate-limit the speed at which your applications are disrupted. +Automated forceful methods will begin draining nodes as soon as the condition is met. +Unlike the graceful methods mentioned above, these methods can not be rate-limited using [NodePool Disruption Budgets](#nodepool-disruption-budgets), and do not wait for a pre-spin replacement node to be healthy for the pods to reschedule. +Pod disruption budgets may be used to rate-limit application disruption. ### Expiration -Karpenter will disrupt nodes as soon as they're expired after they've lived for the duration of the NodePool's `spec.template.spec.expireAfter`. You can use expiration to periodically recycle nodes due to security concern. +A node is expired once its lifetime exceeds the duration set on the owning NodeClaim's `spec.expireAfter` field. +Changes to `spec.template.spec.expireAfter` on the owning NodePool will not update the field for existing NodeClaims - it will induce NodeClaim drift and the replacements will have the updated value. +Expiration can be used, in conjunction with [`terminationGracePeriod`](#termination-grace-period), to enforce a maximum Node lifetime. +By default, `expireAfter` is set to `720h` (30 days). + +{{% alert title="Warning" color="warning" %}} +Misconfigured PDBs and pods with the `karpenter.sh/do-not-disrupt` annotation may block draining indefinitely. +For this reason, it is not recommended to set `expireAfter` without also setting `terminationGracePeriod` **if** your cluster has pods with the `karpenter.sh/do-not-disrupt` annotation. +Doing so can result in partially drained nodes stuck in the cluster, driving up cluster cost and potentially requiring manual intervention to resolve. +{{% /alert %}} ### Interruption @@ -200,17 +207,39 @@ To enable interruption handling, configure the `--interruption-queue` CLI argume ## Controls -### TerminationGracePeriod +### TerminationGracePeriod -You can set a NodePool's `terminationGracePeriod` through the `spec.template.spec.terminationGracePeriod` field. This field defines the duration of time that a node can be draining before it's forcibly deleted. A node begins draining when it's deleted. Pods will be deleted preemptively based on its TerminationGracePeriodSeconds before this terminationGracePeriod ends to give as much time to cleanup as possible. Note that if your pod's terminationGracePeriodSeconds is larger than this terminationGracePeriod, Karpenter may forcibly delete the pod before it has its full terminationGracePeriod to cleanup. +To configure a maximum termination duration, `terminationGracePeriod` should be used. +It is configured through a NodePool's [`spec.template.spec.terminationGracePeriod`]({{}}) field, and is persisted to created NodeClaims (`spec.terminationGracePeriod`). +Changes to the [`spec.template.spec.terminationGracePeriod`]({{}}) field on the NodePool will not result in a change for existing NodeClaims - it will induce NodeClaim drift and the replacements will have the updated `terminationGracePeriod`. -This is especially useful in combination with `nodepool.spec.template.spec.expireAfter` to define an absolute maximum on the lifetime of a node, where a node is deleted at `expireAfter` and finishes draining within the `terminationGracePeriod` thereafter. Pods blocking eviction like PDBs and do-not-disrupt will block full draining until the `terminationGracePeriod` is reached. 
+Once a node is disrupted, via either a [graceful](#automated-graceful-methods) or [forceful](#automated-forceful-methods) disruption method, Karpenter will begin draining the node. +At this point, the countdown for `terminationGracePeriod` begins. +Once the `terminationGracePeriod` elapses, remaining pods will be forcibly deleted and the underlying instance will be terminated. +A node may be terminated before the `terminationGracePeriod` has elapsed if all disruptable pods have been drained. + +In conjunction with `expireAfter`, `terminationGracePeriod` can be used to enforce an absolute maximum node lifetime. +The node will begin to drain once its `expireAfter` has elapsed, and it will be forcibly terminated once its `terminationGracePeriod` has elapsed, making the maximum node lifetime the sum of the two fields. + +Additionally, configuring `terminationGracePeriod` changes the eligibility criteria for disruption via `Drift`. +When configured, a node may be disrupted via drift even if there are pods with blocking PDBs or the `karpenter.sh/do-not-disrupt` annotation scheduled to it. +This enables cluster administrators to ensure crucial updates (e.g. AMI updates addressing CVEs) can't be blocked by misconfigured applications. + +{{% alert title="Warning" color="warning" %}} +To ensure that the `terminationGracePeriodSeconds` value for draining pods is respected, pods will be preemptively deleted before the Node's `terminationGracePeriod` has elapsed. +This includes pods with blocking [pod disruption budgets](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) or the [`karpenter.sh/do-not-disrupt` annotation]({{}}). -For instance, a NodeClaim with `terminationGracePeriod` set to `1h` and an `expireAfter` set to `23h` will begin draining after it's lived for `23h`. Let's say a `do-not-disrupt` pod has `TerminationGracePeriodSeconds` set to `300` seconds. If the node hasn't been fully drained after `55m`, Karpenter will delete the pod to allow it's full `terminationGracePeriodSeconds` to cleanup. If no pods are blocking draining, Karpenter will cleanup the node as soon as the node is fully drained, rather than waiting for the NodeClaim's `terminationGracePeriod` to finish. +Consider the following example: a Node with a 1 hour `terminationGracePeriod` has been disrupted and begins to drain. +A pod with the `karpenter.sh/do-not-disrupt` annotation and a 300 second (5 minute) `terminationGracePeriodSeconds` is scheduled to it. + +If the pod is still running 55 minutes after the Node begins to drain, the pod will be deleted to ensure its `terminationGracePeriodSeconds` value is respected. +If a pod's `terminationGracePeriodSeconds` value exceeds that of the Node it is scheduled to, Karpenter will prioritize the Node's `terminationGracePeriod`. +The pod will be deleted as soon as the Node begins to drain, and it will not receive its full `terminationGracePeriodSeconds`. +{{% /alert %}} ### NodePool Disruption Budgets -You can rate limit Karpenter's disruption through the NodePool's `spec.disruption.budgets`. 
If undefined, Karpenter will default to one budget with `nodes: 10%`. Budgets will consider nodes that are actively being deleted for any reason, and will only block Karpenter from disrupting nodes voluntarily through drift, emptiness, and consolidation. Note that NodePool Disruption Budgets do not prevent Karpenter from terminating expired nodes. #### Reasons Karpenter allows specifying if a budget applies to any of `Drifted`, `Underutilized`, or `Empty`. When a budget has no reasons, it's assumed that it applies to all reasons. When calculating allowed disruptions for a given reason, Karpenter will take the minimum of the budgets that have listed the reason or have left reasons undefined. @@ -223,7 +252,7 @@ If the budget is configured with a percentage value, such as `20%`, Karpenter wi For example, the following NodePool with three budgets defines the following requirements: - The first budget will only allow 20% of nodes owned by that NodePool to be disrupted if it's empty or drifted. For instance, if there were 19 nodes owned by the NodePool, 4 empty or drifted nodes could be disrupted, rounding up from `19 * .2 = 3.8`. - The second budget acts as a ceiling to the previous budget, only allowing 5 disruptions when there are more than 25 nodes. -- The last budget only blocks disruptions during the first 10 minutes of the day, where 0 disruptions are allowed, only applying to underutilized nodes. +- The last budget only blocks disruptions during the first 10 minutes of the day, where 0 disruptions are allowed, only applying to underutilized nodes. ```yaml apiVersion: karpenter.sh/v1 @@ -232,20 +261,20 @@ metadata: name: default spec: template: - spec: + spec: expireAfter: 720h # 30 * 24h = 720h disruption: consolidationPolicy: WhenEmptyOrUnderutilized budgets: - nodes: "20%" - reasons: + reasons: - "Empty" - "Drifted" - nodes: "5" - nodes: "0" schedule: "@daily" duration: 10m - reasons: + reasons: - "Underutilized" ``` @@ -274,8 +303,18 @@ Duration and Schedule must be defined together. When omitted, the budget is alwa ### Pod-Level Controls -You can block Karpenter from voluntarily choosing to disrupt certain pods by setting the `karpenter.sh/do-not-disrupt: "true"` annotation on the pod. This is useful for pods that you want to run from start to finish without disruption. By opting pods out of this disruption, you are telling Karpenter that it should not voluntarily remove a node containing this pod. - +You can block Karpenter from voluntarily disrupting and draining pods by adding the `karpenter.sh/do-not-disrupt: "true"` annotation to the pod. +You can treat this annotation as a single-pod, permanently blocking PDB. +This has the following consequences: +- Nodes with `karpenter.sh/do-not-disrupt` pods will be excluded from [Consolidation]({{}}), and conditionally excluded from [Drift]({{}}). + - If the Node's owning NodeClaim has a [`terminationGracePeriod`]({{}}) configured, it will still be eligible for disruption via drift. +- Like pods with a blocking PDB, pods with the `karpenter.sh/do-not-disrupt` annotation will **not** be gracefully evicted by the [Termination Controller]({{ref "#terminationcontroller"}}). + Karpenter will not be able to complete termination of the node until one of the following conditions is met: + - All pods with the `karpenter.sh/do-not-disrupt` annotation are removed. 
+ - All pods with the `karpenter.sh/do-not-disrupt` annotation have entered a [terminal phase](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase) (`Succeeded` or `Failed`). + - The owning NodeClaim's [`terminationGracePeriod`]({{}}) has elapsed. + +This is useful for pods that you want to run from start to finish without disruption. Examples of pods that you might want to opt-out of disruption include an interactive game that you don't want to interrupt or a long batch job (such as you might have with machine learning) that would need to start over if it were interrupted. ```yaml @@ -289,15 +328,10 @@ spec: ``` {{% alert title="Note" color="primary" %}} -This annotation will be ignored for [terminating pods](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase) and [terminal pods](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase) (Failed/Succeeded). -{{% /alert %}} - -Examples of voluntary node removal that will be prevented by this annotation include: -- [Consolidation]({{}}) -- [Drift]({{}}) - -{{% alert title="Note" color="primary" %}} -Voluntary node removal does not include [Interruption]({{}}) or manual deletion initiated through `kubectl delete node`. Both of these are considered involuntary events, since node removal cannot be delayed. +The `karpenter.sh/do-not-disrupt` annotation does **not** exclude nodes from the forceful disruption methods: [Expiration]({{}}), [Interruption]({{}}), [Node Repair](), and manual deletion (e.g. `kubectl delete node ...`). +While both interruption and node repair have implicit upper-bounds on termination time, expiration and manual termination do not. +Manual intervention may be required to unblock node termination, by removing pods with the `karpenter.sh/do-not-disrupt` annotation. +For this reason, it is not recommended to use the `karpenter.sh/do-not-disrupt` annotation with `expireAfter` **if** you have not also configured `terminationGracePeriod`. {{% /alert %}} ### Node-Level Controls diff --git a/website/content/en/v1.2/concepts/disruption.md b/website/content/en/v1.2/concepts/disruption.md index df281154c9b9..85e173bec923 100644 --- a/website/content/en/v1.2/concepts/disruption.md +++ b/website/content/en/v1.2/concepts/disruption.md @@ -70,18 +70,14 @@ Automated graceful methods, can be rate limited through [NodePool Disruption Bud * Nodes can be removed as their workloads will run on other nodes in the cluster. * Nodes can be replaced with lower priced variants due to a change in the workloads. * [**Drift**]({{}}): Karpenter will mark nodes as drifted and disrupt nodes that have drifted from their desired specification. See [Drift]({{}}) to see which fields are considered. -* [**Interruption**]({{}}): Karpenter will watch for upcoming interruption events that could affect your nodes (health events, spot interruption, etc.) and will taint, drain, and terminate the node(s) ahead of the event to reduce workload disruption. {{% alert title="Defaults" color="secondary" %}} -Disruption is configured through the NodePool's disruption block by the `consolidationPolicy`, and `consolidateAfter` fields. `expireAfter` can also be used to control disruption. Karpenter will configure these fields with the following values by default if they are not set: +Disruption is configured through the NodePool's disruption block by the `consolidationPolicy`, and `consolidateAfter` fields. 
Karpenter will configure these fields with the following values by default if they are not set: ```yaml spec: disruption: consolidationPolicy: WhenEmptyOrUnderutilized - template: - spec: - expireAfter: 720h ``` {{% /alert %}} @@ -169,10 +165,22 @@ Karpenter will add the `Drifted` status condition on NodeClaims if the NodeClaim ## Automated Forceful Methods -Automated forceful methods will begin draining nodes as soon as the condition is met. Note that these methods blow past NodePool Disruption Budgets, and do not wait for a pre-spin replacement node to be healthy for the pods to reschedule, unlike the graceful methods mentioned above. Use Pod Disruption Budgets and `do-not-disrupt` on your nodes to rate-limit the speed at which your applications are disrupted. +Automated forceful methods will begin draining nodes as soon as the condition is met. +Unlike the graceful methods mentioned above, these methods can not be rate-limited using [NodePool Disruption Budgets](#nodepool-disruption-budgets), and do not wait for a pre-spin replacement node to be healthy for the pods to reschedule. +Pod disruption budgets may be used to rate-limit application disruption. ### Expiration -Karpenter will disrupt nodes as soon as they're expired after they've lived for the duration of the NodePool's `spec.template.spec.expireAfter`. You can use expiration to periodically recycle nodes due to security concern. + +A node is expired once its lifetime exceeds the duration set on the owning NodeClaim's `spec.expireAfter` field. +Changes to `spec.template.spec.expireAfter` on the owning NodePool will not update the field for existing NodeClaims - it will induce NodeClaim drift and the replacements will have the updated value. +Expiration can be used, in conjunction with [`terminationGracePeriod`](#termination-grace-period), to enforce a maximum Node lifetime. +By default, `expireAfter` is set to `720h` (30 days). + +{{% alert title="Warning" color="warning" %}} +Misconfigured PDBs and pods with the `karpenter.sh/do-not-disrupt` annotation may block draining indefinitely. +For this reason, it is not recommended to set `expireAfter` without also setting `terminationGracePeriod` **if** your cluster has pods with the `karpenter.sh/do-not-disrupt` annotation. +Doing so can result in partially drained nodes stuck in the cluster, driving up cluster cost and potentially requiring manual intervention to resolve. +{{% /alert %}} ### Interruption @@ -197,13 +205,13 @@ Karpenter enables this feature by watching an SQS queue which receives critical To enable interruption handling, configure the `--interruption-queue` CLI argument with the name of the interruption queue provisioned to handle interruption events. -### Node Auto Repair +### Node Auto Repair Feature State: Karpenter v1.1.0 [alpha]({{}}) Node Auto Repair is a feature that automatically identifies and replaces unhealthy nodes in your cluster, helping to maintain overall cluster health. Nodes can experience various types of failures affecting their hardware, file systems, or container environments. These failures may be surfaced through node conditions such as network unavailability, disk pressure, memory pressure, or other conditions reported by node diagnostic agents. When Karpenter detects these unhealthy conditions, it automatically replaces the affected nodes based on cloud provider-defined repair policies. 
Once a node has been in an unhealthy state beyond its configured toleration duration, Karpenter will forcefully terminate the node and its corresponding NodeClaim, bypassing the standard drain and grace period procedures to ensure swift replacement of problematic nodes. To prevent cascading failures, Karpenter includes safety mechanisms: it will not perform repairs if more than 20% of nodes in a NodePool are unhealthy, and for standalone NodeClaims, it evaluates this threshold against all nodes in the cluster. This ensures your cluster remains in a healthy state with minimal manual intervention, even in scenarios where normal node termination procedures might be impacted by the node's unhealthy state. -To enable Node Auto Repair: +To enable Node Auto Repair: 1. Ensure you have a [Node Monitoring Agent](https://docs.aws.amazon.com/en_us/eks/latest/userguide/node-health.html) deployed or any agent that will add status conditions to nodes that are supported (e.g., Node Problem Detector) 2. Enable the feature flag: `NodeRepair=true` 3. Node AutoRepair will automatically terminate nodes when they have unhealthy status conditions based on your cloud provider's repair policies @@ -214,36 +222,58 @@ Karpenter monitors nodes for the following node status conditions when initiatin #### Kubelet Node Conditions -| Type | Status | Toleration Duration | +| Type | Status | Toleration Duration | | ------ | ------------- | ------------------- | | Ready | False | 30 minutes | -| Ready | Unknown | 30 minutes | +| Ready | Unknown | 30 minutes | #### Node Monitoring Agent Conditions -| Type | Status | Toleration Duration | +| Type | Status | Toleration Duration | | ------------------------ | ------------| --------------------- | | AcceleratedHardwareReady | False | 10 minutes | -| StorageReady | False | 30 minutes | -| NetworkingReady | False | 30 minutes | -| KernelReady | False | 30 minutes | -| ContainerRuntimeReady | False | 30 minutes | +| StorageReady | False | 30 minutes | +| NetworkingReady | False | 30 minutes | +| KernelReady | False | 30 minutes | +| ContainerRuntimeReady | False | 30 minutes | To enable the drift feature flag, refer to the [Feature Gates]({{}}). ## Controls -### TerminationGracePeriod +### TerminationGracePeriod -You can set a NodePool's `terminationGracePeriod` through the `spec.template.spec.terminationGracePeriod` field. This field defines the duration of time that a node can be draining before it's forcibly deleted. A node begins draining when it's deleted. Pods will be deleted preemptively based on its TerminationGracePeriodSeconds before this terminationGracePeriod ends to give as much time to cleanup as possible. Note that if your pod's terminationGracePeriodSeconds is larger than this terminationGracePeriod, Karpenter may forcibly delete the pod before it has its full terminationGracePeriod to cleanup. +To configure a maximum termination duration, `terminationGracePeriod` should be used. +It is configured through a NodePool's [`spec.template.spec.terminationGracePeriod`]({{}}) field, and is persisted to created NodeClaims (`spec.terminationGracePeriod`). +Changes to the [`spec.template.spec.terminationGracePeriod`]({{}}) field on the NodePool will not result in a change for existing NodeClaims - it will induce NodeClaim drift and the replacements will have the updated `terminationGracePeriod`. 
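For illustration, a NodePool that pairs `spec.template.spec.expireAfter` with `spec.template.spec.terminationGracePeriod` might look like the following minimal sketch; the durations shown are only example values, not recommendations.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      expireAfter: 720h            # example value: the node begins draining after 30 days
      terminationGracePeriod: 1h   # example value: remaining pods are force-deleted at most 1h after draining begins
```

With these example values, the maximum node lifetime is roughly the sum of the two fields: draining starts when `expireAfter` elapses, and the instance is forcibly terminated once `terminationGracePeriod` elapses.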
-This is especially useful in combination with `nodepool.spec.template.spec.expireAfter` to define an absolute maximum on the lifetime of a node, where a node is deleted at `expireAfter` and finishes draining within the `terminationGracePeriod` thereafter. Pods blocking eviction like PDBs and do-not-disrupt will block full draining until the `terminationGracePeriod` is reached. +Once a node is disrupted, via either a [graceful](#automated-graceful-methods) or [forceful](#automated-forceful-methods) disruption method, Karpenter will begin draining the node. +At this point, the countdown for `terminationGracePeriod` begins. +Once the `terminationGracePeriod` elapses, remaining pods will be forcibly deleted and the underlying instance will be terminated. +A node may be terminated before the `terminationGracePeriod` has elapsed if all disruptable pods have been drained. + +In conjunction with `expireAfter`, `terminationGracePeriod` can be used to enforce an absolute maximum node lifetime. +The node will begin to drain once its `expireAfter` has elapsed, and it will be forcibly terminated once its `terminationGracePeriod` has elapsed, making the maximum node lifetime the sum of the two fields. + +Additionally, configuring `terminationGracePeriod` changes the eligibility criteria for disruption via `Drift`. +When configured, a node may be disrupted via drift even if there are pods with blocking PDBs or the `karpenter.sh/do-not-disrupt` annotation scheduled to it. +This enables cluster administrators to ensure crucial updates (e.g. AMI updates addressing CVEs) can't be blocked by misconfigured applications. + +{{% alert title="Warning" color="warning" %}} +To ensure that the `terminationGracePeriodSeconds` value for draining pods is respected, pods will be preemptively deleted before the Node's `terminationGracePeriod` has elapsed. +This includes pods with blocking [pod disruption budgets](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) or the [`karpenter.sh/do-not-disrupt` annotation]({{}}). -For instance, a NodeClaim with `terminationGracePeriod` set to `1h` and an `expireAfter` set to `23h` will begin draining after it's lived for `23h`. Let's say a `do-not-disrupt` pod has `TerminationGracePeriodSeconds` set to `300` seconds. If the node hasn't been fully drained after `55m`, Karpenter will delete the pod to allow it's full `terminationGracePeriodSeconds` to cleanup. If no pods are blocking draining, Karpenter will cleanup the node as soon as the node is fully drained, rather than waiting for the NodeClaim's `terminationGracePeriod` to finish. +Consider the following example: a Node with a 1 hour `terminationGracePeriod` has been disrupted and begins to drain. +A pod with the `karpenter.sh/do-not-disrupt` annotation and a 300 second (5 minute) `terminationGracePeriodSeconds` is scheduled to it. +If the pod is still running 55 minutes after the Node begins to drain, the pod will be deleted to ensure its `terminationGracePeriodSeconds` value is respected. + +If a pod's `terminationGracePeriodSeconds` value exceeds that of the Node it is scheduled to, Karpenter will prioritize the Node's `terminationGracePeriod`. +The pod will be deleted as soon as the Node begins to drain, and it will not receive its full `terminationGracePeriodSeconds`. +{{% /alert %}} ### NodePool Disruption Budgets -You can rate limit Karpenter's disruption through the NodePool's `spec.disruption.budgets`. If undefined, Karpenter will default to one budget with `nodes: 10%`. Budgets will consider nodes that are actively being deleted for any reason, and will only block Karpenter from disrupting nodes voluntarily through drift, emptiness, and consolidation. Note that NodePool Disruption Budgets do not prevent Karpenter from terminating expired nodes. +You can rate limit Karpenter's disruption through the NodePool's `spec.disruption.budgets`. If undefined, Karpenter will default to one budget with `nodes: 10%`. 
Budgets will consider nodes that are actively being deleted for any reason, and will only block Karpenter from disrupting nodes voluntarily through drift, emptiness, and consolidation. Note that NodePool Disruption Budgets do not prevent Karpenter from terminating expired nodes. +You can rate limit Karpenter's disruption through the NodePool's `spec.disruption.budgets`. If undefined, Karpenter will default to one budget with `nodes: 10%`. Budgets will consider nodes that are actively being deleted for any reason, and will only block Karpenter from disrupting nodes voluntarily through drift, emptiness, and consolidation. Note that NodePool Disruption Budgets do not prevent Karpenter from terminating expired nodes. #### Reasons Karpenter allows specifying if a budget applies to any of `Drifted`, `Underutilized`, or `Empty`. When a budget has no reasons, it's assumed that it applies to all reasons. When calculating allowed disruptions for a given reason, Karpenter will take the minimum of the budgets that have listed the reason or have left reasons undefined. @@ -256,7 +286,7 @@ If the budget is configured with a percentage value, such as `20%`, Karpenter wi For example, the following NodePool with three budgets defines the following requirements: - The first budget will only allow 20% of nodes owned by that NodePool to be disrupted if it's empty or drifted. For instance, if there were 19 nodes owned by the NodePool, 4 empty or drifted nodes could be disrupted, rounding up from `19 * .2 = 3.8`. - The second budget acts as a ceiling to the previous budget, only allowing 5 disruptions when there are more than 25 nodes. -- The last budget only blocks disruptions during the first 10 minutes of the day, where 0 disruptions are allowed, only applying to underutilized nodes. +- The last budget only blocks disruptions during the first 10 minutes of the day, where 0 disruptions are allowed, only applying to underutilized nodes. ```yaml apiVersion: karpenter.sh/v1 @@ -264,21 +294,18 @@ kind: NodePool metadata: name: default spec: - template: - spec: - expireAfter: 720h # 30 * 24h = 720h disruption: consolidationPolicy: WhenEmptyOrUnderutilized budgets: - nodes: "20%" - reasons: + reasons: - "Empty" - "Drifted" - nodes: "5" - nodes: "0" schedule: "@daily" duration: 10m - reasons: + reasons: - "Underutilized" ``` @@ -307,8 +334,18 @@ Duration and Schedule must be defined together. When omitted, the budget is alwa ### Pod-Level Controls -You can block Karpenter from voluntarily choosing to disrupt certain pods by setting the `karpenter.sh/do-not-disrupt: "true"` annotation on the pod. This is useful for pods that you want to run from start to finish without disruption. By opting pods out of this disruption, you are telling Karpenter that it should not voluntarily remove a node containing this pod. - +You can block Karpenter from voluntarily disrupting and draining pods by adding the `karpenter.sh/do-not-disrupt: "true"` annotation to the pod. +You can treat this annotation as a single-pod, permanently blocking PDB. +This has the following consequences: +- Nodes with `karpenter.sh/do-not-disrupt` pods will be excluded from [Consolidation]({{}}), and conditionally excluded from [Drift]({{}}). + - If the Node's owning NodeClaim has a [`terminationGracePeriod`]({{}}) configured, it will still be eligible for disruption via drift. 
+- Like pods with a blocking PDB, pods with the `karpenter.sh/do-not-disrupt` annotation will **not** be gracefully evicted by the [Termination Controller]({{ref "#terminationcontroller"}}). + Karpenter will not be able to complete termination of the node until one of the following conditions is met: + - All pods with the `karpenter.sh/do-not-disrupt` annotation are removed. + - All pods with the `karpenter.sh/do-not-disrupt` annotation have entered a [terminal phase](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase) (`Succeeded` or `Failed`). + - The owning NodeClaim's [`terminationGracePeriod`]({{}}) has elapsed. + +This is useful for pods that you want to run from start to finish without disruption. Examples of pods that you might want to opt-out of disruption include an interactive game that you don't want to interrupt or a long batch job (such as you might have with machine learning) that would need to start over if it were interrupted. ```yaml @@ -322,20 +359,16 @@ spec: ``` {{% alert title="Note" color="primary" %}} -This annotation will be ignored for [terminating pods](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase) and [terminal pods](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase) (Failed/Succeeded). -{{% /alert %}} - -Examples of voluntary node removal that will be prevented by this annotation include: -- [Consolidation]({{}}) -- [Drift]({{}}) - -{{% alert title="Note" color="primary" %}} -Voluntary node removal does not include [Interruption]({{}}) or manual deletion initiated through `kubectl delete node`. Both of these are considered involuntary events, since node removal cannot be delayed. +The `karpenter.sh/do-not-disrupt` annotation does **not** exclude nodes from the forceful disruption methods: [Expiration]({{}}), [Interruption]({{}}), [Node Repair](), and manual deletion (e.g. `kubectl delete node ...`). +While both interruption and node repair have implicit upper-bounds on termination time, expiration and manual termination do not. +Manual intervention may be required to unblock node termination, by removing pods with the `karpenter.sh/do-not-disrupt` annotation. +For this reason, it is not recommended to use the `karpenter.sh/do-not-disrupt` annotation with `expireAfter` **if** you have not also configured `terminationGracePeriod`. {{% /alert %}} ### Node-Level Controls -You can block Karpenter from voluntarily choosing to disrupt certain nodes by setting the `karpenter.sh/do-not-disrupt: "true"` annotation on the node. This will prevent disruption actions on the node. +You can block Karpenter from voluntarily choosing to disrupt certain nodes by setting the `karpenter.sh/do-not-disrupt: "true"` annotation on the node. +This will prevent voluntary disruption actions against the node. 
```yaml apiVersion: v1 From b2f3afef40a3fd2115caf00b12d0be22e2423a03 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 10 Feb 2025 15:44:03 -0800 Subject: [PATCH 24/34] chore(deps): bump the go-deps group with 13 updates (#7717) Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- go.mod | 40 ++++++++++++++--------------- go.sum | 79 +++++++++++++++++++++++++++++----------------------------- 2 files changed, 60 insertions(+), 59 deletions(-) diff --git a/go.mod b/go.mod index 154f342052f0..d2f709787038 100644 --- a/go.mod +++ b/go.mod @@ -7,18 +7,18 @@ require ( github.com/PuerkitoBio/goquery v1.10.1 github.com/avast/retry-go v3.0.0+incompatible github.com/aws/amazon-vpc-resource-controller-k8s v1.6.3 - github.com/aws/aws-sdk-go-v2 v1.36.0 - github.com/aws/aws-sdk-go-v2/config v1.29.4 - github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.27 - github.com/aws/aws-sdk-go-v2/service/ec2 v1.202.2 - github.com/aws/aws-sdk-go-v2/service/eks v1.57.2 - github.com/aws/aws-sdk-go-v2/service/fis v1.31.9 - github.com/aws/aws-sdk-go-v2/service/iam v1.38.10 - github.com/aws/aws-sdk-go-v2/service/pricing v1.32.14 - github.com/aws/aws-sdk-go-v2/service/sqs v1.37.12 - github.com/aws/aws-sdk-go-v2/service/ssm v1.56.10 - github.com/aws/aws-sdk-go-v2/service/sts v1.33.12 - github.com/aws/aws-sdk-go-v2/service/timestreamwrite v1.29.15 + github.com/aws/aws-sdk-go-v2 v1.36.1 + github.com/aws/aws-sdk-go-v2/config v1.29.6 + github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.28 + github.com/aws/aws-sdk-go-v2/service/ec2 v1.202.4 + github.com/aws/aws-sdk-go-v2/service/eks v1.58.0 + github.com/aws/aws-sdk-go-v2/service/fis v1.31.10 + github.com/aws/aws-sdk-go-v2/service/iam v1.39.1 + github.com/aws/aws-sdk-go-v2/service/pricing v1.32.16 + github.com/aws/aws-sdk-go-v2/service/sqs v1.37.14 + github.com/aws/aws-sdk-go-v2/service/ssm v1.56.12 + github.com/aws/aws-sdk-go-v2/service/sts v1.33.14 + github.com/aws/aws-sdk-go-v2/service/timestreamwrite v1.29.16 github.com/aws/karpenter-provider-aws/tools/kompat v0.0.0-20240410220356-6b868db24881 github.com/aws/smithy-go v1.22.2 github.com/awslabs/amazon-eks-ami/nodeadm v0.0.0-20240229193347-cfab22a10647 @@ -35,7 +35,7 @@ require ( github.com/samber/lo v1.49.1 go.uber.org/multierr v1.11.0 go.uber.org/zap v1.27.0 - golang.org/x/sync v0.10.0 + golang.org/x/sync v0.11.0 k8s.io/api v0.32.1 k8s.io/apiextensions-apiserver v0.32.1 k8s.io/apimachinery v0.32.1 @@ -50,15 +50,15 @@ require ( require ( github.com/Masterminds/semver/v3 v3.2.1 // indirect github.com/andybalholm/cascadia v1.3.3 // indirect - github.com/aws/aws-sdk-go-v2/credentials v1.17.57 // indirect - github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.31 // indirect - github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.31 // indirect + github.com/aws/aws-sdk-go-v2/credentials v1.17.59 // indirect + github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.32 // indirect + github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.32 // indirect github.com/aws/aws-sdk-go-v2/internal/ini v1.8.2 // indirect github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.12.2 // indirect - github.com/aws/aws-sdk-go-v2/service/internal/endpoint-discovery v1.10.12 // indirect - github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.12.12 // indirect - github.com/aws/aws-sdk-go-v2/service/sso v1.24.14 // indirect - github.com/aws/aws-sdk-go-v2/service/ssooidc v1.28.13 // indirect + 
github.com/aws/aws-sdk-go-v2/service/internal/endpoint-discovery v1.10.13 // indirect + github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.12.13 // indirect + github.com/aws/aws-sdk-go-v2/service/sso v1.24.15 // indirect + github.com/aws/aws-sdk-go-v2/service/ssooidc v1.28.14 // indirect github.com/beorn7/perks v1.0.1 // indirect github.com/cespare/xxhash/v2 v2.3.0 // indirect github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect diff --git a/go.sum b/go.sum index 125ae461e52a..ec3a5e2a9f78 100644 --- a/go.sum +++ b/go.sum @@ -10,48 +10,48 @@ github.com/avast/retry-go v3.0.0+incompatible h1:4SOWQ7Qs+oroOTQOYnAHqelpCO0biHS github.com/avast/retry-go v3.0.0+incompatible/go.mod h1:XtSnn+n/sHqQIpZ10K1qAevBhOOCWBLXXy3hyiqqBrY= github.com/aws/amazon-vpc-resource-controller-k8s v1.6.3 h1:B4o15iZP8CQoyDjoNAoQiyEPabLsgxXLY5tv3uvvCic= github.com/aws/amazon-vpc-resource-controller-k8s v1.6.3/go.mod h1:k4zcf2Dz/Mvrgo8NVzAEWP5HK4USqbJTD93pVVDxvc0= -github.com/aws/aws-sdk-go-v2 v1.36.0 h1:b1wM5CcE65Ujwn565qcwgtOTT1aT4ADOHHgglKjG7fk= -github.com/aws/aws-sdk-go-v2 v1.36.0/go.mod h1:5PMILGVKiW32oDzjj6RU52yrNrDPUHcbZQYr1sM7qmM= -github.com/aws/aws-sdk-go-v2/config v1.29.4 h1:ObNqKsDYFGr2WxnoXKOhCvTlf3HhwtoGgc+KmZ4H5yg= -github.com/aws/aws-sdk-go-v2/config v1.29.4/go.mod h1:j2/AF7j/qxVmsNIChw1tWfsVKOayJoGRDjg1Tgq7NPk= -github.com/aws/aws-sdk-go-v2/credentials v1.17.57 h1:kFQDsbdBAR3GZsB8xA+51ptEnq9TIj3tS4MuP5b+TcQ= -github.com/aws/aws-sdk-go-v2/credentials v1.17.57/go.mod h1:2kerxPUUbTagAr/kkaHiqvj/bcYHzi2qiJS/ZinllU0= -github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.27 h1:7lOW8NUwE9UZekS1DYoiPdVAqZ6A+LheHWb+mHbNOq8= -github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.27/go.mod h1:w1BASFIPOPUae7AgaH4SbjNbfdkxuggLyGfNFTn8ITY= -github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.31 h1:lWm9ucLSRFiI4dQQafLrEOmEDGry3Swrz0BIRdiHJqQ= -github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.31/go.mod h1:Huu6GG0YTfbPphQkDSo4dEGmQRTKb9k9G7RdtyQWxuI= -github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.31 h1:ACxDklUKKXb48+eg5ROZXi1vDgfMyfIA/WyvqHcHI0o= -github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.31/go.mod h1:yadnfsDwqXeVaohbGc/RaD287PuyRw2wugkh5ZL2J6k= +github.com/aws/aws-sdk-go-v2 v1.36.1 h1:iTDl5U6oAhkNPba0e1t1hrwAo02ZMqbrGq4k5JBWM5E= +github.com/aws/aws-sdk-go-v2 v1.36.1/go.mod h1:5PMILGVKiW32oDzjj6RU52yrNrDPUHcbZQYr1sM7qmM= +github.com/aws/aws-sdk-go-v2/config v1.29.6 h1:fqgqEKK5HaZVWLQoLiC9Q+xDlSp+1LYidp6ybGE2OGg= +github.com/aws/aws-sdk-go-v2/config v1.29.6/go.mod h1:Ft+WLODzDQmCTHDvqAH1JfC2xxbZ0MxpZAcJqmE1LTQ= +github.com/aws/aws-sdk-go-v2/credentials v1.17.59 h1:9btwmrt//Q6JcSdgJOLI98sdr5p7tssS9yAsGe8aKP4= +github.com/aws/aws-sdk-go-v2/credentials v1.17.59/go.mod h1:NM8fM6ovI3zak23UISdWidyZuI1ghNe2xjzUZAyT+08= +github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.28 h1:KwsodFKVQTlI5EyhRSugALzsV6mG/SGrdjlMXSZSdso= +github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.28/go.mod h1:EY3APf9MzygVhKuPXAc5H+MkGb8k/DOSQjWS0LgkKqI= +github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.32 h1:BjUcr3X3K0wZPGFg2bxOWW3VPN8rkE3/61zhP+IHviA= +github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.32/go.mod h1:80+OGC/bgzzFFTUmcuwD0lb4YutwQeKLFpmt6hoWapU= +github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.32 h1:m1GeXHVMJsRsUAqG6HjZWx9dj7F5TR+cF1bjyfYyBd4= +github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.32/go.mod h1:IitoQxGfaKdVLNg0hD8/DXmAqNy0H4K2H2Sf91ti8sI= github.com/aws/aws-sdk-go-v2/internal/ini v1.8.2 
h1:Pg9URiobXy85kgFev3og2CuOZ8JZUBENF+dcgWBaYNk= github.com/aws/aws-sdk-go-v2/internal/ini v1.8.2/go.mod h1:FbtygfRFze9usAadmnGJNc8KsP346kEe+y2/oyhGAGc= -github.com/aws/aws-sdk-go-v2/service/ec2 v1.202.2 h1:qas57zkkMX8OM+MVz+4sMaOaD9HRmeFJRb8nzMdYkx0= -github.com/aws/aws-sdk-go-v2/service/ec2 v1.202.2/go.mod h1:2omfxRebtpbbFqQGqeurDzlyB7Txa2e1xe9rCDFqlwA= -github.com/aws/aws-sdk-go-v2/service/eks v1.57.2 h1:Uxm6iUIEaRtyvcp8Gj45viJmM2KksMLNBRCd8DBxuJA= -github.com/aws/aws-sdk-go-v2/service/eks v1.57.2/go.mod h1:qpBx8an26dxeAoEMlHAjGkCzrYtFF1KsYycmvgSeIfU= -github.com/aws/aws-sdk-go-v2/service/fis v1.31.9 h1:Fsg7DBqm7WpC/w9MLqu9RikgsaEHv7JUe0Le99AZ3rA= -github.com/aws/aws-sdk-go-v2/service/fis v1.31.9/go.mod h1:ilhWDnlNDbCmkyVkfHasUwURSDZkPDFBsg0/BeIACvA= -github.com/aws/aws-sdk-go-v2/service/iam v1.38.10 h1:u/MwkFwRkKRDvy7D76/khJTk8HMp4mC5sZKErU53jos= -github.com/aws/aws-sdk-go-v2/service/iam v1.38.10/go.mod h1:Gid0WEVky3EWbkeXiS67kHhbiK+q3/wO/hvPh7plR0c= +github.com/aws/aws-sdk-go-v2/service/ec2 v1.202.4 h1:gdFRXlTMgV0+yrhQLAJKb+vX2K32Vw3n2TntDd+8AEM= +github.com/aws/aws-sdk-go-v2/service/ec2 v1.202.4/go.mod h1:nSbxgPGhyI9j/cMVSHUEEtNQzEYeNOkbHnHNeTuQqt0= +github.com/aws/aws-sdk-go-v2/service/eks v1.58.0 h1:CQn77jEQBLKtHXkiCN58IcrG1jj4w1EwhXRh+NeNhHc= +github.com/aws/aws-sdk-go-v2/service/eks v1.58.0/go.mod h1:N42HjGBTjTjcJolSqcG1s10xfeNTbAeLWI600lHgwIg= +github.com/aws/aws-sdk-go-v2/service/fis v1.31.10 h1:v6P5IjwQAcvYl2lBhd/4Kkgxy7uHFOjS6rgB0A8qvj8= +github.com/aws/aws-sdk-go-v2/service/fis v1.31.10/go.mod h1:mamWv1A0OkDhWINhI7UI0jAhxsq4hZNPE+eu7mp6C7Y= +github.com/aws/aws-sdk-go-v2/service/iam v1.39.1 h1:N4OauekXigX0GgsJ+FUm7OO5HkrJR0ByZJ2YS5PIy3U= +github.com/aws/aws-sdk-go-v2/service/iam v1.39.1/go.mod h1:8rUmP3N5TJXWWEzdQ+2Tc1IELc97pxBt5Zbt4QLq7KI= github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.12.2 h1:D4oz8/CzT9bAEYtVhSBmFj2dNOtaHOtMKc2vHBwYizA= github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.12.2/go.mod h1:Za3IHqTQ+yNcRHxu1OFucBh0ACZT4j4VQFF0BqpZcLY= -github.com/aws/aws-sdk-go-v2/service/internal/endpoint-discovery v1.10.12 h1:V1h3Cxmn0tN5EhL31uvqSLKsMlPlqiYxRwAEdwNeIJ8= -github.com/aws/aws-sdk-go-v2/service/internal/endpoint-discovery v1.10.12/go.mod h1:KzXJPn2wqsZJlNSx70gmDkRDVTmyF/RRXxTP2yMxUwc= -github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.12.12 h1:O+8vD2rGjfihBewr5bT+QUfYUHIxCVgG61LHoT59shM= -github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.12.12/go.mod h1:usVdWJaosa66NMvmCrr08NcWDBRv4E6+YFG2pUdw1Lk= -github.com/aws/aws-sdk-go-v2/service/pricing v1.32.14 h1:YajuqS3CsPEllD8NZbVzMFdmgLQfTPSTrs+H1nLRZks= -github.com/aws/aws-sdk-go-v2/service/pricing v1.32.14/go.mod h1:LfN59L0VQPjqwfeqiESbI0B4Vd3DYLFIcNUpcijGnkA= -github.com/aws/aws-sdk-go-v2/service/sqs v1.37.12 h1:8TMY/uvatjnLqllJhW0WOfAQSdLQl525yuaA0Uq1ejk= -github.com/aws/aws-sdk-go-v2/service/sqs v1.37.12/go.mod h1:LG6s2xJm3K9X9ee5EmYyOveXOgVK4jtunBJBXFJ2TqE= -github.com/aws/aws-sdk-go-v2/service/ssm v1.56.10 h1:GLRZnZtAxWIgROsRgVm8YPaAG0t9pUwaxrkda/g9JiU= -github.com/aws/aws-sdk-go-v2/service/ssm v1.56.10/go.mod h1:kh7898L3bN432TMBiRBe5Ua4IrUAaq1LwHhbqabeOOk= -github.com/aws/aws-sdk-go-v2/service/sso v1.24.14 h1:c5WJ3iHz7rLIgArznb3JCSQT3uUMiz9DLZhIX+1G8ok= -github.com/aws/aws-sdk-go-v2/service/sso v1.24.14/go.mod h1:+JJQTxB6N4niArC14YNtxcQtwEqzS3o9Z32n7q33Rfs= -github.com/aws/aws-sdk-go-v2/service/ssooidc v1.28.13 h1:f1L/JtUkVODD+k1+IiSJUUv8A++2qVr+Xvb3xWXETMU= -github.com/aws/aws-sdk-go-v2/service/ssooidc v1.28.13/go.mod 
h1:tvqlFoja8/s0o+UruA1Nrezo/df0PzdunMDDurUfg6U= -github.com/aws/aws-sdk-go-v2/service/sts v1.33.12 h1:fqg6c1KVrc3SYWma/egWue5rKI4G2+M4wMQN2JosNAA= -github.com/aws/aws-sdk-go-v2/service/sts v1.33.12/go.mod h1:7Yn+p66q/jt38qMoVfNvjbm3D89mGBnkwDcijgtih8w= -github.com/aws/aws-sdk-go-v2/service/timestreamwrite v1.29.15 h1:2oGJG96TsCmt8d5/2B62sxzwbxTj5UpXztPWOA2Nki4= -github.com/aws/aws-sdk-go-v2/service/timestreamwrite v1.29.15/go.mod h1:GdO5LNWmaQaT0drv+xf4omi53vy4GrzjME0X7TgRMJk= +github.com/aws/aws-sdk-go-v2/service/internal/endpoint-discovery v1.10.13 h1:eWoHfLIzYeUtJEuoUmD5PwTE+fLaIPN9NZ7UXd9CW0s= +github.com/aws/aws-sdk-go-v2/service/internal/endpoint-discovery v1.10.13/go.mod h1:x5t8Ve0J7JK9VHKSPSRAdBrWAgr/5hH3UeCFMLoyUGQ= +github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.12.13 h1:SYVGSFQHlchIcy6e7x12bsrxClCXSP5et8cqVhL8cuw= +github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.12.13/go.mod h1:kizuDaLX37bG5WZaoxGPQR/LNFXpxp0vsUnqfkWXfNE= +github.com/aws/aws-sdk-go-v2/service/pricing v1.32.16 h1:V6lgrFRz1B7+OE6NUMrccUBVSiSF0B4uwkldeWAGvnU= +github.com/aws/aws-sdk-go-v2/service/pricing v1.32.16/go.mod h1:27xFxqZ5sSWdgfXEM8ixtw0qApX2bjsHNiJMbHwNDhc= +github.com/aws/aws-sdk-go-v2/service/sqs v1.37.14 h1:KSVbQW2umLp7i4Lo6mvBUz5PqV+Ze/IL6LCTasxQWEk= +github.com/aws/aws-sdk-go-v2/service/sqs v1.37.14/go.mod h1:jiaEkIw2Bb6IsoY9PDAZqVXJjNaKSxQGGj10CiloDWU= +github.com/aws/aws-sdk-go-v2/service/ssm v1.56.12 h1:EKEY56SQTqEsOuh68B8YVqmsLJ1nuwUGYyKImyo+0ug= +github.com/aws/aws-sdk-go-v2/service/ssm v1.56.12/go.mod h1:I/j1db6MPxBp7vcVrRAh+u+vERu79MWoyhoSjRaDl9E= +github.com/aws/aws-sdk-go-v2/service/sso v1.24.15 h1:/eE3DogBjYlvlbhd2ssWyeuovWunHLxfgw3s/OJa4GQ= +github.com/aws/aws-sdk-go-v2/service/sso v1.24.15/go.mod h1:2PCJYpi7EKeA5SkStAmZlF6fi0uUABuhtF8ILHjGc3Y= +github.com/aws/aws-sdk-go-v2/service/ssooidc v1.28.14 h1:M/zwXiL2iXUrHputuXgmO94TVNmcenPHxgLXLutodKE= +github.com/aws/aws-sdk-go-v2/service/ssooidc v1.28.14/go.mod h1:RVwIw3y/IqxC2YEXSIkAzRDdEU1iRabDPaYjpGCbCGQ= +github.com/aws/aws-sdk-go-v2/service/sts v1.33.14 h1:TzeR06UCMUq+KA3bDkujxK1GVGy+G8qQN/QVYzGLkQE= +github.com/aws/aws-sdk-go-v2/service/sts v1.33.14/go.mod h1:dspXf/oYWGWo6DEvj98wpaTeqt5+DMidZD0A9BYTizc= +github.com/aws/aws-sdk-go-v2/service/timestreamwrite v1.29.16 h1:A6oLifvrpiy020lUUV38xEbAquPHgqRfrtlqleWKYlo= +github.com/aws/aws-sdk-go-v2/service/timestreamwrite v1.29.16/go.mod h1:pFiao5K15XNf+tdIBEC7UBv/+mX0AJRJbjXyp16zckA= github.com/aws/karpenter-provider-aws/tools/kompat v0.0.0-20240410220356-6b868db24881 h1:m9rhsGhdepdQV96tZgfy68oU75AWAjOH8u65OefTjwA= github.com/aws/karpenter-provider-aws/tools/kompat v0.0.0-20240410220356-6b868db24881/go.mod h1:+Mk5k0b6HpKobxNq+B56DOhZ+I/NiPhd5MIBhQMSTSs= github.com/aws/smithy-go v1.22.2 h1:6D9hW43xKFrRx/tXXfAlIZc4JI+yQe6snnWcQyxSyLQ= @@ -246,8 +246,9 @@ golang.org/x/sync v0.1.0/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sync v0.3.0/go.mod h1:FU7BRWz2tNW+3quACPkgCx/L+uEAv1htQ0V83Z9Rj+Y= golang.org/x/sync v0.6.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk= golang.org/x/sync v0.7.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk= -golang.org/x/sync v0.10.0 h1:3NQrjDixjgGwUOCaF8w2+VYHv0Ve/vGYSbdkTa98gmQ= golang.org/x/sync v0.10.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk= +golang.org/x/sync v0.11.0 h1:GGz8+XQP4FvTTrjZPzNKTMFtSXH80RAzG+5ghFPgK9w= +golang.org/x/sync v0.11.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk= golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod 
h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20200930185726-fdedc70b468f/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= From 448c0449577d8df87aedc1519eafe59b56c44f6f Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 10 Feb 2025 15:44:19 -0800 Subject: [PATCH 25/34] chore(deps): bump aws-actions/configure-aws-credentials from 4.0.2 to 4.0.3 in /.github/actions/e2e/upgrade-crds in the action-deps group (#7715) Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- .github/actions/e2e/upgrade-crds/action.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/actions/e2e/upgrade-crds/action.yaml b/.github/actions/e2e/upgrade-crds/action.yaml index cf35813ed3f8..5ce8e602a60f 100644 --- a/.github/actions/e2e/upgrade-crds/action.yaml +++ b/.github/actions/e2e/upgrade-crds/action.yaml @@ -19,7 +19,7 @@ runs: using: "composite" steps: - name: configure aws credentials - uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2 + uses: aws-actions/configure-aws-credentials@4fc4975a852c8cd99761e2de1f4ba73402e44dd9 # v4.0.3 with: role-to-assume: arn:aws:iam::${{ inputs.account_id }}:role/${{ inputs.role }} aws-region: ${{ inputs.region }} From 8753de3795881e7d98dba3b0c0818d0727e0385d Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Mon, 10 Feb 2025 23:49:05 +0000 Subject: [PATCH 26/34] chore: Update data from AWS APIs (#7721) Co-authored-by: APICodeGen --- pkg/providers/instancetype/zz_generated.bandwidth.go | 1 + pkg/providers/pricing/zz_generated.pricing_aws.go | 4 ++-- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/pkg/providers/instancetype/zz_generated.bandwidth.go b/pkg/providers/instancetype/zz_generated.bandwidth.go index ae43e7c6bd80..36e64e92b17e 100644 --- a/pkg/providers/instancetype/zz_generated.bandwidth.go +++ b/pkg/providers/instancetype/zz_generated.bandwidth.go @@ -480,6 +480,7 @@ var ( "c7i.8xlarge": 12500, "d3.4xlarge": 12500, "d3en.2xlarge": 12500, + "f2.6xlarge": 12500, "i3en.3xlarge": 12500, "i7ie.3xlarge": 12500, "i7ie.6xlarge": 12500, diff --git a/pkg/providers/pricing/zz_generated.pricing_aws.go b/pkg/providers/pricing/zz_generated.pricing_aws.go index 03cf7359d330..c8a0d7a44db2 100644 --- a/pkg/providers/pricing/zz_generated.pricing_aws.go +++ b/pkg/providers/pricing/zz_generated.pricing_aws.go @@ -16,7 +16,7 @@ limitations under the License. 
package pricing -// generated at 2025-01-20T13:09:39Z for us-east-1 +// generated at 2025-02-10T13:11:17Z for us-east-1 import ec2types "github.com/aws/aws-sdk-go-v2/service/ec2/types" @@ -120,7 +120,7 @@ var InitialOnDemandPricesAWS = map[string]map[ec2types.InstanceType]float64{ // f1 family "f1.16xlarge": 13.200000, "f1.2xlarge": 1.650000, "f1.4xlarge": 3.300000, // f2 family - "f2.12xlarge": 3.960000, "f2.48xlarge": 15.840000, + "f2.12xlarge": 3.960000, "f2.48xlarge": 15.840000, "f2.6xlarge": 1.980000, // g2 family "g2.2xlarge": 0.650000, "g2.8xlarge": 2.600000, // g3 family From 73e12121f5731fea8cc813dcaf75f39205ffafba Mon Sep 17 00:00:00 2001 From: doryer <94246801+doryer@users.noreply.github.com> Date: Tue, 11 Feb 2025 01:50:25 +0200 Subject: [PATCH 27/34] Add Coralogix as adopters (#7714) --- ADOPTERS.md | 1 + 1 file changed, 1 insertion(+) diff --git a/ADOPTERS.md b/ADOPTERS.md index 523d804467d5..a12b9224bd48 100644 --- a/ADOPTERS.md +++ b/ADOPTERS.md @@ -25,6 +25,7 @@ If you are open to others contacting you about your use of Karpenter on Slack, a | Cloud Posse, LLC | Karpenter ships out-of-the-box in our Terraform Blueprint for EKS and is offered as part of our comprehensive multi-account [AWS reference architecture](https://cloudposse.com/reference-architecture/). Everything is Open Source (APACHE2). | `@osterman` | [Karpenter : The Cloud Posse Developer Hub](https://docs.cloudposse.com/components/catalog/aws/eks/karpenter/) | | Codefresh | Juggling workloads for the SAAS CD/GitOps offering | `@Yonatan Koren`, `@Ilia Medvedev` | [Codefresh](https://codefresh.io/) | | Conveyor | Using karpenter to scale our customers data pipelines on EKS | `@stijndehaes` | [Conveyor](https://conveyordata.com/) | +| Coralogix | Using Karpenter on all of our EKS K8S Clusters in multi-tenant high scale production environments | `@doryer` | [Coralogix](https://coralogix.com/) | | Cordial | Using Karpenter to scale multiple EKS clusters quickly | `@dschaaff` | [Cordial](https://cordial.com) | | Dig Security | Protecting our customers data - Using Karpenter to manage production and development workloads on EKS, We are using only Spot Instances in production. 
| `@Shahar Danus` | [Dig Security](https://dig.security/) | | Docker | Using Karpenter to scale Docker Hub on our EKS clusters | N/A | [Docker](https://www.docker.com) | From 344301eebf460a15a0fe1946e427ce2654310e3b Mon Sep 17 00:00:00 2001 From: Vojtech Splichal Date: Tue, 11 Feb 2025 00:50:38 +0100 Subject: [PATCH 28/34] chore: helm - add servicemonitor api condition (#7695) Co-authored-by: Jason Deal --- charts/karpenter/templates/servicemonitor.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/charts/karpenter/templates/servicemonitor.yaml b/charts/karpenter/templates/servicemonitor.yaml index 70f07608fb2c..81997bf7d5f8 100644 --- a/charts/karpenter/templates/servicemonitor.yaml +++ b/charts/karpenter/templates/servicemonitor.yaml @@ -1,4 +1,4 @@ -{{- if.Values.serviceMonitor.enabled -}} +{{- if and .Values.serviceMonitor.enabled (.Capabilities.APIVersions.Has "monitoring.coreos.com/v1") -}} apiVersion: monitoring.coreos.com/v1 kind: ServiceMonitor metadata: From d07ba3f299cc2ec32f248474e0814db1d2d8f96a Mon Sep 17 00:00:00 2001 From: Michal Schott Date: Tue, 11 Feb 2025 00:51:06 +0100 Subject: [PATCH 29/34] Restrict auto-mount of service account token in service account (#7606) --- charts/karpenter/templates/deployment.yaml | 1 + charts/karpenter/templates/serviceaccount.yaml | 1 + 2 files changed, 2 insertions(+) diff --git a/charts/karpenter/templates/deployment.yaml b/charts/karpenter/templates/deployment.yaml index 0c0e69f75e42..edd2aea0429a 100644 --- a/charts/karpenter/templates/deployment.yaml +++ b/charts/karpenter/templates/deployment.yaml @@ -35,6 +35,7 @@ spec: imagePullSecrets: {{- toYaml . | nindent 8 }} {{- end }} + automountServiceAccountToken: true serviceAccountName: {{ include "karpenter.serviceAccountName" . }} {{- with .Values.podSecurityContext }} securityContext: diff --git a/charts/karpenter/templates/serviceaccount.yaml b/charts/karpenter/templates/serviceaccount.yaml index 0141afc29ebf..f23be1d2d226 100644 --- a/charts/karpenter/templates/serviceaccount.yaml +++ b/charts/karpenter/templates/serviceaccount.yaml @@ -16,3 +16,4 @@ metadata: {{- end }} {{- end }} {{- end -}} +automountServiceAccountToken: false From edb26beccca361010522ca704ac72840698d9c7c Mon Sep 17 00:00:00 2001 From: Jonathan Innis Date: Mon, 10 Feb 2025 17:35:06 -0800 Subject: [PATCH 30/34] chore: Fix automount service account token (#7724) --- charts/karpenter/templates/serviceaccount.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/charts/karpenter/templates/serviceaccount.yaml b/charts/karpenter/templates/serviceaccount.yaml index f23be1d2d226..fce93d13c0df 100644 --- a/charts/karpenter/templates/serviceaccount.yaml +++ b/charts/karpenter/templates/serviceaccount.yaml @@ -15,5 +15,5 @@ metadata: {{- toYaml . 
| nindent 4 }} {{- end }} {{- end }} -{{- end -}} automountServiceAccountToken: false +{{- end -}} From 5fc2edde2b3e5bc89c667071b83f95ae9d545d61 Mon Sep 17 00:00:00 2001 From: edibble21 <85638465+edibble21@users.noreply.github.com> Date: Tue, 11 Feb 2025 16:16:31 -0800 Subject: [PATCH 31/34] Validation controller permission error check for Nodeclass (#7624) --- .../templates/karpenter.sh_nodeclaims.yaml | 2 +- .../templates/karpenter.sh_nodepools.yaml | 2 +- cmd/controller/main.go | 1 + go.mod | 8 +- go.sum | 11 +- pkg/apis/crds/karpenter.sh_nodeclaims.yaml | 2 +- pkg/apis/crds/karpenter.sh_nodepools.yaml | 2 +- pkg/aws/sdk.go | 1 + pkg/cloudprovider/cloudprovider.go | 22 +-- pkg/cloudprovider/suite_test.go | 6 +- pkg/controllers/controllers.go | 4 +- pkg/controllers/nodeclass/controller.go | 5 +- pkg/controllers/nodeclass/readiness_test.go | 2 +- pkg/controllers/nodeclass/suite_test.go | 1 + pkg/controllers/nodeclass/validation.go | 158 +++++++++++++++- pkg/controllers/nodeclass/validation_test.go | 126 ++++++++----- pkg/errors/errors.go | 38 ++++ pkg/fake/ec2api.go | 87 +++++++-- pkg/fake/types.go | 1 + pkg/operator/operator.go | 2 + pkg/providers/instance/instance.go | 34 ++-- pkg/providers/instancetype/suite_test.go | 24 +-- .../launchtemplate/launchtemplate.go | 38 ++-- pkg/providers/launchtemplate/suite_test.go | 168 +++++++++--------- pkg/utils/utils.go | 23 +++ test/suites/ami/suite_test.go | 3 +- test/suites/integration/nodeclass_test.go | 114 ++++++++++++ .../suites/integration/security_group_test.go | 2 +- test/suites/integration/subnet_test.go | 2 +- 29 files changed, 658 insertions(+), 231 deletions(-) create mode 100644 test/suites/integration/nodeclass_test.go diff --git a/charts/karpenter-crd/templates/karpenter.sh_nodeclaims.yaml b/charts/karpenter-crd/templates/karpenter.sh_nodeclaims.yaml index b1cf4443804b..e0a0184f30d7 100644 --- a/charts/karpenter-crd/templates/karpenter.sh_nodeclaims.yaml +++ b/charts/karpenter-crd/templates/karpenter.sh_nodeclaims.yaml @@ -6,7 +6,7 @@ metadata: {{- with .Values.additionalAnnotations }} {{- toYaml . | nindent 4 }} {{- end }} - controller-gen.kubebuilder.io/version: v0.17.1 + controller-gen.kubebuilder.io/version: v0.17.2 name: nodeclaims.karpenter.sh spec: group: karpenter.sh diff --git a/charts/karpenter-crd/templates/karpenter.sh_nodepools.yaml b/charts/karpenter-crd/templates/karpenter.sh_nodepools.yaml index 03b03e6add5e..dac45d671119 100644 --- a/charts/karpenter-crd/templates/karpenter.sh_nodepools.yaml +++ b/charts/karpenter-crd/templates/karpenter.sh_nodepools.yaml @@ -6,7 +6,7 @@ metadata: {{- with .Values.additionalAnnotations }} {{- toYaml . 
| nindent 4 }} {{- end }} - controller-gen.kubebuilder.io/version: v0.17.1 + controller-gen.kubebuilder.io/version: v0.17.2 name: nodepools.karpenter.sh spec: group: karpenter.sh diff --git a/cmd/controller/main.go b/cmd/controller/main.go index 8dd479f84b41..52075ecd984b 100644 --- a/cmd/controller/main.go +++ b/cmd/controller/main.go @@ -54,6 +54,7 @@ func main() { op.Manager, op.Config, op.Clock, + op.EC2API, op.GetClient(), op.EventRecorder, op.UnavailableOfferingsCache, diff --git a/go.mod b/go.mod index d2f709787038..3937b354dadb 100644 --- a/go.mod +++ b/go.mod @@ -24,6 +24,7 @@ require ( github.com/awslabs/amazon-eks-ami/nodeadm v0.0.0-20240229193347-cfab22a10647 github.com/awslabs/operatorpkg v0.0.0-20241205163410-0fff9f28d115 github.com/go-logr/zapr v1.3.0 + github.com/google/uuid v1.6.0 github.com/imdario/mergo v0.3.16 github.com/jonathan-innis/aws-sdk-go-prometheus v0.1.1 github.com/mitchellh/hashstructure/v2 v2.0.2 @@ -43,7 +44,7 @@ require ( k8s.io/klog/v2 v2.130.1 k8s.io/utils v0.0.0-20241104100929-3ea5e8cea738 sigs.k8s.io/controller-runtime v0.20.1 - sigs.k8s.io/karpenter v1.2.1-0.20250208015555-8e8b99d6bfa2 + sigs.k8s.io/karpenter v1.2.1-0.20250211002957-aa118786c83c sigs.k8s.io/yaml v1.4.0 ) @@ -79,7 +80,6 @@ require ( github.com/google/go-cmp v0.6.0 // indirect github.com/google/gofuzz v1.2.0 // indirect github.com/google/pprof v0.0.0-20241210010833-40e02aabc2ad // indirect - github.com/google/uuid v1.6.0 // indirect github.com/inconshreveable/mousetrap v1.1.0 // indirect github.com/josharian/intern v1.0.0 // indirect github.com/json-iterator/go v1.1.12 // indirect @@ -104,8 +104,8 @@ require ( golang.org/x/oauth2 v0.23.0 // indirect golang.org/x/sys v0.28.0 // indirect golang.org/x/term v0.27.0 // indirect - golang.org/x/text v0.21.0 // indirect - golang.org/x/time v0.9.0 // indirect + golang.org/x/text v0.22.0 // indirect + golang.org/x/time v0.10.0 // indirect golang.org/x/tools v0.28.0 // indirect gomodules.xyz/jsonpatch/v2 v2.4.0 // indirect google.golang.org/protobuf v1.36.1 // indirect diff --git a/go.sum b/go.sum index ec3a5e2a9f78..e855f5e50fcd 100644 --- a/go.sum +++ b/go.sum @@ -281,10 +281,11 @@ golang.org/x/text v0.9.0/go.mod h1:e1OnstbJyHTd6l/uOt8jFFHp6TRDWZR/bV3emEE/zU8= golang.org/x/text v0.13.0/go.mod h1:TvPlkZtksWOMsz7fbANvkp4WM8x/WCo/om8BMLbz+aE= golang.org/x/text v0.14.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU= golang.org/x/text v0.15.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU= -golang.org/x/text v0.21.0 h1:zyQAAkrwaneQ066sspRyJaG9VNi/YJ1NfzcGB3hZ/qo= golang.org/x/text v0.21.0/go.mod h1:4IBbMaMmOPCJ8SecivzSH54+73PCFmPWxNTLm+vZkEQ= -golang.org/x/time v0.9.0 h1:EsRrnYcQiGH+5FfbgvV4AP7qEZstoyrHB0DzarOQ4ZY= -golang.org/x/time v0.9.0/go.mod h1:3BpzKBy/shNhVucY/MWOyx10tF3SFh9QdLuxbVysPQM= +golang.org/x/text v0.22.0 h1:bofq7m3/HAFvbF51jz3Q9wLg3jkvSPuiZu/pD1XwgtM= +golang.org/x/text v0.22.0/go.mod h1:YRoo4H8PVmsu+E3Ou7cqLVH8oXWIHVoX0jqUWALQhfY= +golang.org/x/time v0.10.0 h1:3usCWA8tQn0L8+hFJQNgzpWbd89begxN66o1Ojdn5L4= +golang.org/x/time v0.10.0/go.mod h1:3BpzKBy/shNhVucY/MWOyx10tF3SFh9QdLuxbVysPQM= golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ= golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= golang.org/x/tools v0.0.0-20200619180055-7c47624df98f/go.mod h1:EkVYQZoAsY45+roYkvgYkIh4xh/qjgUK9TdY2XT94GE= @@ -337,8 +338,8 @@ sigs.k8s.io/controller-runtime v0.20.1 h1:JbGMAG/X94NeM3xvjenVUaBjy6Ui4Ogd/J5Ztj 
sigs.k8s.io/controller-runtime v0.20.1/go.mod h1:BrP3w158MwvB3ZbNpaAcIKkHQ7YGpYnzpoSTZ8E14WU= sigs.k8s.io/json v0.0.0-20241010143419-9aa6b5e7a4b3 h1:/Rv+M11QRah1itp8VhT6HoVx1Ray9eB4DBr+K+/sCJ8= sigs.k8s.io/json v0.0.0-20241010143419-9aa6b5e7a4b3/go.mod h1:18nIHnGi6636UCz6m8i4DhaJ65T6EruyzmoQqI2BVDo= -sigs.k8s.io/karpenter v1.2.1-0.20250208015555-8e8b99d6bfa2 h1:E8ZbRdDrRfAaNgLgOl3qkBGMyKOoDTb7grYEwV6+FBQ= -sigs.k8s.io/karpenter v1.2.1-0.20250208015555-8e8b99d6bfa2/go.mod h1:S+qNY3XwugJTu+UvgAdeNUxWuwQP/gS0uefdrV5wFLE= +sigs.k8s.io/karpenter v1.2.1-0.20250211002957-aa118786c83c h1:1FsZR40Lx9lTINMRCmgi+BdnHVWWhmfwxFq1RfcCArY= +sigs.k8s.io/karpenter v1.2.1-0.20250211002957-aa118786c83c/go.mod h1:R6cr2+SbbgXtKtiuyRFdZCbqWN2kNTduqshnQRoyOr8= sigs.k8s.io/structured-merge-diff/v4 v4.4.2 h1:MdmvkGuXi/8io6ixD5wud3vOLwc1rj0aNqRlpuvjmwA= sigs.k8s.io/structured-merge-diff/v4 v4.4.2/go.mod h1:N8f93tFZh9U6vpxwRArLiikrE5/2tiu1w1AGfACIGE4= sigs.k8s.io/yaml v1.4.0 h1:Mk1wCc2gy/F0THH0TAp1QYyJNzRm2KCLy3o5ASXVI5E= diff --git a/pkg/apis/crds/karpenter.sh_nodeclaims.yaml b/pkg/apis/crds/karpenter.sh_nodeclaims.yaml index e5b4aeae319a..e255d9c894fc 100644 --- a/pkg/apis/crds/karpenter.sh_nodeclaims.yaml +++ b/pkg/apis/crds/karpenter.sh_nodeclaims.yaml @@ -3,7 +3,7 @@ apiVersion: apiextensions.k8s.io/v1 kind: CustomResourceDefinition metadata: annotations: - controller-gen.kubebuilder.io/version: v0.17.1 + controller-gen.kubebuilder.io/version: v0.17.2 name: nodeclaims.karpenter.sh spec: group: karpenter.sh diff --git a/pkg/apis/crds/karpenter.sh_nodepools.yaml b/pkg/apis/crds/karpenter.sh_nodepools.yaml index 06e2327566cd..155bd626c067 100644 --- a/pkg/apis/crds/karpenter.sh_nodepools.yaml +++ b/pkg/apis/crds/karpenter.sh_nodepools.yaml @@ -3,7 +3,7 @@ apiVersion: apiextensions.k8s.io/v1 kind: CustomResourceDefinition metadata: annotations: - controller-gen.kubebuilder.io/version: v0.17.1 + controller-gen.kubebuilder.io/version: v0.17.2 name: nodepools.karpenter.sh spec: group: karpenter.sh diff --git a/pkg/aws/sdk.go b/pkg/aws/sdk.go index e00d3d2cd509..b6449125714b 100644 --- a/pkg/aws/sdk.go +++ b/pkg/aws/sdk.go @@ -37,6 +37,7 @@ type EC2API interface { CreateFleet(context.Context, *ec2.CreateFleetInput, ...func(*ec2.Options)) (*ec2.CreateFleetOutput, error) TerminateInstances(context.Context, *ec2.TerminateInstancesInput, ...func(*ec2.Options)) (*ec2.TerminateInstancesOutput, error) DescribeInstances(context.Context, *ec2.DescribeInstancesInput, ...func(*ec2.Options)) (*ec2.DescribeInstancesOutput, error) + RunInstances(context.Context, *ec2.RunInstancesInput, ...func(*ec2.Options)) (*ec2.RunInstancesOutput, error) CreateTags(context.Context, *ec2.CreateTagsInput, ...func(*ec2.Options)) (*ec2.CreateTagsOutput, error) CreateLaunchTemplate(context.Context, *ec2.CreateLaunchTemplateInput, ...func(*ec2.Options)) (*ec2.CreateLaunchTemplateOutput, error) DeleteLaunchTemplate(context.Context, *ec2.DeleteLaunchTemplateInput, ...func(*ec2.Options)) (*ec2.DeleteLaunchTemplateOutput, error) diff --git a/pkg/cloudprovider/cloudprovider.go b/pkg/cloudprovider/cloudprovider.go index 396022cab007..6bb838cd67e5 100644 --- a/pkg/cloudprovider/cloudprovider.go +++ b/pkg/cloudprovider/cloudprovider.go @@ -103,7 +103,7 @@ func (c *CloudProvider) Create(ctx context.Context, nodeClaim *karpv1.NodeClaim) if len(instanceTypes) == 0 { return nil, cloudprovider.NewInsufficientCapacityError(fmt.Errorf("all requested instance types were unavailable during launch")) } - tags, err := getTags(ctx, nodeClass, nodeClaim) + tags, err := 
utils.GetTags(nodeClass, nodeClaim, options.FromContext(ctx).ClusterName) if err != nil { return nil, cloudprovider.NewNodeClassNotReadyError(err) } @@ -232,26 +232,6 @@ func (c *CloudProvider) GetSupportedNodeClasses() []status.Object { return []status.Object{&v1.EC2NodeClass{}} } -func getTags(ctx context.Context, nodeClass *v1.EC2NodeClass, nodeClaim *karpv1.NodeClaim) (map[string]string, error) { - if offendingTag, found := lo.FindKeyBy(nodeClass.Spec.Tags, func(k string, v string) bool { - for _, exp := range v1.RestrictedTagPatterns { - if exp.MatchString(k) { - return true - } - } - return false - }); found { - return nil, fmt.Errorf("%q tag does not pass tag validation requirements", offendingTag) - } - staticTags := map[string]string{ - fmt.Sprintf("kubernetes.io/cluster/%s", options.FromContext(ctx).ClusterName): "owned", - karpv1.NodePoolLabelKey: nodeClaim.Labels[karpv1.NodePoolLabelKey], - v1.EKSClusterNameTagKey: options.FromContext(ctx).ClusterName, - v1.LabelNodeClass: nodeClass.Name, - } - return lo.Assign(nodeClass.Spec.Tags, staticTags), nil -} - func (c *CloudProvider) RepairPolicies() []cloudprovider.RepairPolicy { return []cloudprovider.RepairPolicy{ // Supported Kubelet Node Conditions diff --git a/pkg/cloudprovider/suite_test.go b/pkg/cloudprovider/suite_test.go index 080f308bb4f0..9e055ee0f194 100644 --- a/pkg/cloudprovider/suite_test.go +++ b/pkg/cloudprovider/suite_test.go @@ -1158,7 +1158,7 @@ var _ = Describe("CloudProvider", func() { {SubnetId: aws.String("test-subnet-2"), AvailabilityZone: aws.String("test-zone-1a"), AvailabilityZoneId: aws.String("tstz1-1a"), AvailableIpAddressCount: aws.Int32(100), Tags: []ec2types.Tag{{Key: aws.String("Name"), Value: aws.String("test-subnet-2")}}}, }}) - controller := nodeclass.NewController(env.Client, recorder, awsEnv.SubnetProvider, awsEnv.SecurityGroupProvider, awsEnv.AMIProvider, awsEnv.InstanceProfileProvider, awsEnv.LaunchTemplateProvider) + controller := nodeclass.NewController(env.Client, recorder, awsEnv.SubnetProvider, awsEnv.SecurityGroupProvider, awsEnv.AMIProvider, awsEnv.InstanceProfileProvider, awsEnv.LaunchTemplateProvider, awsEnv.EC2API) ExpectApplied(ctx, env.Client, nodePool, nodeClass) ExpectObjectReconciled(ctx, env.Client, controller, nodeClass) pod := coretest.UnschedulablePod(coretest.PodOptions{NodeSelector: map[string]string{corev1.LabelTopologyZone: "test-zone-1a"}}) @@ -1175,7 +1175,7 @@ var _ = Describe("CloudProvider", func() { {SubnetId: aws.String("test-subnet-2"), AvailabilityZone: aws.String("test-zone-1a"), AvailabilityZoneId: aws.String("tstz1-1a"), AvailableIpAddressCount: aws.Int32(11), Tags: []ec2types.Tag{{Key: aws.String("Name"), Value: aws.String("test-subnet-2")}}}, }}) - controller := nodeclass.NewController(env.Client, recorder, awsEnv.SubnetProvider, awsEnv.SecurityGroupProvider, awsEnv.AMIProvider, awsEnv.InstanceProfileProvider, awsEnv.LaunchTemplateProvider) + controller := nodeclass.NewController(env.Client, recorder, awsEnv.SubnetProvider, awsEnv.SecurityGroupProvider, awsEnv.AMIProvider, awsEnv.InstanceProfileProvider, awsEnv.LaunchTemplateProvider, awsEnv.EC2API) nodeClass.Spec.Kubelet = &v1.KubeletConfiguration{ MaxPods: aws.Int32(1), } @@ -1216,7 +1216,7 @@ var _ = Describe("CloudProvider", func() { }}) nodeClass.Spec.SubnetSelectorTerms = []v1.SubnetSelectorTerm{{Tags: map[string]string{"Name": "test-subnet-1"}}} ExpectApplied(ctx, env.Client, nodePool, nodeClass) - controller := nodeclass.NewController(env.Client, recorder, awsEnv.SubnetProvider, 
awsEnv.SecurityGroupProvider, awsEnv.AMIProvider, awsEnv.InstanceProfileProvider, awsEnv.LaunchTemplateProvider) + controller := nodeclass.NewController(env.Client, recorder, awsEnv.SubnetProvider, awsEnv.SecurityGroupProvider, awsEnv.AMIProvider, awsEnv.InstanceProfileProvider, awsEnv.LaunchTemplateProvider, awsEnv.EC2API) ExpectObjectReconciled(ctx, env.Client, controller, nodeClass) podSubnet1 := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, podSubnet1) diff --git a/pkg/controllers/controllers.go b/pkg/controllers/controllers.go index 1d2d80dfb7da..a9de8ba97fa9 100644 --- a/pkg/controllers/controllers.go +++ b/pkg/controllers/controllers.go @@ -26,6 +26,7 @@ import ( "github.com/aws/aws-sdk-go-v2/aws" v1 "github.com/aws/karpenter-provider-aws/pkg/apis/v1" + sdk "github.com/aws/karpenter-provider-aws/pkg/aws" nodeclass "github.com/aws/karpenter-provider-aws/pkg/controllers/nodeclass" nodeclasshash "github.com/aws/karpenter-provider-aws/pkg/controllers/nodeclass/hash" controllersinstancetype "github.com/aws/karpenter-provider-aws/pkg/controllers/providers/instancetype" @@ -63,6 +64,7 @@ func NewControllers( mgr manager.Manager, cfg aws.Config, clk clock.Clock, + ec2api sdk.EC2API, kubeClient client.Client, recorder events.Recorder, unavailableOfferings *awscache.UnavailableOfferings, @@ -79,7 +81,7 @@ func NewControllers( instanceTypeProvider *instancetype.DefaultProvider) []controller.Controller { controllers := []controller.Controller{ nodeclasshash.NewController(kubeClient), - nodeclass.NewController(kubeClient, recorder, subnetProvider, securityGroupProvider, amiProvider, instanceProfileProvider, launchTemplateProvider), + nodeclass.NewController(kubeClient, recorder, subnetProvider, securityGroupProvider, amiProvider, instanceProfileProvider, launchTemplateProvider, ec2api), nodeclaimgarbagecollection.NewController(kubeClient, cloudProvider), nodeclaimtagging.NewController(kubeClient, cloudProvider, instanceProvider), controllerspricing.NewController(pricingProvider), diff --git a/pkg/controllers/nodeclass/controller.go b/pkg/controllers/nodeclass/controller.go index 48bd6b5f96c7..ff56928d7f94 100644 --- a/pkg/controllers/nodeclass/controller.go +++ b/pkg/controllers/nodeclass/controller.go @@ -44,6 +44,7 @@ import ( "sigs.k8s.io/karpenter/pkg/events" v1 "github.com/aws/karpenter-provider-aws/pkg/apis/v1" + sdk "github.com/aws/karpenter-provider-aws/pkg/aws" "github.com/aws/karpenter-provider-aws/pkg/providers/amifamily" "github.com/aws/karpenter-provider-aws/pkg/providers/instanceprofile" "github.com/aws/karpenter-provider-aws/pkg/providers/launchtemplate" @@ -69,7 +70,7 @@ type Controller struct { } func NewController(kubeClient client.Client, recorder events.Recorder, subnetProvider subnet.Provider, securityGroupProvider securitygroup.Provider, - amiProvider amifamily.Provider, instanceProfileProvider instanceprofile.Provider, launchTemplateProvider launchtemplate.Provider) *Controller { + amiProvider amifamily.Provider, instanceProfileProvider instanceprofile.Provider, launchTemplateProvider launchtemplate.Provider, ec2api sdk.EC2API) *Controller { return &Controller{ kubeClient: kubeClient, @@ -79,7 +80,7 @@ func NewController(kubeClient client.Client, recorder events.Recorder, subnetPro subnet: &Subnet{subnetProvider: subnetProvider}, securityGroup: &SecurityGroup{securityGroupProvider: securityGroupProvider}, instanceProfile: &InstanceProfile{instanceProfileProvider: instanceProfileProvider}, - validation: &Validation{}, + 
validation: &Validation{ec2api: ec2api, amiProvider: amiProvider}, readiness: &Readiness{launchTemplateProvider: launchTemplateProvider}, } } diff --git a/pkg/controllers/nodeclass/readiness_test.go b/pkg/controllers/nodeclass/readiness_test.go index 563f50a355ab..fdd5f3f95010 100644 --- a/pkg/controllers/nodeclass/readiness_test.go +++ b/pkg/controllers/nodeclass/readiness_test.go @@ -67,6 +67,6 @@ var _ = Describe("NodeClass Status Condition Controller", func() { nodeClass = ExpectExists(ctx, env.Client, nodeClass) Expect(nodeClass.StatusConditions().Get(status.ConditionReady).IsFalse()).To(BeTrue()) - Expect(nodeClass.StatusConditions().Get(status.ConditionReady).Message).To(Equal("SecurityGroupsReady=False")) + Expect(nodeClass.StatusConditions().Get(status.ConditionReady).Message).To(Equal("ValidationSucceeded=False, SecurityGroupsReady=False")) }) }) diff --git a/pkg/controllers/nodeclass/suite_test.go b/pkg/controllers/nodeclass/suite_test.go index 13e96713ced4..2a7e8813db9b 100644 --- a/pkg/controllers/nodeclass/suite_test.go +++ b/pkg/controllers/nodeclass/suite_test.go @@ -73,6 +73,7 @@ var _ = BeforeSuite(func() { awsEnv.AMIProvider, awsEnv.InstanceProfileProvider, awsEnv.LaunchTemplateProvider, + awsEnv.EC2API, ) }) diff --git a/pkg/controllers/nodeclass/validation.go b/pkg/controllers/nodeclass/validation.go index a4a0cce16466..8131b66047dc 100644 --- a/pkg/controllers/nodeclass/validation.go +++ b/pkg/controllers/nodeclass/validation.go @@ -22,12 +22,32 @@ import ( "sigs.k8s.io/controller-runtime/pkg/reconcile" + "github.com/aws/aws-sdk-go-v2/aws" + "github.com/aws/aws-sdk-go-v2/service/ec2" + ec2types "github.com/aws/aws-sdk-go-v2/service/ec2/types" + corev1 "k8s.io/api/core/v1" + karpv1 "sigs.k8s.io/karpenter/pkg/apis/v1" + "sigs.k8s.io/karpenter/pkg/scheduling" + v1 "github.com/aws/karpenter-provider-aws/pkg/apis/v1" + sdk "github.com/aws/karpenter-provider-aws/pkg/aws" + awserrors "github.com/aws/karpenter-provider-aws/pkg/errors" + "github.com/aws/karpenter-provider-aws/pkg/operator/options" + "github.com/aws/karpenter-provider-aws/pkg/providers/amifamily" + "github.com/aws/karpenter-provider-aws/pkg/providers/instance" + "github.com/aws/karpenter-provider-aws/pkg/providers/launchtemplate" + "github.com/aws/karpenter-provider-aws/pkg/utils" ) -type Validation struct{} +type Validation struct { + ec2api sdk.EC2API + + amiProvider amifamily.Provider +} +// nolint:gocyclo func (n Validation) Reconcile(ctx context.Context, nodeClass *v1.EC2NodeClass) (reconcile.Result, error) { + // Tag Validation if offendingTag, found := lo.FindKeyBy(nodeClass.Spec.Tags, func(k string, v string) bool { for _, exp := range v1.RestrictedTagPatterns { if exp.MatchString(k) { @@ -40,6 +60,142 @@ func (n Validation) Reconcile(ctx context.Context, nodeClass *v1.EC2NodeClass) ( fmt.Sprintf("%q tag does not pass tag validation requirements", offendingTag)) return reconcile.Result{}, reconcile.TerminalError(fmt.Errorf("%q tag does not pass tag validation requirements", offendingTag)) } + // Auth Validation + if !nodeClass.StatusConditions().Get(v1.ConditionTypeSecurityGroupsReady).IsTrue() || !nodeClass.StatusConditions().Get(v1.ConditionTypeAMIsReady).IsTrue() || !nodeClass.StatusConditions().Get(v1.ConditionTypeInstanceProfileReady).IsTrue() || !nodeClass.StatusConditions().Get(v1.ConditionTypeSubnetsReady).IsTrue() { + nodeClass.StatusConditions().SetFalse(v1.ConditionTypeValidationSucceeded, "DependenciesNotReady", "Waiting for SecurityGroups, AMIs, Subnets and InstanceProfiles to go true") + + 
return reconcile.Result{}, nil + } + nodeClaim := &karpv1.NodeClaim{ + Spec: karpv1.NodeClaimSpec{ + NodeClassRef: &karpv1.NodeClassReference{ + Name: nodeClass.ObjectMeta.Name, + }, + }, + } + tags, err := utils.GetTags(nodeClass, nodeClaim, options.FromContext(ctx).ClusterName) + if err != nil { + return reconcile.Result{}, fmt.Errorf("getting tags, %w", err) + } + + createFleetInput := instance.GetCreateFleetInput(nodeClass, string(karpv1.CapacityTypeOnDemand), tags, mockLaunchTemplateConfig()) + createFleetInput.DryRun = aws.Bool(true) + + if _, err := n.ec2api.CreateFleet(ctx, createFleetInput); awserrors.IgnoreDryRunError(err) != nil { + if awserrors.IgnoreUnauthorizedOperationError(err) != nil { + // Dry run should only ever return UnauthorizedOperation or DryRunOperation so if we receive any other error + // it would be an unexpected state + return reconcile.Result{}, fmt.Errorf("unexpected error during CreateFleet validation: %w", err) + } + nodeClass.StatusConditions().SetFalse(v1.ConditionTypeValidationSucceeded, "CreateFleetAuthCheckFailed", "Controller isn't authorized to call CreateFleet") + return reconcile.Result{}, nil + } + + createLaunchTemplateInput := launchtemplate.GetCreateLaunchTemplateInput(mockOptions(*nodeClaim, nodeClass, tags), corev1.IPv4Protocol, "") + createLaunchTemplateInput.DryRun = aws.Bool(true) + + if _, err := n.ec2api.CreateLaunchTemplate(ctx, createLaunchTemplateInput); awserrors.IgnoreDryRunError(err) != nil { + if awserrors.IgnoreUnauthorizedOperationError(err) != nil { + // Dry run should only ever return UnauthorizedOperation or DryRunOperation so if we receive any other error + // it would be an unexpected state + return reconcile.Result{}, fmt.Errorf("unexpected error during CreateLaunchTemplate validation: %w", err) + } + nodeClass.StatusConditions().SetFalse(v1.ConditionTypeValidationSucceeded, "CreateLaunchTemplateAuthCheckFailed", "Controller isn't authorized to call CreateLaunchTemplate") + return reconcile.Result{}, nil + } + + // This should never occur as AMIs should already be resolved during the AMI resolution phase + if len(nodeClass.Status.AMIs) == 0 { + return reconcile.Result{}, fmt.Errorf("no resolved AMIs in status: %w", err) + } + + var instanceType ec2types.InstanceType + requirements := scheduling.NewNodeSelectorRequirements(nodeClass.Status.AMIs[0].Requirements...) 
+ + if requirements.Get(corev1.LabelArchStable).Has(karpv1.ArchitectureAmd64) { + instanceType = ec2types.InstanceTypeM5Large + } else if requirements.Get(corev1.LabelArchStable).Has(karpv1.ArchitectureArm64) { + instanceType = ec2types.InstanceTypeM6gLarge + } + + runInstancesInput := &ec2.RunInstancesInput{ + DryRun: lo.ToPtr(true), + MaxCount: aws.Int32(1), + MinCount: aws.Int32(1), + InstanceType: instanceType, + MetadataOptions: &ec2types.InstanceMetadataOptionsRequest{ + HttpEndpoint: ec2types.InstanceMetadataEndpointState(lo.FromPtr(nodeClass.Spec.MetadataOptions.HTTPEndpoint)), + HttpTokens: ec2types.HttpTokensState(lo.FromPtr(nodeClass.Spec.MetadataOptions.HTTPTokens)), + HttpProtocolIpv6: ec2types.InstanceMetadataProtocolState(lo.FromPtr(nodeClass.Spec.MetadataOptions.HTTPProtocolIPv6)), + //aws sdk v2 changed this type to *int32 instead of *int64 + //nolint: gosec + HttpPutResponseHopLimit: aws.Int32(int32(lo.FromPtr(nodeClass.Spec.MetadataOptions.HTTPPutResponseHopLimit))), + }, + TagSpecifications: []ec2types.TagSpecification{ + { + ResourceType: ec2types.ResourceTypeInstance, + Tags: utils.MergeTags(tags), + }, + { + ResourceType: ec2types.ResourceTypeVolume, + Tags: utils.MergeTags(tags), + }, + { + ResourceType: ec2types.ResourceTypeNetworkInterface, + Tags: utils.MergeTags(tags), + }, + }, + ImageId: lo.ToPtr(nodeClass.Status.AMIs[0].ID), + } + + if _, err = n.ec2api.RunInstances(ctx, runInstancesInput); awserrors.IgnoreDryRunError(err) != nil { + if awserrors.IgnoreUnauthorizedOperationError(err) != nil { + // Dry run should only ever return UnauthorizedOperation or DryRunOperation so if we receive any other error + // it would be an unexpected state + return reconcile.Result{}, fmt.Errorf("unexpected error during RunInstances validation: %w", err) + } + nodeClass.StatusConditions().SetFalse(v1.ConditionTypeValidationSucceeded, "RunInstancesAuthCheckFailed", "Controller isn't authorized to call RunInstances") + return reconcile.Result{}, nil + } nodeClass.StatusConditions().SetTrue(v1.ConditionTypeValidationSucceeded) return reconcile.Result{}, nil } + +func mockLaunchTemplateConfig() []ec2types.FleetLaunchTemplateConfigRequest { + return []ec2types.FleetLaunchTemplateConfigRequest{ + { + LaunchTemplateSpecification: &ec2types.FleetLaunchTemplateSpecificationRequest{ + LaunchTemplateName: aws.String("mock-lt-name"), + LaunchTemplateId: aws.String("lt-1234567890abcdef0"), + Version: aws.String("1"), + }, + Overrides: []ec2types.FleetLaunchTemplateOverridesRequest{ + { + InstanceType: ec2types.InstanceTypeT3Micro, + SubnetId: aws.String("subnet-1234567890abcdef0"), + }, + { + InstanceType: ec2types.InstanceTypeT3Small, + SubnetId: aws.String("subnet-1234567890abcdef1"), + }, + }, + }, + } +} +func mockOptions(nodeClaim karpv1.NodeClaim, nodeClass *v1.EC2NodeClass, tags map[string]string) *amifamily.LaunchTemplate { + return &amifamily.LaunchTemplate{ + Options: &amifamily.Options{ + Tags: tags, + InstanceProfile: nodeClass.Status.InstanceProfile, + SecurityGroups: nodeClass.Status.SecurityGroups, + }, + MetadataOptions: &v1.MetadataOptions{ + HTTPEndpoint: nodeClass.Spec.MetadataOptions.HTTPEndpoint, + HTTPTokens: nodeClass.Spec.MetadataOptions.HTTPTokens, + HTTPProtocolIPv6: nodeClass.Spec.MetadataOptions.HTTPProtocolIPv6, + HTTPPutResponseHopLimit: nodeClass.Spec.MetadataOptions.HTTPPutResponseHopLimit, + }, + AMIID: nodeClaim.Status.ImageID, + BlockDeviceMappings: nodeClass.Spec.BlockDeviceMappings, + } +} diff --git a/pkg/controllers/nodeclass/validation_test.go 
b/pkg/controllers/nodeclass/validation_test.go index 24cd4e5ee644..4cfcb16ab3f3 100644 --- a/pkg/controllers/nodeclass/validation_test.go +++ b/pkg/controllers/nodeclass/validation_test.go @@ -18,7 +18,10 @@ import ( status "github.com/awslabs/operatorpkg/status" "github.com/samber/lo" + "github.com/aws/smithy-go" + v1 "github.com/aws/karpenter-provider-aws/pkg/apis/v1" + "github.com/aws/karpenter-provider-aws/pkg/fake" "github.com/aws/karpenter-provider-aws/pkg/test" . "github.com/onsi/ginkgo/v2" @@ -27,55 +30,92 @@ import ( ) var _ = Describe("NodeClass Validation Status Controller", func() { - BeforeEach(func() { - nodeClass = test.EC2NodeClass(v1.EC2NodeClass{ - Spec: v1.EC2NodeClassSpec{ - SubnetSelectorTerms: []v1.SubnetSelectorTerm{ - { - Tags: map[string]string{"*": "*"}, + Context("Tag Validation", func() { + BeforeEach(func() { + nodeClass = test.EC2NodeClass(v1.EC2NodeClass{ + Spec: v1.EC2NodeClassSpec{ + SubnetSelectorTerms: []v1.SubnetSelectorTerm{ + { + Tags: map[string]string{"*": "*"}, + }, }, - }, - SecurityGroupSelectorTerms: []v1.SecurityGroupSelectorTerm{ - { - Tags: map[string]string{"*": "*"}, + SecurityGroupSelectorTerms: []v1.SecurityGroupSelectorTerm{ + { + Tags: map[string]string{"*": "*"}, + }, }, - }, - AMIFamily: lo.ToPtr(v1.AMIFamilyCustom), - AMISelectorTerms: []v1.AMISelectorTerm{ - { - Tags: map[string]string{"*": "*"}, + AMIFamily: lo.ToPtr(v1.AMIFamilyCustom), + AMISelectorTerms: []v1.AMISelectorTerm{ + { + Tags: map[string]string{"*": "*"}, + }, + }, + Tags: map[string]string{ + "kubernetes.io/cluster/anothercluster": "owned", }, }, - Tags: map[string]string{ - "kubernetes.io/cluster/anothercluster": "owned", - }, - }, + }) + }) + DescribeTable("should update status condition on nodeClass as NotReady when tag validation fails", func(illegalTag map[string]string) { + nodeClass.Spec.Tags = illegalTag + ExpectApplied(ctx, env.Client, nodeClass) + err := ExpectObjectReconcileFailed(ctx, env.Client, controller, nodeClass) + Expect(err).To(HaveOccurred()) + nodeClass = ExpectExists(ctx, env.Client, nodeClass) + Expect(nodeClass.StatusConditions().Get(v1.ConditionTypeValidationSucceeded).IsFalse()).To(BeTrue()) + Expect(nodeClass.StatusConditions().Get(status.ConditionReady).IsFalse()).To(BeTrue()) + Expect(nodeClass.StatusConditions().Get(status.ConditionReady).Message).To(Equal("ValidationSucceeded=False")) + }, + Entry("kubernetes.io/cluster*", map[string]string{"kubernetes.io/cluster/acluster": "owned"}), + Entry(v1.NodePoolTagKey, map[string]string{v1.NodePoolTagKey: "testnodepool"}), + Entry(v1.EKSClusterNameTagKey, map[string]string{v1.EKSClusterNameTagKey: "acluster"}), + Entry(v1.NodeClassTagKey, map[string]string{v1.NodeClassTagKey: "testnodeclass"}), + Entry(v1.NodeClaimTagKey, map[string]string{v1.NodeClaimTagKey: "testnodeclaim"}), + ) + It("should update status condition as Ready when tags are valid", func() { + nodeClass.Spec.Tags = map[string]string{} + ExpectApplied(ctx, env.Client, nodeClass) + ExpectObjectReconciled(ctx, env.Client, controller, nodeClass) + nodeClass = ExpectExists(ctx, env.Client, nodeClass) + + Expect(nodeClass.StatusConditions().Get(v1.ConditionTypeValidationSucceeded).IsTrue()).To(BeTrue()) + Expect(nodeClass.StatusConditions().Get(status.ConditionReady).IsTrue()).To(BeTrue()) }) }) - DescribeTable("should update status condition on nodeClass as NotReady when tag validation fails", func(illegalTag map[string]string) { - nodeClass.Spec.Tags = illegalTag - ExpectApplied(ctx, env.Client, nodeClass) - err := 
ExpectObjectReconcileFailed(ctx, env.Client, controller, nodeClass) - Expect(err).To(HaveOccurred()) - nodeClass = ExpectExists(ctx, env.Client, nodeClass) - Expect(nodeClass.Status.Conditions).To(HaveLen(6)) - Expect(nodeClass.StatusConditions().Get(v1.ConditionTypeValidationSucceeded).IsFalse()).To(BeTrue()) - Expect(nodeClass.StatusConditions().Get(status.ConditionReady).IsFalse()).To(BeTrue()) - Expect(nodeClass.StatusConditions().Get(status.ConditionReady).Message).To(Equal("ValidationSucceeded=False")) - }, - Entry("kubernetes.io/cluster*", map[string]string{"kubernetes.io/cluster/acluster": "owned"}), - Entry(v1.NodePoolTagKey, map[string]string{v1.NodePoolTagKey: "testnodepool"}), - Entry(v1.EKSClusterNameTagKey, map[string]string{v1.EKSClusterNameTagKey: "acluster"}), - Entry(v1.NodeClassTagKey, map[string]string{v1.NodeClassTagKey: "testnodeclass"}), - Entry(v1.NodeClaimTagKey, map[string]string{v1.NodeClaimTagKey: "testnodeclaim"}), - ) - It("should update status condition as Ready when tags are valid", func() { - nodeClass.Spec.Tags = map[string]string{} - ExpectApplied(ctx, env.Client, nodeClass) - ExpectObjectReconciled(ctx, env.Client, controller, nodeClass) - nodeClass = ExpectExists(ctx, env.Client, nodeClass) + Context("Authorization Validation", func() { + DescribeTable("NodeClass validation failure conditions", + func(setupFn func()) { + ExpectApplied(ctx, env.Client, nodeClass) + setupFn() + ExpectObjectReconciled(ctx, env.Client, controller, nodeClass) + nodeClass = ExpectExists(ctx, env.Client, nodeClass) + Expect(nodeClass.StatusConditions().Get(v1.ConditionTypeValidationSucceeded).IsFalse()).To(BeTrue()) + }, + Entry("should update status condition as NotReady when CreateFleet unauthorized", + func() { + awsEnv.EC2API.CreateFleetBehavior.Error.Set(&smithy.GenericAPIError{ + Code: "UnauthorizedOperation", + }, fake.MaxCalls(1)) + }), + Entry("should update status condition as NotReady when RunInstances unauthorized", + func() { + awsEnv.EC2API.RunInstancesBehavior.Error.Set(&smithy.GenericAPIError{ + Code: "UnauthorizedOperation", + }, fake.MaxCalls(1)) + }), + Entry("should update status condition as NotReady when CreateLaunchTemplate unauthorized", + func() { + awsEnv.EC2API.CreateLaunchTemplateBehavior.Error.Set(&smithy.GenericAPIError{ + Code: "UnauthorizedOperation", + }, fake.MaxCalls(1)) + }), + ) - Expect(nodeClass.StatusConditions().Get(v1.ConditionTypeValidationSucceeded).IsTrue()).To(BeTrue()) - Expect(nodeClass.StatusConditions().Get(status.ConditionReady).IsTrue()).To(BeTrue()) + It("should update status condition as Ready when authorized", func() { + ExpectApplied(ctx, env.Client, nodeClass) + ExpectObjectReconciled(ctx, env.Client, controller, nodeClass) + nodeClass = ExpectExists(ctx, env.Client, nodeClass) + Expect(nodeClass.StatusConditions().Get(v1.ConditionTypeValidationSucceeded).IsTrue()).To(BeTrue()) + }) }) }) diff --git a/pkg/errors/errors.go b/pkg/errors/errors.go index a77e8e1c250e..d3dfe9e1c8ea 100644 --- a/pkg/errors/errors.go +++ b/pkg/errors/errors.go @@ -25,6 +25,8 @@ import ( const ( launchTemplateNameNotFoundCode = "InvalidLaunchTemplateName.NotFoundException" + DryRunOperationErrorCode = "DryRunOperation" + UnauthorizedOperationErrorCode = "UnauthorizedOperation" ) var ( @@ -90,6 +92,42 @@ func IgnoreAlreadyExists(err error) error { return err } +func IsDryRunError(err error) bool { + if err == nil { + return false + } + var apiErr smithy.APIError + if errors.As(err, &apiErr) { + return apiErr.ErrorCode() == 
DryRunOperationErrorCode + } + return false +} + +func IgnoreDryRunError(err error) error { + if IsDryRunError(err) { + return nil + } + return err +} + +func IsUnauthorizedOperationError(err error) bool { + if err == nil { + return false + } + var apiErr smithy.APIError + if errors.As(err, &apiErr) { + return apiErr.ErrorCode() == UnauthorizedOperationErrorCode + } + return false +} + +func IgnoreUnauthorizedOperationError(err error) error { + if IsUnauthorizedOperationError(err) { + return nil + } + return err +} + // IsUnfulfillableCapacity returns true if the Fleet err means // capacity is temporarily unavailable for launching. // This could be due to account limits, insufficient ec2 capacity, etc. diff --git a/pkg/fake/ec2api.go b/pkg/fake/ec2api.go index 8c770f9039b0..c6f6556de0b0 100644 --- a/pkg/fake/ec2api.go +++ b/pkg/fake/ec2api.go @@ -59,7 +59,8 @@ type EC2Behavior struct { TerminateInstancesBehavior MockedFunction[ec2.TerminateInstancesInput, ec2.TerminateInstancesOutput] DescribeInstancesBehavior MockedFunction[ec2.DescribeInstancesInput, ec2.DescribeInstancesOutput] CreateTagsBehavior MockedFunction[ec2.CreateTagsInput, ec2.CreateTagsOutput] - CalledWithCreateLaunchTemplateInput AtomicPtrSlice[ec2.CreateLaunchTemplateInput] + RunInstancesBehavior MockedFunction[ec2.RunInstancesInput, ec2.RunInstancesOutput] + CreateLaunchTemplateBehavior MockedFunction[ec2.CreateLaunchTemplateInput, ec2.CreateLaunchTemplateOutput] CalledWithDescribeImagesInput AtomicPtrSlice[ec2.DescribeImagesInput] Instances sync.Map LaunchTemplates sync.Map @@ -92,7 +93,7 @@ func (e *EC2API) Reset() { e.CreateFleetBehavior.Reset() e.TerminateInstancesBehavior.Reset() e.DescribeInstancesBehavior.Reset() - e.CalledWithCreateLaunchTemplateInput.Reset() + e.CreateLaunchTemplateBehavior.Reset() e.CalledWithDescribeImagesInput.Reset() e.DescribeSpotPriceHistoryInput.Reset() e.DescribeSpotPriceHistoryOutput.Reset() @@ -110,6 +111,16 @@ func (e *EC2API) Reset() { // nolint: gocyclo func (e *EC2API) CreateFleet(_ context.Context, input *ec2.CreateFleetInput, _ ...func(*ec2.Options)) (*ec2.CreateFleetOutput, error) { + if input.DryRun != nil && *input.DryRun { + err := e.CreateFleetBehavior.Error.Get() + if err == nil { + return &ec2.CreateFleetOutput{}, &smithy.GenericAPIError{ + Code: "DryRunOperation", + Message: "Request would have succeeded, but DryRun flag is set", + } + } + return nil, err + } return e.CreateFleetBehavior.Invoke(input, func(input *ec2.CreateFleetInput) (*ec2.CreateFleetOutput, error) { if input.LaunchTemplateConfigs[0].LaunchTemplateSpecification.LaunchTemplateName == nil { return nil, fmt.Errorf("missing launch template name") @@ -140,10 +151,10 @@ func (e *EC2API) CreateFleet(_ context.Context, input *ec2.CreateFleetInput, _ . 
continue } amiID := aws.String("") - if e.CalledWithCreateLaunchTemplateInput.Len() > 0 { - lt := e.CalledWithCreateLaunchTemplateInput.Pop() + if e.CreateLaunchTemplateBehavior.CalledWithInput.Len() > 0 { + lt := e.CreateLaunchTemplateBehavior.CalledWithInput.Pop() amiID = lt.LaunchTemplateData.ImageId - e.CalledWithCreateLaunchTemplateInput.Add(lt) + e.CreateLaunchTemplateBehavior.CalledWithInput.Add(lt) } instanceState := ec2types.InstanceStateNameRunning for ; fulfilled < int(*input.TargetCapacitySpecification.TotalTargetCapacity); fulfilled++ { @@ -212,15 +223,27 @@ func (e *EC2API) TerminateInstances(_ context.Context, input *ec2.TerminateInsta }) } -func (e *EC2API) CreateLaunchTemplate(_ context.Context, input *ec2.CreateLaunchTemplateInput, _ ...func(*ec2.Options)) (*ec2.CreateLaunchTemplateOutput, error) { - if !e.NextError.IsNil() { - defer e.NextError.Reset() - return nil, e.NextError.Get() +// Then modify the CreateLaunchTemplate method: +func (e *EC2API) CreateLaunchTemplate(ctx context.Context, input *ec2.CreateLaunchTemplateInput, _ ...func(*ec2.Options)) (*ec2.CreateLaunchTemplateOutput, error) { + if input.DryRun != nil && *input.DryRun { + err := e.CreateLaunchTemplateBehavior.Error.Get() + if err == nil { + return &ec2.CreateLaunchTemplateOutput{}, &smithy.GenericAPIError{ + Code: "DryRunOperation", + Message: "Request would have succeeded, but DryRun flag is set", + } + } + return nil, err } - e.CalledWithCreateLaunchTemplateInput.Add(input) - launchTemplate := ec2types.LaunchTemplate{LaunchTemplateName: input.LaunchTemplateName} - e.LaunchTemplates.Store(input.LaunchTemplateName, launchTemplate) - return &ec2.CreateLaunchTemplateOutput{LaunchTemplate: lo.ToPtr(launchTemplate)}, nil + return e.CreateLaunchTemplateBehavior.Invoke(input, func(input *ec2.CreateLaunchTemplateInput) (*ec2.CreateLaunchTemplateOutput, error) { + if !e.NextError.IsNil() { + defer e.NextError.Reset() + return nil, e.NextError.Get() + } + launchTemplate := ec2types.LaunchTemplate{LaunchTemplateName: input.LaunchTemplateName} + e.LaunchTemplates.Store(input.LaunchTemplateName, launchTemplate) + return &ec2.CreateLaunchTemplateOutput{LaunchTemplate: lo.ToPtr(launchTemplate)}, nil + }) } func (e *EC2API) CreateTags(_ context.Context, input *ec2.CreateTagsInput, _ ...func(*ec2.Options)) (*ec2.CreateTagsOutput, error) { @@ -317,7 +340,7 @@ func filterInstances(instances []ec2types.Instance, filters []ec2types.Filter) [ return ret } -func (e *EC2API) DescribeImages(_ context.Context, input *ec2.DescribeImagesInput, _ ...func(*ec2.Options)) (*ec2.DescribeImagesOutput, error) { +func (e *EC2API) DescribeImages(ctx context.Context, input *ec2.DescribeImagesInput, _ ...func(*ec2.Options)) (*ec2.DescribeImagesOutput, error) { if !e.NextError.IsNil() { defer e.NextError.Reset() return nil, e.NextError.Get() @@ -325,10 +348,11 @@ func (e *EC2API) DescribeImages(_ context.Context, input *ec2.DescribeImagesInpu e.CalledWithDescribeImagesInput.Add(input) if !e.DescribeImagesOutput.IsNil() { describeImagesOutput := e.DescribeImagesOutput.Clone() + describeImagesOutput.Images = FilterDescribeImages(describeImagesOutput.Images, input.Filters) return describeImagesOutput, nil } - if input.Filters[0].Values[0] == "invalid" { + if input.Filters != nil && input.Filters[0].Values[0] == "invalid" { return &ec2.DescribeImagesOutput{}, nil } return &ec2.DescribeImagesOutput{ @@ -355,7 +379,7 @@ func (e *EC2API) DescribeLaunchTemplates(_ context.Context, input *ec2.DescribeL output := &ec2.DescribeLaunchTemplatesOutput{} 
e.LaunchTemplates.Range(func(key, value interface{}) bool { launchTemplate := value.(ec2types.LaunchTemplate) - if lo.Contains(aws.StringSlice(input.LaunchTemplateNames), launchTemplate.LaunchTemplateName) || len(input.Filters) != 0 && Filter(input.Filters, aws.ToString(launchTemplate.LaunchTemplateId), aws.ToString(launchTemplate.LaunchTemplateName), launchTemplate.Tags) { + if lo.Contains(input.LaunchTemplateNames, lo.FromPtr(launchTemplate.LaunchTemplateName)) || len(input.Filters) != 0 && Filter(input.Filters, aws.ToString(launchTemplate.LaunchTemplateId), aws.ToString(launchTemplate.LaunchTemplateName), launchTemplate.Tags) { output.LaunchTemplates = append(output.LaunchTemplates, launchTemplate) } return true @@ -654,3 +678,34 @@ func (e *EC2API) DescribeSpotPriceHistory(_ context.Context, input *ec2.Describe // fail if the test doesn't provide specific data which causes our pricing provider to use its static price list return nil, errors.New("no pricing data provided") } + +func (e *EC2API) RunInstances(ctx context.Context, input *ec2.RunInstancesInput, optFns ...func(*ec2.Options)) (*ec2.RunInstancesOutput, error) { + if input.DryRun != nil && *input.DryRun { + err := e.RunInstancesBehavior.Error.Get() + if err == nil { + return &ec2.RunInstancesOutput{}, &smithy.GenericAPIError{ + Code: "DryRunOperation", + Message: "Request would have succeeded, but DryRun flag is set", + } + } + return nil, err + } + return e.RunInstancesBehavior.Invoke(input, func(input *ec2.RunInstancesInput) (*ec2.RunInstancesOutput, error) { + if !e.NextError.IsNil() { + defer e.NextError.Reset() + return nil, e.NextError.Get() + } + + // Default implementation + instance := ec2types.Instance{ + InstanceId: aws.String(test.RandomName()), + InstanceType: input.InstanceType, + State: &ec2types.InstanceState{Name: ec2types.InstanceStateNameRunning}, + // Add other required fields + } + + return &ec2.RunInstancesOutput{ + Instances: []ec2types.Instance{instance}, + }, nil + }) +} diff --git a/pkg/fake/types.go b/pkg/fake/types.go index 88fe2ca83bc9..5f235ac86984 100644 --- a/pkg/fake/types.go +++ b/pkg/fake/types.go @@ -44,6 +44,7 @@ func (m *MockedFunction[I, O]) Invoke(input *I, defaultTransformer func(*I) (*O, m.failedCalls.Add(1) return nil, err } + m.CalledWithInput.Add(input) if !m.Output.IsNil() { diff --git a/pkg/operator/operator.go b/pkg/operator/operator.go index b86d73cb2dda..abbb0a56d365 100644 --- a/pkg/operator/operator.go +++ b/pkg/operator/operator.go @@ -91,6 +91,7 @@ type Operator struct { InstanceTypesProvider *instancetype.DefaultProvider InstanceProvider instance.Provider SSMProvider ssmp.Provider + EC2API *ec2.Client } func NewOperator(ctx context.Context, operator *operator.Operator) (context.Context, *Operator) { @@ -207,6 +208,7 @@ func NewOperator(ctx context.Context, operator *operator.Operator) (context.Cont InstanceTypesProvider: instanceTypeProvider, InstanceProvider: instanceProvider, SSMProvider: ssmProvider, + EC2API: ec2api, } } diff --git a/pkg/providers/instance/instance.go b/pkg/providers/instance/instance.go index 69f8c8e8fa16..74c0a6c11bec 100644 --- a/pkg/providers/instance/instance.go +++ b/pkg/providers/instance/instance.go @@ -23,6 +23,7 @@ import ( "strings" sdk "github.com/aws/karpenter-provider-aws/pkg/aws" + "github.com/aws/karpenter-provider-aws/pkg/utils" "github.com/aws/aws-sdk-go-v2/aws" awshttp "github.com/aws/aws-sdk-go-v2/aws/transport/http" @@ -44,7 +45,6 @@ import ( "github.com/aws/karpenter-provider-aws/pkg/operator/options" 
"github.com/aws/karpenter-provider-aws/pkg/providers/launchtemplate" "github.com/aws/karpenter-provider-aws/pkg/providers/subnet" - "github.com/aws/karpenter-provider-aws/pkg/utils" "sigs.k8s.io/karpenter/pkg/cloudprovider" "sigs.k8s.io/karpenter/pkg/scheduling" @@ -226,20 +226,7 @@ func (p *DefaultProvider) launchInstance(ctx context.Context, nodeClass *v1.EC2N log.FromContext(ctx).Error(err, "failed while checking on-demand fallback") } // Create fleet - createFleetInput := &ec2.CreateFleetInput{ - Type: ec2types.FleetTypeInstant, - Context: nodeClass.Spec.Context, - LaunchTemplateConfigs: launchTemplateConfigs, - TargetCapacitySpecification: &ec2types.TargetCapacitySpecificationRequest{ - DefaultTargetCapacityType: ec2types.DefaultTargetCapacityType(capacityType), - TotalTargetCapacity: aws.Int32(1), - }, - TagSpecifications: []ec2types.TagSpecification{ - {ResourceType: ec2types.ResourceTypeInstance, Tags: utils.MergeTags(tags)}, - {ResourceType: ec2types.ResourceTypeVolume, Tags: utils.MergeTags(tags)}, - {ResourceType: ec2types.ResourceTypeFleet, Tags: utils.MergeTags(tags)}, - }, - } + createFleetInput := GetCreateFleetInput(nodeClass, capacityType, tags, launchTemplateConfigs) if capacityType == karpv1.CapacityTypeSpot { createFleetInput.SpotOptions = &ec2types.SpotOptionsRequest{AllocationStrategy: ec2types.SpotAllocationStrategyPriceCapacityOptimized} } else { @@ -269,6 +256,23 @@ func (p *DefaultProvider) launchInstance(ctx context.Context, nodeClass *v1.EC2N return createFleetOutput.Instances[0], nil } +func GetCreateFleetInput(nodeClass *v1.EC2NodeClass, capacityType string, tags map[string]string, launchTemplateConfigs []ec2types.FleetLaunchTemplateConfigRequest) *ec2.CreateFleetInput { + return &ec2.CreateFleetInput{ + Type: ec2types.FleetTypeInstant, + Context: nodeClass.Spec.Context, + LaunchTemplateConfigs: launchTemplateConfigs, + TargetCapacitySpecification: &ec2types.TargetCapacitySpecificationRequest{ + DefaultTargetCapacityType: ec2types.DefaultTargetCapacityType(capacityType), + TotalTargetCapacity: aws.Int32(1), + }, + TagSpecifications: []ec2types.TagSpecification{ + {ResourceType: ec2types.ResourceTypeInstance, Tags: utils.MergeTags(tags)}, + {ResourceType: ec2types.ResourceTypeVolume, Tags: utils.MergeTags(tags)}, + {ResourceType: ec2types.ResourceTypeFleet, Tags: utils.MergeTags(tags)}, + }, + } +} + func (p *DefaultProvider) checkODFallback(nodeClaim *karpv1.NodeClaim, instanceTypes []*cloudprovider.InstanceType, launchTemplateConfigs []ec2types.FleetLaunchTemplateConfigRequest) error { // only evaluate for on-demand fallback if the capacity type for the request is OD and both OD and spot are allowed in requirements if p.getCapacityType(nodeClaim, instanceTypes) != karpv1.CapacityTypeOnDemand || !scheduling.NewNodeSelectorRequirementsWithMinValues(nodeClaim.Spec.Requirements...).Get(karpv1.CapacityTypeLabelKey).Has(karpv1.CapacityTypeSpot) { diff --git a/pkg/providers/instancetype/suite_test.go b/pkg/providers/instancetype/suite_test.go index 6505d0c6ec3f..3c5fb0992ce5 100644 --- a/pkg/providers/instancetype/suite_test.go +++ b/pkg/providers/instancetype/suite_test.go @@ -2233,8 +2233,8 @@ var _ = Describe("InstanceTypeProvider", func() { ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) node := ExpectScheduled(ctx, env.Client, pod) Expect(*node.Status.Capacity.StorageEphemeral()).To(Equal(resource.MustParse("20Gi"))) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically(">=", 1)) - 
awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically(">=", 1)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { Expect(ltInput.LaunchTemplateData.BlockDeviceMappings).To(HaveLen(1)) Expect(*ltInput.LaunchTemplateData.BlockDeviceMappings[0].DeviceName).To(Equal("/dev/xvda")) Expect(*ltInput.LaunchTemplateData.BlockDeviceMappings[0].Ebs.SnapshotId).To(Equal("snap-xxxxxxxx")) @@ -2246,8 +2246,8 @@ var _ = Describe("InstanceTypeProvider", func() { ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) node := ExpectScheduled(ctx, env.Client, pod) Expect(*node.Status.Capacity.StorageEphemeral()).To(Equal(resource.MustParse("20Gi"))) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically(">=", 1)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically(">=", 1)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { Expect(ltInput.LaunchTemplateData.BlockDeviceMappings).To(HaveLen(1)) Expect(*ltInput.LaunchTemplateData.BlockDeviceMappings[0].DeviceName).To(Equal("/dev/xvda")) Expect(*ltInput.LaunchTemplateData.BlockDeviceMappings[0].Ebs.SnapshotId).To(Equal("snap-xxxxxxxx")) @@ -2262,8 +2262,8 @@ var _ = Describe("InstanceTypeProvider", func() { ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) node := ExpectScheduled(ctx, env.Client, pod) Expect(*node.Status.Capacity.StorageEphemeral()).To(Equal(resource.MustParse("20Gi"))) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically(">=", 1)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically(">=", 1)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { Expect(ltInput.LaunchTemplateData.BlockDeviceMappings).To(HaveLen(1)) Expect(*ltInput.LaunchTemplateData.BlockDeviceMappings[0].DeviceName).To(Equal("/dev/xvda")) Expect(*ltInput.LaunchTemplateData.BlockDeviceMappings[0].Ebs.SnapshotId).To(Equal("snap-xxxxxxxx")) @@ -2277,8 +2277,8 @@ var _ = Describe("InstanceTypeProvider", func() { ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) node := ExpectScheduled(ctx, env.Client, pod) Expect(*node.Status.Capacity.StorageEphemeral()).To(Equal(resource.MustParse("20Gi"))) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically(">=", 1)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically(">=", 1)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { Expect(awsEnv.EC2API.CreateFleetBehavior.CalledWithInput.Len()).To(Equal(1)) Expect(ltInput.LaunchTemplateData.BlockDeviceMappings).To(HaveLen(1)) Expect(*ltInput.LaunchTemplateData.BlockDeviceMappings[0].DeviceName).To(Equal("/dev/xvdb")) @@ -2292,8 +2292,8 @@ var _ = Describe("InstanceTypeProvider", func() { pod := coretest.UnschedulablePod() 
ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically(">=", 1)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically(">=", 1)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { Expect(ltInput.LaunchTemplateData.MetadataOptions.HttpEndpoint).To(Equal(ec2types.LaunchTemplateInstanceMetadataEndpointStateEnabled)) Expect(ltInput.LaunchTemplateData.MetadataOptions.HttpProtocolIpv6).To(Equal(ec2types.LaunchTemplateInstanceMetadataProtocolIpv6Disabled)) Expect(lo.FromPtr(ltInput.LaunchTemplateData.MetadataOptions.HttpPutResponseHopLimit)).To(Equal(int32(1))) @@ -2311,8 +2311,8 @@ var _ = Describe("InstanceTypeProvider", func() { pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically(">=", 1)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically(">=", 1)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { Expect(ltInput.LaunchTemplateData.MetadataOptions.HttpEndpoint).To(Equal(ec2types.LaunchTemplateInstanceMetadataEndpointStateDisabled)) Expect(ltInput.LaunchTemplateData.MetadataOptions.HttpProtocolIpv6).To(Equal(ec2types.LaunchTemplateInstanceMetadataProtocolIpv6Enabled)) Expect(lo.FromPtr(ltInput.LaunchTemplateData.MetadataOptions.HttpPutResponseHopLimit)).To(Equal(int32(1))) diff --git a/pkg/providers/launchtemplate/launchtemplate.go b/pkg/providers/launchtemplate/launchtemplate.go index c9cfa098b312..4e1fe5b9a653 100644 --- a/pkg/providers/launchtemplate/launchtemplate.go +++ b/pkg/providers/launchtemplate/launchtemplate.go @@ -219,17 +219,28 @@ func (p *DefaultProvider) createLaunchTemplate(ctx context.Context, options *ami if err != nil { return ec2types.LaunchTemplate{}, err } + createLaunchTemplateInput := GetCreateLaunchTemplateInput(options, p.ClusterIPFamily, userData) + output, err := p.ec2api.CreateLaunchTemplate(ctx, createLaunchTemplateInput) + if err != nil { + return ec2types.LaunchTemplate{}, err + } + log.FromContext(ctx).WithValues("id", aws.ToString(output.LaunchTemplate.LaunchTemplateId)).V(1).Info("created launch template") + return lo.FromPtr(output.LaunchTemplate), nil +} + +// you need UserData, AmiID, tags, blockdevicemappings, instance profile, +func GetCreateLaunchTemplateInput(options *amifamily.LaunchTemplate, ClusterIPFamily corev1.IPFamily, userData string) *ec2.CreateLaunchTemplateInput { launchTemplateDataTags := []ec2types.LaunchTemplateTagSpecificationRequest{ {ResourceType: ec2types.ResourceTypeNetworkInterface, Tags: utils.MergeTags(options.Tags)}, } if options.CapacityType == karpv1.CapacityTypeSpot { launchTemplateDataTags = append(launchTemplateDataTags, ec2types.LaunchTemplateTagSpecificationRequest{ResourceType: ec2types.ResourceTypeSpotInstancesRequest, Tags: utils.MergeTags(options.Tags)}) } - networkInterfaces := p.generateNetworkInterfaces(options) - output, err := p.ec2api.CreateLaunchTemplate(ctx, &ec2.CreateLaunchTemplateInput{ + 
networkInterfaces := generateNetworkInterfaces(options, ClusterIPFamily) + return &ec2.CreateLaunchTemplateInput{ LaunchTemplateName: aws.String(LaunchTemplateName(options)), LaunchTemplateData: &ec2types.RequestLaunchTemplateData{ - BlockDeviceMappings: p.blockDeviceMappings(options.BlockDeviceMappings), + BlockDeviceMappings: blockDeviceMappings(options.BlockDeviceMappings), IamInstanceProfile: &ec2types.LaunchTemplateIamInstanceProfileSpecificationRequest{ Name: aws.String(options.InstanceProfile), }, @@ -263,16 +274,11 @@ func (p *DefaultProvider) createLaunchTemplate(ctx context.Context, options *ami Tags: utils.MergeTags(options.Tags), }, }, - }) - if err != nil { - return ec2types.LaunchTemplate{}, err } - log.FromContext(ctx).WithValues("id", aws.ToString(output.LaunchTemplate.LaunchTemplateId)).V(1).Info("created launch template") - return lo.FromPtr(output.LaunchTemplate), nil } // generateNetworkInterfaces generates network interfaces for the launch template. -func (p *DefaultProvider) generateNetworkInterfaces(options *amifamily.LaunchTemplate) []ec2types.LaunchTemplateInstanceNetworkInterfaceSpecificationRequest { +func generateNetworkInterfaces(options *amifamily.LaunchTemplate, clusterIPFamily corev1.IPFamily) []ec2types.LaunchTemplateInstanceNetworkInterfaceSpecificationRequest { if options.EFACount != 0 { return lo.Times(options.EFACount, func(i int) ec2types.LaunchTemplateInstanceNetworkInterfaceSpecificationRequest { return ec2types.LaunchTemplateInstanceNetworkInterfaceSpecificationRequest{ @@ -285,8 +291,8 @@ func (p *DefaultProvider) generateNetworkInterfaces(options *amifamily.LaunchTem // Instances launched with multiple pre-configured network interfaces cannot set AssociatePublicIPAddress to true. This is an EC2 limitation. However, this does not apply for instances // with a single EFA network interface, and we should support those use cases. Launch failures with multiple enis should be considered user misconfiguration. AssociatePublicIpAddress: options.AssociatePublicIPAddress, - PrimaryIpv6: lo.Ternary(p.ClusterIPFamily == corev1.IPv6Protocol, lo.ToPtr(true), nil), - Ipv6AddressCount: lo.Ternary(p.ClusterIPFamily == corev1.IPv6Protocol, lo.ToPtr(int32(1)), nil), + PrimaryIpv6: lo.Ternary(clusterIPFamily == corev1.IPv6Protocol, lo.ToPtr(true), nil), + Ipv6AddressCount: lo.Ternary(clusterIPFamily == corev1.IPv6Protocol, lo.ToPtr(int32(1)), nil), } }) } @@ -298,13 +304,13 @@ func (p *DefaultProvider) generateNetworkInterfaces(options *amifamily.LaunchTem Groups: lo.Map(options.SecurityGroups, func(s v1.SecurityGroup, _ int) string { return s.ID }), - PrimaryIpv6: lo.Ternary(p.ClusterIPFamily == corev1.IPv6Protocol, lo.ToPtr(true), nil), - Ipv6AddressCount: lo.Ternary(p.ClusterIPFamily == corev1.IPv6Protocol, lo.ToPtr(int32(1)), nil), + PrimaryIpv6: lo.Ternary(clusterIPFamily == corev1.IPv6Protocol, lo.ToPtr(true), nil), + Ipv6AddressCount: lo.Ternary(clusterIPFamily == corev1.IPv6Protocol, lo.ToPtr(int32(1)), nil), }, } } -func (p *DefaultProvider) blockDeviceMappings(blockDeviceMappings []*v1.BlockDeviceMapping) []ec2types.LaunchTemplateBlockDeviceMappingRequest { +func blockDeviceMappings(blockDeviceMappings []*v1.BlockDeviceMapping) []ec2types.LaunchTemplateBlockDeviceMappingRequest { if len(blockDeviceMappings) == 0 { // The EC2 API fails with empty slices and expects nil. 
return nil @@ -324,7 +330,7 @@ func (p *DefaultProvider) blockDeviceMappings(blockDeviceMappings []*v1.BlockDev Throughput: lo.EmptyableToPtr(int32(lo.FromPtr(blockDeviceMapping.EBS.Throughput))), KmsKeyId: blockDeviceMapping.EBS.KMSKeyID, SnapshotId: blockDeviceMapping.EBS.SnapshotID, - VolumeSize: p.volumeSize(blockDeviceMapping.EBS.VolumeSize), + VolumeSize: volumeSize(blockDeviceMapping.EBS.VolumeSize), }, }) } @@ -332,7 +338,7 @@ func (p *DefaultProvider) blockDeviceMappings(blockDeviceMappings []*v1.BlockDev } // volumeSize returns a GiB scaled value from a resource quantity or nil if the resource quantity passed in is nil -func (p *DefaultProvider) volumeSize(quantity *resource.Quantity) *int32 { +func volumeSize(quantity *resource.Quantity) *int32 { if quantity == nil { return nil } diff --git a/pkg/providers/launchtemplate/suite_test.go b/pkg/providers/launchtemplate/suite_test.go index dad250257bca..b31b02177039 100644 --- a/pkg/providers/launchtemplate/suite_test.go +++ b/pkg/providers/launchtemplate/suite_test.go @@ -273,9 +273,9 @@ var _ = Describe("LaunchTemplate Provider", func() { ExpectApplied(ctx, env.Client, nodePool, nodeClass, nodePool2, nodeClass2) ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pods...) ltConfigCount := len(awsEnv.EC2API.CreateFleetBehavior.CalledWithInput.Pop().LaunchTemplateConfigs) + len(awsEnv.EC2API.CreateFleetBehavior.CalledWithInput.Pop().LaunchTemplateConfigs) - Expect(ltConfigCount).To(BeNumerically("==", awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len())) + Expect(ltConfigCount).To(BeNumerically("==", awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len())) nodeClasses := [2]string{nodeClass.Name, nodeClass2.Name} - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { for _, value := range ltInput.LaunchTemplateData.TagSpecifications[0].Tags { if *value.Key == v1.LabelNodeClass { Expect(*value.Value).To(BeElementOf(nodeClasses)) @@ -291,9 +291,9 @@ var _ = Describe("LaunchTemplate Provider", func() { Expect(awsEnv.EC2API.CreateFleetBehavior.CalledWithInput.Len()).To(BeNumerically("==", 1)) createFleetInput := awsEnv.EC2API.CreateFleetBehavior.CalledWithInput.Pop() - Expect(len(createFleetInput.LaunchTemplateConfigs)).To(BeNumerically("==", awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len())) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 5)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(len(createFleetInput.LaunchTemplateConfigs)).To(BeNumerically("==", awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len())) + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 5)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { launchTemplate, ok := lo.Find(createFleetInput.LaunchTemplateConfigs, func(ltConfig ec2types.FleetLaunchTemplateConfigRequest) bool { return *ltConfig.LaunchTemplateSpecification.LaunchTemplateName == *ltInput.LaunchTemplateName }) @@ -318,8 +318,8 @@ var _ = Describe("LaunchTemplate Provider", func() { pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - 
Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 5)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 5)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { Expect(*ltInput.LaunchTemplateData.IamInstanceProfile.Name).To(Equal("overridden-profile")) }) }) @@ -388,8 +388,8 @@ var _ = Describe("LaunchTemplate Provider", func() { pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 2)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 2)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { ltName := aws.ToString(ltInput.LaunchTemplateName) lt, ok := awsEnv.LaunchTemplateCache.Get(ltName) Expect(ok).To(Equal(true)) @@ -564,7 +564,7 @@ var _ = Describe("LaunchTemplate Provider", func() { pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(i *ec2.CreateLaunchTemplateInput) { + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(i *ec2.CreateLaunchTemplateInput) { Expect(i.LaunchTemplateData.TagSpecifications).To(HaveLen(2)) // tags should be included in instance, volume, and fleet tag specification @@ -606,8 +606,8 @@ var _ = Describe("LaunchTemplate Provider", func() { pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 5)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 5)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { Expect(len(ltInput.LaunchTemplateData.BlockDeviceMappings)).To(Equal(1)) Expect(lo.FromPtr(ltInput.LaunchTemplateData.BlockDeviceMappings[0].Ebs.VolumeSize)).To(Equal(int32(20))) Expect(ltInput.LaunchTemplateData.BlockDeviceMappings[0].Ebs.VolumeType).To(Equal(ec2types.VolumeType("gp3"))) @@ -622,8 +622,8 @@ var _ = Describe("LaunchTemplate Provider", func() { pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 5)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 5)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { Expect(len(ltInput.LaunchTemplateData.BlockDeviceMappings)).To(Equal(1)) 
Expect(lo.FromPtr(ltInput.LaunchTemplateData.BlockDeviceMappings[0].Ebs.VolumeSize)).To(Equal(int32(20))) Expect(ltInput.LaunchTemplateData.BlockDeviceMappings[0].Ebs.VolumeType).To(Equal(ec2types.VolumeType("gp3"))) @@ -659,8 +659,8 @@ var _ = Describe("LaunchTemplate Provider", func() { pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 5)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 5)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { Expect(ltInput.LaunchTemplateData.BlockDeviceMappings[0].Ebs).To(Equal(&ec2types.LaunchTemplateEbsBlockDeviceRequest{ VolumeSize: aws.Int32(187), VolumeType: ec2types.VolumeType("io2"), @@ -708,8 +708,8 @@ var _ = Describe("LaunchTemplate Provider", func() { pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 5)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 5)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { // Both of these values are rounded up when converting to Gibibytes Expect(lo.FromPtr(ltInput.LaunchTemplateData.BlockDeviceMappings[0].Ebs.VolumeSize)).To(BeNumerically("==", 4)) Expect(lo.FromPtr(ltInput.LaunchTemplateData.BlockDeviceMappings[1].Ebs.VolumeSize)).To(BeNumerically("==", 2)) @@ -721,8 +721,8 @@ var _ = Describe("LaunchTemplate Provider", func() { pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 5)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 5)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { Expect(len(ltInput.LaunchTemplateData.BlockDeviceMappings)).To(Equal(2)) // Bottlerocket control volume Expect(lo.FromPtr(ltInput.LaunchTemplateData.BlockDeviceMappings[0].Ebs.VolumeSize)).To(Equal(int32(4))) @@ -741,8 +741,8 @@ var _ = Describe("LaunchTemplate Provider", func() { pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 2)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 2)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { Expect(len(ltInput.LaunchTemplateData.BlockDeviceMappings)).To(Equal(0)) }) }) @@ -766,8 +766,8 @@ var _ = 
Describe("LaunchTemplate Provider", func() { pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 2)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 2)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { Expect(len(ltInput.LaunchTemplateData.BlockDeviceMappings)).To(Equal(1)) Expect(lo.FromPtr(ltInput.LaunchTemplateData.BlockDeviceMappings[0].Ebs.VolumeSize)).To(Equal(int32(40))) Expect(ltInput.LaunchTemplateData.BlockDeviceMappings[0].Ebs.VolumeType).To(Equal(ec2types.VolumeType("io2"))) @@ -1134,7 +1134,7 @@ var _ = Describe("LaunchTemplate Provider", func() { ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) // We expect to generate 5 launch templates for our image/max-pods combination where we were only generating 2 before - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 5)) + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 5)) }) It("should specify --system-reserved when overriding system reserved values", func() { nodeClass.Spec.Kubelet = &v1.KubeletConfiguration{ @@ -1148,8 +1148,8 @@ var _ = Describe("LaunchTemplate Provider", func() { pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 5)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 5)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { userData, err := base64.StdEncoding.DecodeString(*ltInput.LaunchTemplateData.UserData) Expect(err).To(BeNil()) @@ -1175,8 +1175,8 @@ var _ = Describe("LaunchTemplate Provider", func() { pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 5)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 5)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { userData, err := base64.StdEncoding.DecodeString(*ltInput.LaunchTemplateData.UserData) Expect(err).To(BeNil()) @@ -1202,8 +1202,8 @@ var _ = Describe("LaunchTemplate Provider", func() { pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 5)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 5)) + 
awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { userData, err := base64.StdEncoding.DecodeString(*ltInput.LaunchTemplateData.UserData) Expect(err).To(BeNil()) @@ -1234,8 +1234,8 @@ var _ = Describe("LaunchTemplate Provider", func() { pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 5)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 5)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { userData, err := base64.StdEncoding.DecodeString(*ltInput.LaunchTemplateData.UserData) Expect(err).To(BeNil()) @@ -1266,8 +1266,8 @@ var _ = Describe("LaunchTemplate Provider", func() { pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 5)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 5)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { userData, err := base64.StdEncoding.DecodeString(*ltInput.LaunchTemplateData.UserData) Expect(err).To(BeNil()) @@ -1377,7 +1377,7 @@ var _ = Describe("LaunchTemplate Provider", func() { ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 5)) + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 5)) ExpectLaunchTemplatesCreatedWithUserDataContaining("--local-disks raid0") }) It("should specify RAID0 bootstrap-command when instance-store policy is set on Bottlerocket", func() { @@ -1388,7 +1388,7 @@ var _ = Describe("LaunchTemplate Provider", func() { ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 5)) + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 5)) ExpectLaunchTemplatesCreatedWithUserDataContaining(` [settings.bootstrap-commands.000-mount-instance-storage] commands = [['apiclient', 'ephemeral-storage', 'init'], ['apiclient', 'ephemeral-storage', 'bind', '--dirs', '/var/lib/containerd', '/var/lib/kubelet', '/var/log/pods']] @@ -1410,7 +1410,7 @@ essential = true ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 5)) + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 5)) ExpectLaunchTemplatesCreatedWithUserDataContaining(` [settings.bootstrap-commands] [settings.bootstrap-commands.000-mount-instance-storage] @@ -1491,8 +1491,8 @@ essential = true pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, 
prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 2)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 2)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { userData, err := base64.StdEncoding.DecodeString(*ltInput.LaunchTemplateData.UserData) Expect(err).To(BeNil()) config := &bootstrap.BottlerocketConfig{} @@ -1515,8 +1515,8 @@ essential = true pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 5)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 5)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { userData, err := base64.StdEncoding.DecodeString(*ltInput.LaunchTemplateData.UserData) Expect(err).To(BeNil()) config := &bootstrap.BottlerocketConfig{} @@ -1539,8 +1539,8 @@ essential = true pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 5)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 5)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { userData, err := base64.StdEncoding.DecodeString(*ltInput.LaunchTemplateData.UserData) Expect(err).To(BeNil()) config := &bootstrap.BottlerocketConfig{} @@ -1559,8 +1559,8 @@ essential = true pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 2)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 2)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { userData, err := base64.StdEncoding.DecodeString(*ltInput.LaunchTemplateData.UserData) Expect(err).To(BeNil()) config := &bootstrap.BottlerocketConfig{} @@ -1577,8 +1577,8 @@ essential = true pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 5)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 5)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { userData, err := 
base64.StdEncoding.DecodeString(*ltInput.LaunchTemplateData.UserData) Expect(err).To(BeNil()) config := &bootstrap.BottlerocketConfig{} @@ -1597,8 +1597,8 @@ essential = true pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 5)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 5)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { userData, err := base64.StdEncoding.DecodeString(*ltInput.LaunchTemplateData.UserData) Expect(err).To(BeNil()) config := &bootstrap.BottlerocketConfig{} @@ -1614,8 +1614,8 @@ essential = true pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 2)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 2)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { userData, err := base64.StdEncoding.DecodeString(*ltInput.LaunchTemplateData.UserData) Expect(err).To(BeNil()) config := &bootstrap.BottlerocketConfig{} @@ -1632,8 +1632,8 @@ essential = true pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 5)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 5)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { userData, err := base64.StdEncoding.DecodeString(*ltInput.LaunchTemplateData.UserData) Expect(err).To(BeNil()) config := &bootstrap.BottlerocketConfig{} @@ -1896,7 +1896,7 @@ essential = true pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectNotScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(Equal(0)) + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(Equal(0)) }) }) Context("Custom AMI Selector", func() { @@ -1915,8 +1915,8 @@ essential = true pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 1)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 1)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { Expect("ami-123").To(Equal(*ltInput.LaunchTemplateData.ImageId)) }) }) @@ -1965,7 +1965,7 @@ essential = true ExpectScheduled(ctx, env.Client, pod) _, err := 
awsEnv.AMIProvider.List(ctx, nodeClass) Expect(err).To(BeNil()) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically(">=", 2)) + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically(">=", 2)) actualFilter := awsEnv.EC2API.CalledWithDescribeImagesInput.Pop().Filters expectedFilter := []ec2types.Filter{ { @@ -2000,10 +2000,10 @@ essential = true pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically(">=", 2)) + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically(">=", 2)) expectedImageIds := sets.New[string]("ami-123", "ami-456") actualImageIds := sets.New[string]() - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { actualImageIds.Insert(*ltInput.LaunchTemplateData.ImageId) }) Expect(expectedImageIds.Equal(actualImageIds)).To(BeTrue()) @@ -2036,7 +2036,7 @@ essential = true nodeClass.Spec.AMIFamily = lo.ToPtr(v1.AMIFamilyCustom) nodeClass.Spec.AMISelectorTerms = []v1.AMISelectorTerm{{Tags: map[string]string{"*": "*"}}} ExpectApplied(ctx, env.Client, nodeClass) - controller := nodeclass.NewController(env.Client, recorder, awsEnv.SubnetProvider, awsEnv.SecurityGroupProvider, awsEnv.AMIProvider, awsEnv.InstanceProfileProvider, awsEnv.LaunchTemplateProvider) + controller := nodeclass.NewController(env.Client, recorder, awsEnv.SubnetProvider, awsEnv.SecurityGroupProvider, awsEnv.AMIProvider, awsEnv.InstanceProfileProvider, awsEnv.LaunchTemplateProvider, awsEnv.EC2API) ExpectObjectReconciled(ctx, env.Client, controller, nodeClass) nodePool.Spec.Template.Spec.Requirements = []karpv1.NodeSelectorRequirementWithMinValues{ { @@ -2051,8 +2051,8 @@ essential = true pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 1)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 1)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { Expect("ami-456").To(Equal(*ltInput.LaunchTemplateData.ImageId)) }) }) @@ -2066,7 +2066,7 @@ essential = true pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectNotScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(Equal(0)) + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(Equal(0)) }) It("should fail if no instanceType matches ami requirements.", func() { awsEnv.EC2API.DescribeImagesOutput.Set(&ec2.DescribeImagesOutput{Images: []ec2types.Image{ @@ -2085,7 +2085,7 @@ essential = true pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectNotScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(Equal(0)) + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(Equal(0)) }) It("should choose amis 
from SSM if no selector specified in EC2NodeClass", func() { version := awsEnv.VersionProvider.Get(ctx) @@ -2105,7 +2105,7 @@ essential = true pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - input := awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Pop() + input := awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Pop() Expect(*input.LaunchTemplateData.ImageId).To(ContainSubstring("test-ami")) }) }) @@ -2123,7 +2123,7 @@ essential = true }, coretest.PodOptions{})) ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - input := awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Pop() + input := awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Pop() Expect(*input.LaunchTemplateData.NetworkInterfaces[0].AssociatePublicIpAddress).To(Equal(expectedValue)) }, Entry("AssociatePublicIPAddress is true", true, true, false), @@ -2189,8 +2189,8 @@ essential = true pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 5)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 5)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { Expect(aws.ToBool(ltInput.LaunchTemplateData.Monitoring.Enabled)).To(BeFalse()) }) }) @@ -2200,8 +2200,8 @@ essential = true pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 5)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 5)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { Expect(aws.ToBool(ltInput.LaunchTemplateData.Monitoring.Enabled)).To(BeTrue()) }) }) @@ -2212,8 +2212,8 @@ essential = true pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 5)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 5)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { Expect(ltInput.LaunchTemplateData.MetadataOptions.HttpEndpoint).To(Equal(ec2types.LaunchTemplateInstanceMetadataEndpointStateEnabled)) Expect(ltInput.LaunchTemplateData.MetadataOptions.HttpProtocolIpv6).To(Equal(ec2types.LaunchTemplateInstanceMetadataProtocolIpv6Disabled)) Expect(lo.FromPtr(ltInput.LaunchTemplateData.MetadataOptions.HttpPutResponseHopLimit)).To(BeNumerically("==", 1)) @@ -2225,8 +2225,8 @@ essential = true pod := coretest.UnschedulablePod() ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - 
Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically("==", 5)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically("==", 5)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(ltInput *ec2.CreateLaunchTemplateInput) { Expect(ltInput.LaunchTemplateData.MetadataOptions.InstanceMetadataTags).To(Equal(ec2types.LaunchTemplateInstanceMetadataTagsStateDisabled)) }) }) @@ -2273,7 +2273,7 @@ essential = true }, coretest.PodOptions{})) ExpectProvisioned(ctx, env.Client, cluster, cloudProvider, prov, pod) ExpectScheduled(ctx, env.Client, pod) - input := awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Pop() + input := awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Pop() Expect(len(input.LaunchTemplateData.NetworkInterfaces)).To(BeNumerically(">=", 1)) if !isPublicAddressSet && !isEFA { @@ -2309,8 +2309,8 @@ func ExpectTags(tags []ec2types.Tag, expected map[string]string) { func ExpectLaunchTemplatesCreatedWithUserDataContaining(substrings ...string) { GinkgoHelper() - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically(">=", 1)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(input *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically(">=", 1)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(input *ec2.CreateLaunchTemplateInput) { userData, err := base64.StdEncoding.DecodeString(*input.LaunchTemplateData.UserData) ExpectWithOffset(2, err).To(BeNil()) for _, substring := range substrings { @@ -2321,8 +2321,8 @@ func ExpectLaunchTemplatesCreatedWithUserDataContaining(substrings ...string) { func ExpectLaunchTemplatesCreatedWithUserDataNotContaining(substrings ...string) { GinkgoHelper() - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically(">=", 1)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(input *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically(">=", 1)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(input *ec2.CreateLaunchTemplateInput) { userData, err := base64.StdEncoding.DecodeString(*input.LaunchTemplateData.UserData) ExpectWithOffset(2, err).To(BeNil()) for _, substring := range substrings { @@ -2333,8 +2333,8 @@ func ExpectLaunchTemplatesCreatedWithUserDataNotContaining(substrings ...string) func ExpectLaunchTemplatesCreatedWithUserData(expected string) { GinkgoHelper() - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically(">=", 1)) - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(input *ec2.CreateLaunchTemplateInput) { + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically(">=", 1)) + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(input *ec2.CreateLaunchTemplateInput) { userData, err := base64.StdEncoding.DecodeString(*input.LaunchTemplateData.UserData) ExpectWithOffset(2, err).To(BeNil()) // Newlines are always added for missing TOML fields, so strip them out before comparisons. 
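Note on the fake EC2 API accessors used throughout these assertions: the tests read recorded inputs through `CreateLaunchTemplateBehavior.CalledWithInput` and its `Len`, `ForEach`, and `Pop` helpers. The following is a minimal, self-contained sketch of that recorded-call pattern, not the repository's actual fake implementation; the generic types and the `Add` method are assumptions for illustration, and only the accessor names mirror the suite above.

```go
// Sketch of a recorded-call test double. Types and method set are illustrative
// assumptions; only the accessor names mirror the assertions in this suite.
package sketch

import "sync"

// RecordedInputs stores every input passed to a faked API call so tests can
// assert on the call count, iterate over inputs, or pop one for inspection.
type RecordedInputs[I any] struct {
	mu     sync.Mutex
	inputs []*I
}

// Add records one input; the fake API implementation would call this on every invocation.
func (r *RecordedInputs[I]) Add(input *I) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.inputs = append(r.inputs, input)
}

// Len returns how many times the faked call was invoked.
func (r *RecordedInputs[I]) Len() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.inputs)
}

// ForEach runs fn over every recorded input, in call order.
func (r *RecordedInputs[I]) ForEach(fn func(*I)) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for _, input := range r.inputs {
		fn(input)
	}
}

// Pop removes and returns the oldest recorded input, or nil if none remain.
func (r *RecordedInputs[I]) Pop() *I {
	r.mu.Lock()
	defer r.mu.Unlock()
	if len(r.inputs) == 0 {
		return nil
	}
	input := r.inputs[0]
	r.inputs = r.inputs[1:]
	return input
}

// Behavior bundles recorded inputs with any canned output or error a test sets,
// in the spirit of awsEnv.EC2API.CreateLaunchTemplateBehavior in the suite above.
type Behavior[I, O any] struct {
	CalledWithInput RecordedInputs[I]
	Output          *O
	Error           error
}
```

Grouping the recorded inputs under a per-call behavior value is what allows the rename from `CalledWithCreateLaunchTemplateInput` to `CreateLaunchTemplateBehavior.CalledWithInput` seen throughout this patch.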
@@ -2346,9 +2346,9 @@ func ExpectLaunchTemplatesCreatedWithUserData(expected string) { func ExpectUserDataExistsFromCreatedLaunchTemplates() []string { GinkgoHelper() - Expect(awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.Len()).To(BeNumerically(">=", 1)) + Expect(awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.Len()).To(BeNumerically(">=", 1)) userDatas := []string{} - awsEnv.EC2API.CalledWithCreateLaunchTemplateInput.ForEach(func(input *ec2.CreateLaunchTemplateInput) { + awsEnv.EC2API.CreateLaunchTemplateBehavior.CalledWithInput.ForEach(func(input *ec2.CreateLaunchTemplateInput) { userData, err := base64.StdEncoding.DecodeString(*input.LaunchTemplateData.UserData) ExpectWithOffset(2, err).To(BeNil()) userDatas = append(userDatas, string(userData)) diff --git a/pkg/utils/utils.go b/pkg/utils/utils.go index d558b0790223..7140ae9b3547 100644 --- a/pkg/utils/utils.go +++ b/pkg/utils/utils.go @@ -23,6 +23,9 @@ import ( "github.com/aws/aws-sdk-go-v2/aws" ec2types "github.com/aws/aws-sdk-go-v2/service/ec2/types" + karpv1 "sigs.k8s.io/karpenter/pkg/apis/v1" + + v1 "github.com/aws/karpenter-provider-aws/pkg/apis/v1" "github.com/samber/lo" ) @@ -83,3 +86,23 @@ func WithDefaultFloat64(key string, def float64) float64 { } return f } + +func GetTags(nodeClass *v1.EC2NodeClass, nodeClaim *karpv1.NodeClaim, clusterName string) (map[string]string, error) { + if offendingTag, found := lo.FindKeyBy(nodeClass.Spec.Tags, func(k string, v string) bool { + for _, exp := range v1.RestrictedTagPatterns { + if exp.MatchString(k) { + return true + } + } + return false + }); found { + return nil, fmt.Errorf("%q tag does not pass tag validation requirements", offendingTag) + } + staticTags := map[string]string{ + fmt.Sprintf("kubernetes.io/cluster/%s", clusterName): "owned", + karpv1.NodePoolLabelKey: nodeClaim.Labels[karpv1.NodePoolLabelKey], + v1.EKSClusterNameTagKey: clusterName, + v1.LabelNodeClass: nodeClass.Name, + } + return lo.Assign(nodeClass.Spec.Tags, staticTags), nil +} diff --git a/test/suites/ami/suite_test.go b/test/suites/ami/suite_test.go index 024a27ee88f1..8b04c2f33758 100644 --- a/test/suites/ami/suite_test.go +++ b/test/suites/ami/suite_test.go @@ -258,7 +258,8 @@ var _ = Describe("AMI", func() { nodeClass.Spec.AMISelectorTerms = []v1.AMISelectorTerm{{ID: "ami-123"}} env.ExpectCreated(nodeClass) ExpectStatusConditions(env, env.Client, 1*time.Minute, nodeClass, status.Condition{Type: v1.ConditionTypeAMIsReady, Status: metav1.ConditionFalse, Message: "AMISelector did not match any AMIs"}) - ExpectStatusConditions(env, env.Client, 1*time.Minute, nodeClass, status.Condition{Type: status.ConditionReady, Status: metav1.ConditionFalse, Message: "AMIsReady=False"}) + ExpectStatusConditions(env, env.Client, 1*time.Minute, nodeClass, status.Condition{Type: status.ConditionReady, Status: metav1.ConditionFalse, Message: "ValidationSucceeded=False, AMIsReady=False"}) + }) }) diff --git a/test/suites/integration/nodeclass_test.go b/test/suites/integration/nodeclass_test.go new file mode 100644 index 000000000000..a95ca5a9f0bd --- /dev/null +++ b/test/suites/integration/nodeclass_test.go @@ -0,0 +1,114 @@ +/* +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. 
+You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +*/ + +package integration_test + +import ( + "fmt" + "strings" + "time" + + "github.com/aws/aws-sdk-go-v2/aws" + "github.com/aws/aws-sdk-go-v2/service/iam" + "github.com/awslabs/operatorpkg/status" + "github.com/google/uuid" + appsv1 "k8s.io/api/apps/v1" + corev1 "k8s.io/api/core/v1" + metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" + "k8s.io/apimachinery/pkg/types" + + v1 "github.com/aws/karpenter-provider-aws/pkg/apis/v1" + + . "github.com/awslabs/operatorpkg/test/expectations" + . "github.com/onsi/ginkgo/v2" + . "github.com/onsi/gomega" +) + +var _ = Describe("NodeClass IAM Permissions", func() { + var ( + roleName string + policyName string + ) + DescribeTable("IAM Permission Failure Tests", + func(action string, expectedMessage string) { + policyDoc := fmt.Sprintf(`{ + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Deny", + "Action": ["%s"], + "Resource": "*" + } + ] + }`, action) + + deployment := &appsv1.Deployment{} + err := env.Client.Get(env.Context, types.NamespacedName{ + Namespace: "kube-system", + Name: "karpenter", + }, deployment) + Expect(err).To(BeNil()) + + sa := &corev1.ServiceAccount{} + err = env.Client.Get(env.Context, types.NamespacedName{ + Namespace: "kube-system", + Name: deployment.Spec.Template.Spec.ServiceAccountName, + }, sa) + Expect(err).To(BeNil()) + + roleName = strings.Split(sa.Annotations["eks.amazonaws.com/role-arn"], "/")[1] + policyName = fmt.Sprintf("TestPolicy-%s", uuid.New().String()) + + _, err = env.IAMAPI.PutRolePolicy(env.Context, &iam.PutRolePolicyInput{ + RoleName: aws.String(roleName), + PolicyName: aws.String(policyName), + PolicyDocument: aws.String(policyDoc), + }) + Expect(err).To(BeNil()) + + DeferCleanup(func() { + _, err := env.IAMAPI.DeleteRolePolicy(env.Context, &iam.DeleteRolePolicyInput{ + RoleName: aws.String(roleName), + PolicyName: aws.String(policyName), + }) + Expect(err).To(BeNil()) + }) + + env.ExpectCreated(nodeClass) + Eventually(func(g Gomega) { + env.ExpectUpdated(nodeClass) + g.Expect(nodeClass.StatusConditions().Get(v1.ConditionTypeValidationSucceeded).IsFalse()).To(BeTrue()) + g.Expect(nodeClass.StatusConditions().Get(v1.ConditionTypeValidationSucceeded).Reason).To(Equal(expectedMessage)) + }, "240s", "5s").Should(Succeed()) + ExpectStatusConditions(env, env.Client, 1*time.Minute, nodeClass, status.Condition{Type: status.ConditionReady, Status: metav1.ConditionFalse, Message: "ValidationSucceeded=False"}) + }, + Entry("should fail when CreateFleet is denied", + "ec2:CreateFleet", + "CreateFleetAuthCheckFailed"), + Entry("should fail when CreateLaunchTemplate is denied", + "ec2:CreateLaunchTemplate", + "CreateLaunchTemplateAuthCheckFailed"), + Entry("should fail when RunInstances is denied", + "ec2:RunInstances", + "RunInstancesAuthCheckFailed"), + ) + + It("should succeed with all required permissions", func() { + env.ExpectCreated(nodeClass) + Eventually(func(g Gomega) { + env.ExpectUpdated(nodeClass) + g.Expect(nodeClass.StatusConditions().Get(v1.ConditionTypeValidationSucceeded).IsTrue()).To(BeTrue()) + }, "60s", "5s").Should(Succeed()) + }) +}) diff --git a/test/suites/integration/security_group_test.go 
b/test/suites/integration/security_group_test.go index ed4015df1daa..daec68f8734d 100644 --- a/test/suites/integration/security_group_test.go +++ b/test/suites/integration/security_group_test.go @@ -90,7 +90,7 @@ var _ = Describe("SecurityGroups", func() { } env.ExpectCreated(nodeClass) ExpectStatusConditions(env, env.Client, 1*time.Minute, nodeClass, status.Condition{Type: v1.ConditionTypeSecurityGroupsReady, Status: metav1.ConditionFalse, Message: "SecurityGroupSelector did not match any SecurityGroups"}) - ExpectStatusConditions(env, env.Client, 1*time.Minute, nodeClass, status.Condition{Type: status.ConditionReady, Status: metav1.ConditionFalse, Message: "SecurityGroupsReady=False"}) + ExpectStatusConditions(env, env.Client, 1*time.Minute, nodeClass, status.Condition{Type: status.ConditionReady, Status: metav1.ConditionFalse, Message: "ValidationSucceeded=False, SecurityGroupsReady=False"}) }) }) diff --git a/test/suites/integration/subnet_test.go b/test/suites/integration/subnet_test.go index b0fa5513e60d..717dbd5a6427 100644 --- a/test/suites/integration/subnet_test.go +++ b/test/suites/integration/subnet_test.go @@ -140,7 +140,7 @@ var _ = Describe("Subnets", func() { } env.ExpectCreated(nodeClass) ExpectStatusConditions(env, env.Client, 1*time.Minute, nodeClass, status.Condition{Type: v1.ConditionTypeSubnetsReady, Status: metav1.ConditionFalse, Message: "SubnetSelector did not match any Subnets"}) - ExpectStatusConditions(env, env.Client, 1*time.Minute, nodeClass, status.Condition{Type: status.ConditionReady, Status: metav1.ConditionFalse, Message: "SubnetsReady=False"}) + ExpectStatusConditions(env, env.Client, 1*time.Minute, nodeClass, status.Condition{Type: status.ConditionReady, Status: metav1.ConditionFalse, Message: "ValidationSucceeded=False, SubnetsReady=False"}) }) }) From 934a78bb096f1f6ba73e62ae5a2a429c0944c0a4 Mon Sep 17 00:00:00 2001 From: Jason Deal Date: Wed, 12 Feb 2025 14:43:10 -0800 Subject: [PATCH 32/34] deps: bump sigs.k8s.io/karpenter (#7735) --- .../templates/karpenter.k8s.aws_ec2nodeclasses.yaml | 2 +- go.mod | 2 +- go.sum | 4 ++-- pkg/apis/crds/karpenter.k8s.aws_ec2nodeclasses.yaml | 2 +- test/pkg/environment/common/expectations.go | 4 ++-- 5 files changed, 7 insertions(+), 7 deletions(-) diff --git a/charts/karpenter-crd/templates/karpenter.k8s.aws_ec2nodeclasses.yaml b/charts/karpenter-crd/templates/karpenter.k8s.aws_ec2nodeclasses.yaml index a8494041ff15..75021e7afb46 100644 --- a/charts/karpenter-crd/templates/karpenter.k8s.aws_ec2nodeclasses.yaml +++ b/charts/karpenter-crd/templates/karpenter.k8s.aws_ec2nodeclasses.yaml @@ -6,7 +6,7 @@ metadata: {{- with .Values.additionalAnnotations }} {{- toYaml . 
| nindent 4 }} {{- end }} - controller-gen.kubebuilder.io/version: v0.17.1 + controller-gen.kubebuilder.io/version: v0.17.2 name: ec2nodeclasses.karpenter.k8s.aws spec: group: karpenter.k8s.aws diff --git a/go.mod b/go.mod index 3937b354dadb..86016005da71 100644 --- a/go.mod +++ b/go.mod @@ -44,7 +44,7 @@ require ( k8s.io/klog/v2 v2.130.1 k8s.io/utils v0.0.0-20241104100929-3ea5e8cea738 sigs.k8s.io/controller-runtime v0.20.1 - sigs.k8s.io/karpenter v1.2.1-0.20250211002957-aa118786c83c + sigs.k8s.io/karpenter v1.2.1-0.20250212185021-45f73ec7a790 sigs.k8s.io/yaml v1.4.0 ) diff --git a/go.sum b/go.sum index e855f5e50fcd..d9370d0c0a1a 100644 --- a/go.sum +++ b/go.sum @@ -338,8 +338,8 @@ sigs.k8s.io/controller-runtime v0.20.1 h1:JbGMAG/X94NeM3xvjenVUaBjy6Ui4Ogd/J5Ztj sigs.k8s.io/controller-runtime v0.20.1/go.mod h1:BrP3w158MwvB3ZbNpaAcIKkHQ7YGpYnzpoSTZ8E14WU= sigs.k8s.io/json v0.0.0-20241010143419-9aa6b5e7a4b3 h1:/Rv+M11QRah1itp8VhT6HoVx1Ray9eB4DBr+K+/sCJ8= sigs.k8s.io/json v0.0.0-20241010143419-9aa6b5e7a4b3/go.mod h1:18nIHnGi6636UCz6m8i4DhaJ65T6EruyzmoQqI2BVDo= -sigs.k8s.io/karpenter v1.2.1-0.20250211002957-aa118786c83c h1:1FsZR40Lx9lTINMRCmgi+BdnHVWWhmfwxFq1RfcCArY= -sigs.k8s.io/karpenter v1.2.1-0.20250211002957-aa118786c83c/go.mod h1:R6cr2+SbbgXtKtiuyRFdZCbqWN2kNTduqshnQRoyOr8= +sigs.k8s.io/karpenter v1.2.1-0.20250212185021-45f73ec7a790 h1:FXm0rL9jchktDDEqJ9bGhNkpGzauYhXxroMzzvohAO8= +sigs.k8s.io/karpenter v1.2.1-0.20250212185021-45f73ec7a790/go.mod h1:R6cr2+SbbgXtKtiuyRFdZCbqWN2kNTduqshnQRoyOr8= sigs.k8s.io/structured-merge-diff/v4 v4.4.2 h1:MdmvkGuXi/8io6ixD5wud3vOLwc1rj0aNqRlpuvjmwA= sigs.k8s.io/structured-merge-diff/v4 v4.4.2/go.mod h1:N8f93tFZh9U6vpxwRArLiikrE5/2tiu1w1AGfACIGE4= sigs.k8s.io/yaml v1.4.0 h1:Mk1wCc2gy/F0THH0TAp1QYyJNzRm2KCLy3o5ASXVI5E= diff --git a/pkg/apis/crds/karpenter.k8s.aws_ec2nodeclasses.yaml b/pkg/apis/crds/karpenter.k8s.aws_ec2nodeclasses.yaml index a2444bfcd03c..3b915b075962 100644 --- a/pkg/apis/crds/karpenter.k8s.aws_ec2nodeclasses.yaml +++ b/pkg/apis/crds/karpenter.k8s.aws_ec2nodeclasses.yaml @@ -3,7 +3,7 @@ apiVersion: apiextensions.k8s.io/v1 kind: CustomResourceDefinition metadata: annotations: - controller-gen.kubebuilder.io/version: v0.17.1 + controller-gen.kubebuilder.io/version: v0.17.2 name: ec2nodeclasses.karpenter.k8s.aws spec: group: karpenter.k8s.aws diff --git a/test/pkg/environment/common/expectations.go b/test/pkg/environment/common/expectations.go index d3d8acbc2b4a..52aa5d4016b0 100644 --- a/test/pkg/environment/common/expectations.go +++ b/test/pkg/environment/common/expectations.go @@ -1011,7 +1011,7 @@ func (env *Environment) GetDaemonSetCount(np *karpv1.NodePool) int { return lo.CountBy(daemonSetList.Items, func(d appsv1.DaemonSet) bool { p := &corev1.Pod{Spec: d.Spec.Template.Spec} nodeClaimTemplate := pscheduling.NewNodeClaimTemplate(np) - if err := scheduling.Taints(nodeClaimTemplate.Spec.Taints).Tolerates(p); err != nil { + if err := scheduling.Taints(nodeClaimTemplate.Spec.Taints).ToleratesPod(p); err != nil { return false } if err := nodeClaimTemplate.Requirements.Compatible(scheduling.NewPodRequirements(p), scheduling.AllowUndefinedWellKnownLabels); err != nil { @@ -1032,7 +1032,7 @@ func (env *Environment) GetDaemonSetOverhead(np *karpv1.NodePool) corev1.Resourc return coreresources.RequestsForPods(lo.FilterMap(daemonSetList.Items, func(ds appsv1.DaemonSet, _ int) (*corev1.Pod, bool) { p := &corev1.Pod{Spec: ds.Spec.Template.Spec} nodeClaimTemplate := pscheduling.NewNodeClaimTemplate(np) - if err := 
scheduling.Taints(nodeClaimTemplate.Spec.Taints).Tolerates(p); err != nil { + if err := scheduling.Taints(nodeClaimTemplate.Spec.Taints).ToleratesPod(p); err != nil { return nil, false } if err := nodeClaimTemplate.Requirements.Compatible(scheduling.NewPodRequirements(p), scheduling.AllowUndefinedWellKnownLabels); err != nil { From 3da112d438d0b87adaa844771343032c9bfe5250 Mon Sep 17 00:00:00 2001 From: Adrian Mester <33199+adrianmester@users.noreply.github.com> Date: Thu, 13 Feb 2025 21:55:32 +0200 Subject: [PATCH 33/34] docs: fix typo in concepts/disruption (#7737) --- website/content/en/docs/concepts/disruption.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/content/en/docs/concepts/disruption.md b/website/content/en/docs/concepts/disruption.md index 85e173bec923..034282c9a437 100644 --- a/website/content/en/docs/concepts/disruption.md +++ b/website/content/en/docs/concepts/disruption.md @@ -247,7 +247,7 @@ To configure a maximum termination duration, `terminationGracePeriod` should be It is configured through a NodePool's [`spec.template.spec.terminationGracePeriod`]({{}}) field, and is persisted to created NodeClaims (`spec.terminationGracePeriod`). Changes to the [`spec.template.spec.terminationGracePeriod`]({{}}) field on the NodePool will not result in a change for existing NodeClaims - it will induce NodeClaim drift and the replacements will have the updated `terminationGracePeriod`. -Once a node is disrupted, via either a [graceful](#automated-graceful-methods) or [forceful](#automated-forceful-methods) disruption method, Karpenter will being draining the node. +Once a node is disrupted, via either a [graceful](#automated-graceful-methods) or [forceful](#automated-forceful-methods) disruption method, Karpenter will begin draining the node. At this point, the countdown for `terminationGracePeriod` begins. Once the `terminationGracePeriod` elapses, remaining pods will be forcibly deleted and the unerlying instance will be terminated. A node may be terminated before the `terminationGracePeriod` has elapsed if all disruptable pods have been drained. From 75f432284841d7e3598bc8f20a78e2f8bd6e722e Mon Sep 17 00:00:00 2001 From: Jason Deal Date: Thu, 13 Feb 2025 14:11:29 -0800 Subject: [PATCH 34/34] deps: bump go to 1.24.0 (#7733) --- go.mod | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/go.mod b/go.mod index 86016005da71..ab9a41dd4d9b 100644 --- a/go.mod +++ b/go.mod @@ -1,6 +1,6 @@ module github.com/aws/karpenter-provider-aws -go 1.23.6 +go 1.24.0 require ( github.com/Pallinder/go-randomdata v1.2.0
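The combination of the exported `GetCreateLaunchTemplateInput` helper earlier in this patch and the `CreateLaunchTemplateAuthCheckFailed` expectation in the new integration test suggests a dry-run style permission probe. The sketch below is an assumption-laden illustration of that idea, not the controller's actual validation code: the function name, error mapping, and wiring are invented for this example; only the EC2 DryRun convention (a successful dry run surfaces as a `DryRunOperation` error) is standard AWS behavior.

```go
// Illustrative sketch only: probing ec2:CreateLaunchTemplate permissions with a
// dry-run request built from an already-prepared CreateLaunchTemplateInput.
package sketch

import (
	"context"
	"errors"
	"fmt"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
	"github.com/aws/smithy-go"
)

// dryRunCreateLaunchTemplate submits the prepared input with DryRun set so EC2
// evaluates IAM permissions without creating anything. EC2 reports a successful
// dry run as a DryRunOperation error; UnauthorizedOperation means the caller
// lacks ec2:CreateLaunchTemplate.
func dryRunCreateLaunchTemplate(ctx context.Context, client *ec2.Client, input *ec2.CreateLaunchTemplateInput) error {
	in := *input // shallow copy so the caller's input is left unmodified
	in.DryRun = aws.Bool(true)
	_, err := client.CreateLaunchTemplate(ctx, &in)
	var apiErr smithy.APIError
	if errors.As(err, &apiErr) {
		switch apiErr.ErrorCode() {
		case "DryRunOperation":
			return nil // permissions are in place
		case "UnauthorizedOperation":
			return fmt.Errorf("create launch template auth check failed: %w", err)
		}
	}
	return err
}
```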