From a7882d9688278cf3269d1dc3fd714550a742f8de Mon Sep 17 00:00:00 2001 From: Nuru Date: Sun, 9 Jun 2024 19:38:48 -0700 Subject: [PATCH 1/5] Add support for AL2023 --- README.md | 27 +-- ami.tf | 127 ++++++------ docs/terraform.md | 27 +-- examples/complete/fixtures.us-east-2.tfvars | 14 +- examples/complete/main.tf | 15 +- examples/complete/outputs.tf | 5 + examples/complete/variables.tf | 44 +++-- examples/complete/versions.tf | 18 +- launch-template.tf | 12 +- main.tf | 28 +-- outputs.tf | 11 +- userdata.tf | 48 ++++- userdata.tpl | 1 + userdata_al2023.tpl | 31 +++ variables-deprecated.tf | 26 +++ variables.tf | 205 ++++++++++---------- versions.tf | 3 +- 17 files changed, 371 insertions(+), 271 deletions(-) create mode 100644 userdata_al2023.tpl diff --git a/README.md b/README.md index f1ded57..34a4cca 100644 --- a/README.md +++ b/README.md @@ -247,14 +247,14 @@ https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html | Name | Version | |------|---------| | [terraform](#requirement\_terraform) | >= 1.3.0 | -| [aws](#requirement\_aws) | >= 4.48 | +| [aws](#requirement\_aws) | >= 5.8 | | [random](#requirement\_random) | >= 2.0 | ## Providers | Name | Version | |------|---------| -| [aws](#provider\_aws) | >= 4.48 | +| [aws](#provider\_aws) | >= 5.8 | | [random](#provider\_random) | >= 2.0 | ## Modules @@ -279,12 +279,12 @@ https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html | [aws_iam_role_policy_attachment.ipv6_eks_cni_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy_attachment) | resource | | [aws_launch_template.default](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/launch_template) | resource | | [random_pet.cbd](https://registry.terraform.io/providers/hashicorp/random/latest/docs/resources/pet) | resource | -| [aws_ami.selected](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/ami) | data source | | 
[aws_eks_cluster.this](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/eks_cluster) | data source | | [aws_iam_policy_document.assume_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source | | [aws_iam_policy_document.ipv6_eks_cni_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source | | [aws_launch_template.this](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/launch_template) | data source | | [aws_partition.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/partition) | data source | +| [aws_ssm_parameter.ami_id](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/ssm_parameter) | data source | ## Inputs @@ -292,18 +292,19 @@ https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html |------|-------------|------|---------|:--------:| | [additional\_tag\_map](#input\_additional\_tag\_map) | Additional key-value pairs to add to each map in `tags_as_list_of_maps`. Not added to `tags` or `id`.
This is for some rare cases where resources want additional configuration of tags
and therefore take a list of maps with tag key, value, and additional configuration. | `map(string)` | `{}` | no | | [after\_cluster\_joining\_userdata](#input\_after\_cluster\_joining\_userdata) | Additional `bash` commands to execute on each worker node after joining the EKS cluster (after executing the `bootstrap.sh` script). For more info, see https://kubedex.com/90-days-of-aws-eks-in-production | `list(string)` | `[]` | no | -| [ami\_image\_id](#input\_ami\_image\_id) | AMI to use. Ignored if `launch_template_id` is supplied. | `list(string)` | `[]` | no | -| [ami\_release\_version](#input\_ami\_release\_version) | EKS AMI version to use, e.g. For AL2 "1.16.13-20200821" or for bottlerocket "1.2.0-ccf1b754" (no "v") or for Windows "2023.02.14". For AL2, bottlerocket and Windows, it defaults to latest version for Kubernetes version. | `list(string)` | `[]` | no | -| [ami\_type](#input\_ami\_type) | Type of Amazon Machine Image (AMI) associated with the EKS Node Group.
Defaults to `AL2_x86_64`. Valid values: `AL2_x86_64, AL2_x86_64_GPU, AL2_ARM_64, CUSTOM, BOTTLEROCKET_ARM_64, BOTTLEROCKET_x86_64, BOTTLEROCKET_ARM_64_NVIDIA, BOTTLEROCKET_x86_64_NVIDIA, WINDOWS_CORE_2019_x86_64, WINDOWS_FULL_2019_x86_64, WINDOWS_CORE_2022_x86_64, WINDOWS_FULL_2022_x86_64`. | `string` | `"AL2_x86_64"` | no | +| [ami\_image\_id](#input\_ami\_image\_id) | AMI to use, overriding other AMI specifications, but must match `ami_type`. Ignored if `launch_template_id` is supplied. | `list(string)` | `[]` | no | +| [ami\_release\_version](#input\_ami\_release\_version) | OBSOLETE: Use `ami_specifier` instead. Note that it has a different format.
Historical description: EKS AMI version to use, e.g. for AL2 "1.16.13-20200821", for Bottlerocket "1.2.0-ccf1b754" (no "v"), or for Windows "2023.02.14". For AL2, Bottlerocket, and Windows, it defaults to the latest version for the Kubernetes version. | `list(string)` | `[]` | no | +| [ami\_specifier](#input\_ami\_specifier) | OS-dependent specifier for one of the several AMIs that match OS, architecture, and Kubernetes 1.xx version.
If not specified, the recommended/latest AMI for the given Kubernetes version will be used.
Examples:
AL2: amazon-eks-node-1.29-v20240117
AL2023: amazon-eks-node-al2023-x86\_64-standard-1.29-v20240605
Bottlerocket: 1.20.1-7c3e9198
Windows: | `string` | `"recommended"` | no | +| [ami\_type](#input\_ami\_type) | Type of Amazon Machine Image (AMI) associated with the EKS Node Group.
Defaults to `AL2_x86_64`. Valid values: `AL2_x86_64, AL2_x86_64_GPU, AL2_ARM_64, CUSTOM, BOTTLEROCKET_ARM_64, BOTTLEROCKET_x86_64, BOTTLEROCKET_ARM_64_NVIDIA, BOTTLEROCKET_x86_64_NVIDIA, WINDOWS_CORE_2019_x86_64, WINDOWS_FULL_2019_x86_64, WINDOWS_CORE_2022_x86_64, WINDOWS_FULL_2022_x86_64, AL2023_x86_64_STANDARD, AL2023_ARM_64_STANDARD`. | `string` | `"AL2_x86_64"` | no | | [associate\_cluster\_security\_group](#input\_associate\_cluster\_security\_group) | When true, associate the default cluster security group to the nodes. If disabled the EKS managed security group will not
be associated to the nodes and you will need to provide another security group that allows the nodes to communicate with
the EKS control plane. Be aware that if no `associated_security_group_ids` or `ssh_access_security_group_ids` are provided,
then the nodes will have no inbound or outbound rules. | `bool` | `true` | no | | [associated\_security\_group\_ids](#input\_associated\_security\_group\_ids) | A list of IDs of Security Groups to associate the node group with, in addition to the EKS' created security group.
These security groups will not be modified. | `list(string)` | `[]` | no | | [attributes](#input\_attributes) | ID element. Additional attributes (e.g. `workers` or `cluster`) to add to `id`,
in the order they appear in the list. New attributes are appended to the
end of the list. The elements of the list are joined by the `delimiter`
and treated as a single ID element. | `list(string)` | `[]` | no | | [before\_cluster\_joining\_userdata](#input\_before\_cluster\_joining\_userdata) | Additional `bash` commands to execute on each worker node before joining the EKS cluster (before executing the `bootstrap.sh` script). For more info, see https://kubedex.com/90-days-of-aws-eks-in-production | `list(string)` | `[]` | no | | [block\_device\_map](#input\_block\_device\_map) | Map of block device name specification, see [launch\_template.block-devices](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/launch_template#block-devices). |
map(object({
no_device = optional(bool, null)
virtual_name = optional(string, null)
ebs = optional(object({
delete_on_termination = optional(bool, true)
encrypted = optional(bool, true)
iops = optional(number, null)
kms_key_id = optional(string, null)
snapshot_id = optional(string, null)
throughput = optional(number, null)
volume_size = optional(number, 20)
volume_type = optional(string, "gp3")
}))
}))
|
{
"/dev/xvda": {
"ebs": {}
}
}
| no | | [block\_device\_mappings](#input\_block\_device\_mappings) | DEPRECATED: Use `block_device_map` instead.
List of block device mappings for the launch template.
Each list element is an object with a `device_name` key and
any keys supported by the `ebs` block of `launch_template`. | `list(any)` | `null` | no | -| [bootstrap\_additional\_options](#input\_bootstrap\_additional\_options) | Additional options to bootstrap.sh. DO NOT include `--kubelet-additional-args`, use `kubelet_additional_options` var instead. | `list(string)` | `[]` | no | +| [bootstrap\_additional\_options](#input\_bootstrap\_additional\_options) | Additional options to bootstrap.sh. DO NOT include `--kubelet-additional-args`, use `kubelet_additional_options` var instead. Not used with AL2023 AMI types. | `list(string)` | `[]` | no | | [capacity\_type](#input\_capacity\_type) | Type of capacity associated with the EKS Node Group. Valid values: "ON\_DEMAND", "SPOT", or `null`.
Terraform will only perform drift detection if a configuration value is provided. | `string` | `null` | no | -| [cluster\_autoscaler\_enabled](#input\_cluster\_autoscaler\_enabled) | Set `true` to label the node group so that the [Kubernetes Cluster Autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#auto-discovery-setup) will discover and autoscale it.
Note that even when `false`, EKS will set the `k8s.io/cluster-autoscaler/enabled` label to `true` on the node group. | `bool` | `false` | no | +| [cluster\_autoscaler\_enabled](#input\_cluster\_autoscaler\_enabled) | OBSOLETE. Used to add support for the Kubernetes Cluster Autoscaler, but additional support is no longer needed. | `bool` | `null` | no | | [cluster\_name](#input\_cluster\_name) | The name of the EKS cluster | `string` | n/a | yes | | [context](#input\_context) | Single object for setting entire context at once.
See description of individual variables for details.
Leave string and numeric variables as `null` to use default value.
Individual variable settings (non-null) override settings in context object,
except for attributes, tags, and additional\_tag\_map, which are merged. | `any` |
{
"additional_tag_map": {},
"attributes": [],
"delimiter": null,
"descriptor_formats": {},
"enabled": true,
"environment": null,
"id_length_limit": null,
"label_key_case": null,
"label_order": [],
"label_value_case": null,
"labels_as_tags": [
"unset"
],
"name": null,
"namespace": null,
"regex_replace_chars": null,
"stage": null,
"tags": {},
"tenant": null
}
| no | | [cpu\_options](#input\_cpu\_options) | Configuration for the [`cpu_options` Configuration Block](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/launch_template#cpu_options) of the launch template.
Leave list empty for defaults. Pass list with single object with attributes matching the `cpu_options` block to configure it.
Note that this configures the launch template only. Some elements will be ignored by the Auto Scaling Group
that actually launches instances. Consult AWS documentation for details. | `list(any)` | `[]` | no | @@ -322,7 +323,7 @@ https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html | [instance\_types](#input\_instance\_types) | Instance types to use for this node group (up to 20). Defaults to ["t3.medium"].
Must be empty if the launch template configured by `launch_template_id` specifies an instance type. | `list(string)` |
[
"t3.medium"
]
| no | | [kubelet\_additional\_options](#input\_kubelet\_additional\_options) | Additional flags to pass to kubelet.
DO NOT include `--node-labels` or `--node-taints`,
use `kubernetes_labels` and `kubernetes_taints` to specify those." | `list(string)` | `[]` | no | | [kubernetes\_labels](#input\_kubernetes\_labels) | Key-value mapping of Kubernetes labels. Only labels that are applied with the EKS API are managed by this argument.
Other Kubernetes labels applied to the EKS Node Group will not be managed. | `map(string)` | `{}` | no | -| [kubernetes\_taints](#input\_kubernetes\_taints) | List of `key`, `value`, `effect` objects representing Kubernetes taints.
`effect` must be one of `NO_SCHEDULE`, `NO_EXECUTE`, or `PREFER_NO_SCHEDULE`.
`key` and `effect` are required, `value` may be null. |
list(object({
key = string
value = string
effect = string
}))
| `[]` | no | +| [kubernetes\_taints](#input\_kubernetes\_taints) | List of `key`, `value`, `effect` objects representing Kubernetes taints.
`effect` must be one of `NO_SCHEDULE`, `NO_EXECUTE`, or `PREFER_NO_SCHEDULE`.
`key` and `effect` are required, `value` may be null. |
list(object({
key = string
value = optional(string)
effect = string
}))
| `[]` | no | | [kubernetes\_version](#input\_kubernetes\_version) | Kubernetes version. Defaults to EKS Cluster Kubernetes version. Terraform will only perform drift detection if a configuration value is provided | `list(string)` | `[]` | no | | [label\_key\_case](#input\_label\_key\_case) | Controls the letter case of the `tags` keys (label names) for tags generated by this module.
Does not affect keys of tags passed in via the `tags` input.
Possible values: `lower`, `title`, `upper`.
Default value: `title`. | `string` | `null` | no | | [label\_order](#input\_label\_order) | The order in which the labels (ID elements) appear in the `id`.
Defaults to ["namespace", "environment", "stage", "name", "attributes"].
You can omit any of the 6 labels ("tenant" is the 6th), but at least one must be present. | `list(string)` | `null` | no | @@ -338,7 +339,7 @@ https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html | [module\_depends\_on](#input\_module\_depends\_on) | Can be any value desired. Module will wait for this value to be computed before creating node group. | `any` | `null` | no | | [name](#input\_name) | ID element. Usually the component or solution name, e.g. 'app' or 'jenkins'.
This is the only ID element not also included as a `tag`.
The "name" tag is set to the full `id` string. There is no tag with the value of the `name` input. | `string` | `null` | no | | [namespace](#input\_namespace) | ID element. Usually an abbreviation of your organization name, e.g. 'eg' or 'cp', to help ensure generated IDs are globally unique | `string` | `null` | no | -| [node\_group\_terraform\_timeouts](#input\_node\_group\_terraform\_timeouts) | Configuration for the Terraform [`timeouts` Configuration Block](https://www.terraform.io/docs/language/resources/syntax.html#operation-timeouts) of the node group resource.
Leave list empty for defaults. Pass list with single object with attributes matching the `timeouts` block to configure it.
Leave attribute values `null` to preserve individual defaults while setting others. |
list(object({
create = string
update = string
delete = string
}))
| `[]` | no | +| [node\_group\_terraform\_timeouts](#input\_node\_group\_terraform\_timeouts) | Configuration for the Terraform [`timeouts` Configuration Block](https://www.terraform.io/docs/language/resources/syntax.html#operation-timeouts) of the node group resource.
Leave list empty for defaults. Pass list with single object with attributes matching the `timeouts` block to configure it.
Leave attribute values `null` to preserve individual defaults while setting others. |
list(object({
create = optional(string)
update = optional(string)
delete = optional(string)
}))
| `[]` | no | | [node\_role\_arn](#input\_node\_role\_arn) | If provided, assign workers the given role, which this module will not modify | `list(string)` | `[]` | no | | [node\_role\_cni\_policy\_enabled](#input\_node\_role\_cni\_policy\_enabled) | When true, the `AmazonEKS_CNI_Policy` will be attached to the node IAM role.
This used to be required, but it is [now recommended](https://docs.aws.amazon.com/eks/latest/userguide/create-node-role.html) that this policy be
attached only to the `aws-node` Kubernetes service account. However, that
is difficult to do with Terraform, so this module defaults to the old pattern. | `bool` | `true` | no | | [node\_role\_permissions\_boundary](#input\_node\_role\_permissions\_boundary) | If provided, all IAM roles will be created with this permissions boundary attached. | `string` | `null` | no | @@ -353,12 +354,15 @@ https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html | [tags](#input\_tags) | Additional tags (e.g. `{'BusinessUnit': 'XYZ'}`).
Neither the tag keys nor the tag values will be modified by this module. | `map(string)` | `{}` | no | | [tenant](#input\_tenant) | ID element \_(Rarely used, not included by default)\_. A customer identifier, indicating who this instance of a resource is for | `string` | `null` | no | | [update\_config](#input\_update\_config) | Configuration for the `eks_node_group` [`update_config` Configuration Block](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_node_group#update_config-configuration-block).
Specify exactly one of `max_unavailable` (node count) or `max_unavailable_percentage` (percentage of nodes). | `list(map(number))` | `[]` | no | -| [userdata\_override\_base64](#input\_userdata\_override\_base64) | Many features of this module rely on the `bootstrap.sh` provided with Amazon Linux, and this module
may generate "user data" that expects to find that script. If you want to use an AMI that is not
compatible with the Amazon Linux `bootstrap.sh` initialization, then use `userdata_override_base64` to provide
your own (Base64 encoded) user data. Use "" to prevent any user data from being set.

Setting `userdata_override_base64` disables `kubernetes_taints`, `kubelet_additional_options`,
`before_cluster_joining_userdata`, `after_cluster_joining_userdata`, and `bootstrap_additional_options`. | `list(string)` | `[]` | no | +| [userdata\_override\_base64](#input\_userdata\_override\_base64) | Many features of this module rely on the `bootstrap.sh` provided with Amazon Linux, and this module
may generate "user data" that expects to find that script. If you want to use an AMI that is not
compatible with the userdata generated by this module, then use `userdata_override_base64` to provide
your own (Base64 encoded) user data. Use "" to prevent any user data from being set.

Setting `userdata_override_base64` disables `kubernetes_taints`, `kubelet_additional_options`,
`before_cluster_joining_userdata`, `after_cluster_joining_userdata`, and `bootstrap_additional_options`. | `list(string)` | `[]` | no | ## Outputs | Name | Description | |------|-------------| +| [WARNING\_cluster\_autoscaler\_enabled](#output\_WARNING\_cluster\_autoscaler\_enabled) | n/a | +| [ami\_ids](#output\_ami\_ids) | n/a | +| [eks\_node\_group\_ami\_id](#output\_eks\_node\_group\_ami\_id) | The ID of the AMI used for the worker nodes, if specified | | [eks\_node\_group\_arn](#output\_eks\_node\_group\_arn) | Amazon Resource Name (ARN) of the EKS Node Group | | [eks\_node\_group\_cbd\_pet\_name](#output\_eks\_node\_group\_cbd\_pet\_name) | The pet name of this node group, if this module generated one | | [eks\_node\_group\_id](#output\_eks\_node\_group\_id) | EKS Cluster name and EKS Node Group name separated by a colon | @@ -370,7 +374,6 @@ https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html | [eks\_node\_group\_role\_name](#output\_eks\_node\_group\_role\_name) | Name of the worker nodes IAM role | | [eks\_node\_group\_status](#output\_eks\_node\_group\_status) | Status of the EKS Node Group | | [eks\_node\_group\_tags\_all](#output\_eks\_node\_group\_tags\_all) | A map of tags assigned to the resource, including those inherited from the provider default\_tags configuration block. 
| -| [eks\_node\_group\_windows\_note](#output\_eks\_node\_group\_windows\_note) | Instructions on changes a user needs to follow or script for a windows node group in the event of a custom ami | diff --git a/ami.tf b/ami.tf index a20a027..b350944 100644 --- a/ami.tf +++ b/ami.tf @@ -1,80 +1,73 @@ -locals { - # "amazon-eks-gpu-node-", - arch_label_map = { - AL2_x86_64 = "", - AL2_x86_64_GPU = "-gpu", - AL2_ARM_64 = "-arm64", - BOTTLEROCKET_x86_64 = "x86_64", - BOTTLEROCKET_ARM_64 = "aarch64" - BOTTLEROCKET_ARM_64_NVIDIA = "-gpu" - BOTTLEROCKET_x86_64_NVIDIA = "-gpu" - WINDOWS_CORE_2019_x86_64 = "" - WINDOWS_FULL_2019_x86_64 = "" - WINDOWS_CORE_2022_x86_64 = "" - WINDOWS_FULL_2022_x86_64 = "" - } +# Previously, we found AMIs by using the aws_ami data source with a name_regex filter +# and `most_recent = true`. Unfortunately, `most_recent` means most recently created, +# and may not be the most recent Kubernetes version if, for example, a previous version +# had a new `eksbuild`. So instead, we now use the AMI IDs published in SSM. +# See https://docs.aws.amazon.com/eks/latest/userguide/retrieve-ami-id.html +# https://docs.aws.amazon.com/eks/latest/userguide/retrieve-ami-id-bottlerocket.html - ami_kind = split("_", var.ami_type)[0] != "WINDOWS" ? 
split("_", var.ami_type)[0] : format("WINDOWS_%s_%s", split("_", var.ami_type)[1], split("_", var.ami_type)[2]) +# Amazon Linux: https://docs.aws.amazon.com/eks/latest/userguide/retrieve-ami-id.html +# aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.30/amazon-linux-2/recommended/image_id \ +# --query "Parameter.Value" --output text +# Bottlerocket https://github.com/bottlerocket-os/bottlerocket/blob/develop/QUICKSTART-EKS.md#finding-an-ami +# aws ssm get-parameter --name /aws/service/bottlerocket/aws-k8s-1.30/x86_64/latest/image_id \ +# --query "Parameter.Value" --output text +# Windows: https://docs.aws.amazon.com/eks/latest/userguide/retrieve-windows-ami-id.html +# aws ssm get-parameter --name /aws/service/ami-windows-latest/Windows_Server-2019-English-Core-EKS_Optimized-1.30/image_id \ +# --region region-code --query "Parameter.Value" --output text - ami_format = { - # amazon-eks{arch_label}-node-{ami_kubernetes_version}-v{ami_version} - # e.g. amazon-eks-arm64-node-1.21-v20211013 - AL2 = "amazon-eks%s-node-%s" - # bottlerocket-aws-k8s-{ami_kubernetes_version}-{arch_label}-v{ami_version} - # e.g. bottlerocket-aws-k8s-1.21-x86_64-v1[2].0-ccf1b754 - BOTTLEROCKET = "bottlerocket-aws-k8s-%s-%s-%s" - # Windows_Server-2019-English-Core-EKS_Optimized-{ami_kubernetes_version}-{ami_version} - # e.g. Windows_Server-2019-English-Core-EKS_Optimized-1.23-2022.11.08 - WINDOWS_CORE_2019 = "Windows_Server-2019-English-Core-EKS_Optimized-%s-%s" - WINDOWS_FULL_2019 = "Windows_Server-2019-English-Full-EKS_Optimized-%s-%s" - WINDOWS_CORE_2022 = "Windows_Server-2022-English-Core-EKS_Optimized-%s-%s" - WINDOWS_FULL_2022 = "Windows_Server-2022-English-Full-EKS_Optimized-%s-%s" - } - # Kubernetes version priority (first one to be set wins) - # 1. prefix of var.ami_release_version - # 2. var.kubernetes_version - # 3. data.eks_cluster.this.kubernetes_version - need_cluster_kubernetes_version = local.enabled ? 
local.need_ami_id && length(var.kubernetes_version) == 0 : false +locals { + # Public SSM parameters all start with /aws/service/ - use_cluster_kubernetes_version = local.need_cluster_kubernetes_version && (local.ami_kind == "BOTTLEROCKET" || length(var.ami_release_version) == 0) + # format string that makes + # format(fmt, specifier, k8s_version) the SSM parameter name to retrieve - ami_kubernetes_version = local.need_ami_id ? (local.use_cluster_kubernetes_version ? data.aws_eks_cluster.this[0].version : - regex("^(\\d+\\.\\d+)", coalesce(local.ami_kind == "AL2" ? try(var.ami_release_version[0], null) : null, try(var.kubernetes_version[0], null)))[0] - ) : "" + ami_ssm_format = { + AL2_x86_64 = "/aws/service/eks/optimized-ami/%[2]v/amazon-linux-2/%[1]v/image_id" + AL2_x86_64_GPU = "/aws/service/eks/optimized-ami/%[2]v/amazon-linux-2-gpu/%[1]v/image_id" + AL2_ARM_64 = "/aws/service/eks/optimized-ami/%[2]v/amazon-linux-2-arm64/%[1]v/image_id" + AL2023_x86_64_STANDARD = "/aws/service/eks/optimized-ami/%[2]v/amazon-linux-2023/x86_64/standard/%[1]v/image_id" + AL2023_ARM_64_STANDARD = "/aws/service/eks/optimized-ami/%[2]v/amazon-linux-2023/arm64/standard/%[1]v/image_id" + BOTTLEROCKET_x86_64 = "/aws/service/bottlerocket/aws-k8s-%[2]v/x86_64/%[1]v/image_id" + BOTTLEROCKET_ARM_64 = "/aws/service/bottlerocket/aws-k8s-%[2]v/arm64/%[1]v/image_id" + BOTTLEROCKET_x86_64_NVIDIA = "/aws/service/bottlerocket/aws-k8s-%[2]v-nvidia/x86_64/%[1]v/image_id" + BOTTLEROCKET_ARM_64_NVIDIA = "/aws/service/bottlerocket/aws-k8s-%[2]v-nvidia/arm64/%[1]v/image_id" + WINDOWS_CORE_2019_x86_64 = "/aws/service/ami-windows-latest/Windows_Server-2019-English-Core-EKS_Optimized-%[2]v/image_id" + WINDOWS_FULL_2019_x86_64 = "/aws/service/ami-windows-latest/Windows_Server-2019-English-Full-EKS_Optimized-%[2]v/image_id" + WINDOWS_CORE_2022_x86_64 = "/aws/service/ami-windows-latest/Windows_Server-2022-English-Core-EKS_Optimized-%[2]v/image_id" + WINDOWS_FULL_2022_x86_64 = 
"/aws/service/ami-windows-latest/Windows_Server-2022-English-Full-EKS_Optimized-%[2]v/image_id" + } + + # AMI specifiers + # AL2 + # AMI name: amazon-eks-node-1.29-v20240117 + # AMI SSM param: /aws/service/eks/optimized-ami/1.29/amazon-linux-2/amazon-eks-node-1.29-v20240117/image_id + # AL2023 + # AMI name: amazon-eks-node-al2023-x86_64-standard-1.29-v20240605 + # AMI SSM param: /aws/service/eks/optimized-ami/1.29/amazon-linux-2023/x86_64/standard/amazon-eks-node-al2023-x86_64-standard-1.29-v20240605/image_id + # Bottlerocket: + # AMI name: bottlerocket-aws-k8s-1.24-nvidia-x86_64-v1.20.1-7c3e9198 + # AMI SSM param: /aws/service/bottlerocket/aws-k8s-1.24-nvidia/x86_64/1.20.1-7c3e9198/image_id # No "v" + ami_specifier = var.ami_specifier == "recommended" && startswith(var.ami_type, "BOTTLEROCKET") ? "latest" : var.ami_specifier - # if ami_release_version is provided - ami_version_regex = local.need_ami_id ? { - # if ami_release_version = "1.21-20211013" - # insert the letter v prior to the ami_version so it becomes 1.21-v20211013 - # if not, use the kubernetes version - AL2 = (length(var.ami_release_version) == 1 ? replace(var.ami_release_version[0], "/^(\\d+\\.\\d+)\\.\\d+-(\\d+)$/", "$1-v$2") : "${local.ami_kubernetes_version}-*"), - # if ami_release_version = "1.2.0-ccf1b754" - # prefix the ami release version with the letter v - # if not, use an asterisk - BOTTLEROCKET = (length(var.ami_release_version) == 1 ? format("v%s", var.ami_release_version[0]) : "*"), - WINDOWS_CORE_2019 = (length(var.ami_release_version) == 1 ? format("%s", var.ami_release_version[0]) : "*"), - WINDOWS_FULL_2019 = (length(var.ami_release_version) == 1 ? format("%s", var.ami_release_version[0]) : "*"), - WINDOWS_CORE_2022 = (length(var.ami_release_version) == 1 ? format("%s", var.ami_release_version[0]) : "*"), - WINDOWS_FULL_2022 = (length(var.ami_release_version) == 1 ? format("%s", var.ami_release_version[0]) : "*"), - } : {} + # Kubernetes version priority (first one to be set wins) + # 1. 
var.kubernetes_version + # 2. data.eks_cluster.this.kubernetes_version + use_cluster_kubernetes_version = local.enabled ? local.need_ami_id && length(var.kubernetes_version) == 0 : false + need_cluster_kubernetes_version = local.use_cluster_kubernetes_version - ami_regex = local.need_ami_id ? { - AL2 = format(local.ami_format["AL2"], local.arch_label_map[var.ami_type], local.ami_version_regex[local.ami_kind]), - BOTTLEROCKET = format(local.ami_format["BOTTLEROCKET"], local.ami_kubernetes_version, local.arch_label_map[var.ami_type], local.ami_version_regex[local.ami_kind]), - WINDOWS_CORE_2019 = format(local.ami_format["WINDOWS_CORE_2019"], local.ami_kubernetes_version, local.ami_version_regex[local.ami_kind]), - WINDOWS_FULL_2019 = format(local.ami_format["WINDOWS_FULL_2019"], local.ami_kubernetes_version, local.ami_version_regex[local.ami_kind]), - WINDOWS_CORE_2022 = format(local.ami_format["WINDOWS_CORE_2022"], local.ami_kubernetes_version, local.ami_version_regex[local.ami_kind]), - WINDOWS_FULL_2022 = format(local.ami_format["WINDOWS_FULL_2022"], local.ami_kubernetes_version, local.ami_version_regex[local.ami_kind]), - } : {} + ami_kubernetes_version = local.use_cluster_kubernetes_version ? data.aws_eks_cluster.this[0].version : var.kubernetes_version[0] } -data "aws_ami" "selected" { - count = local.enabled && local.need_ami_id ? 1 : 0 +data "aws_ssm_parameter" "ami_id" { + count = 1 # local.enabled && local.need_ami_id ? 
1 : 0 - most_recent = true - name_regex = local.ami_regex[local.ami_kind] + name = format(local.ami_ssm_format[var.ami_type], local.ami_specifier, local.ami_kubernetes_version) +} - owners = ["amazon"] +output "ami_ids" { + value = { + for key, value in data.aws_ssm_parameter.ami_id : key => value.insecure_value + } } diff --git a/docs/terraform.md b/docs/terraform.md index d55de56..6ea9d1e 100644 --- a/docs/terraform.md +++ b/docs/terraform.md @@ -4,14 +4,14 @@ | Name | Version | |------|---------| | [terraform](#requirement\_terraform) | >= 1.3.0 | -| [aws](#requirement\_aws) | >= 4.48 | +| [aws](#requirement\_aws) | >= 5.8 | | [random](#requirement\_random) | >= 2.0 | ## Providers | Name | Version | |------|---------| -| [aws](#provider\_aws) | >= 4.48 | +| [aws](#provider\_aws) | >= 5.8 | | [random](#provider\_random) | >= 2.0 | ## Modules @@ -36,12 +36,12 @@ | [aws_iam_role_policy_attachment.ipv6_eks_cni_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy_attachment) | resource | | [aws_launch_template.default](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/launch_template) | resource | | [random_pet.cbd](https://registry.terraform.io/providers/hashicorp/random/latest/docs/resources/pet) | resource | -| [aws_ami.selected](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/ami) | data source | | [aws_eks_cluster.this](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/eks_cluster) | data source | | [aws_iam_policy_document.assume_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source | | [aws_iam_policy_document.ipv6_eks_cni_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source | | 
[aws_launch_template.this](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/launch_template) | data source | | [aws_partition.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/partition) | data source | +| [aws_ssm_parameter.ami_id](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/ssm_parameter) | data source | ## Inputs @@ -49,18 +49,19 @@ |------|-------------|------|---------|:--------:| | [additional\_tag\_map](#input\_additional\_tag\_map) | Additional key-value pairs to add to each map in `tags_as_list_of_maps`. Not added to `tags` or `id`.
This is for some rare cases where resources want additional configuration of tags
and therefore take a list of maps with tag key, value, and additional configuration. | `map(string)` | `{}` | no | | [after\_cluster\_joining\_userdata](#input\_after\_cluster\_joining\_userdata) | Additional `bash` commands to execute on each worker node after joining the EKS cluster (after executing the `bootstrap.sh` script). For more info, see https://kubedex.com/90-days-of-aws-eks-in-production | `list(string)` | `[]` | no | -| [ami\_image\_id](#input\_ami\_image\_id) | AMI to use. Ignored if `launch_template_id` is supplied. | `list(string)` | `[]` | no | -| [ami\_release\_version](#input\_ami\_release\_version) | EKS AMI version to use, e.g. For AL2 "1.16.13-20200821" or for bottlerocket "1.2.0-ccf1b754" (no "v") or for Windows "2023.02.14". For AL2, bottlerocket and Windows, it defaults to latest version for Kubernetes version. | `list(string)` | `[]` | no | -| [ami\_type](#input\_ami\_type) | Type of Amazon Machine Image (AMI) associated with the EKS Node Group.
Defaults to `AL2_x86_64`. Valid values: `AL2_x86_64, AL2_x86_64_GPU, AL2_ARM_64, CUSTOM, BOTTLEROCKET_ARM_64, BOTTLEROCKET_x86_64, BOTTLEROCKET_ARM_64_NVIDIA, BOTTLEROCKET_x86_64_NVIDIA, WINDOWS_CORE_2019_x86_64, WINDOWS_FULL_2019_x86_64, WINDOWS_CORE_2022_x86_64, WINDOWS_FULL_2022_x86_64`. | `string` | `"AL2_x86_64"` | no | +| [ami\_image\_id](#input\_ami\_image\_id) | AMI to use, overriding other AMI specifications, but must match `ami_type`. Ignored if `launch_template_id` is supplied. | `list(string)` | `[]` | no | +| [ami\_release\_version](#input\_ami\_release\_version) | OBSOLETE: Use `ami_specifier` instead. Note that it has a different format.
Historical description: EKS AMI version to use, e.g. for AL2 \"1.16.13-20200821\" or for Bottlerocket \"1.2.0-ccf1b754\" (no \"v\") or for Windows \"2023.02.14\". For AL2, Bottlerocket, and Windows, it defaults to the latest version for the Kubernetes version. | `list(string)` | `[]` | no | +| [ami\_specifier](#input\_ami\_specifier) | OS-dependent specifier for one of the several AMIs that match OS, architecture, and Kubernetes 1.xx version.
If not specified, the recommended/latest AMI for the given Kubernetes version will be used.
Examples:
AL2: amazon-eks-node-1.29-v20240117
AL2023: amazon-eks-node-al2023-x86\_64-standard-1.29-v20240605
Bottlerocket: 1.20.1-7c3e9198
Windows: | `string` | `"recommended"` | no | +| [ami\_type](#input\_ami\_type) | Type of Amazon Machine Image (AMI) associated with the EKS Node Group.
Defaults to `AL2_x86_64`. Valid values: `AL2_x86_64, AL2_x86_64_GPU, AL2_ARM_64, CUSTOM, BOTTLEROCKET_ARM_64, BOTTLEROCKET_x86_64, BOTTLEROCKET_ARM_64_NVIDIA, BOTTLEROCKET_x86_64_NVIDIA, WINDOWS_CORE_2019_x86_64, WINDOWS_FULL_2019_x86_64, WINDOWS_CORE_2022_x86_64, WINDOWS_FULL_2022_x86_64, AL2023_x86_64_STANDARD, AL2023_ARM_64_STANDARD`. | `string` | `"AL2_x86_64"` | no | | [associate\_cluster\_security\_group](#input\_associate\_cluster\_security\_group) | When true, associate the default cluster security group to the nodes. If disabled the EKS managed security group will not
be associated to the nodes and you will need to provide another security group that allows the nodes to communicate with
the EKS control plane. Be aware that if no `associated_security_group_ids` or `ssh_access_security_group_ids` are provided,
then the nodes will have no inbound or outbound rules. | `bool` | `true` | no | | [associated\_security\_group\_ids](#input\_associated\_security\_group\_ids) | A list of IDs of Security Groups to associate the node group with, in addition to the EKS' created security group.
These security groups will not be modified. | `list(string)` | `[]` | no | | [attributes](#input\_attributes) | ID element. Additional attributes (e.g. `workers` or `cluster`) to add to `id`,
in the order they appear in the list. New attributes are appended to the
end of the list. The elements of the list are joined by the `delimiter`
and treated as a single ID element. | `list(string)` | `[]` | no | | [before\_cluster\_joining\_userdata](#input\_before\_cluster\_joining\_userdata) | Additional `bash` commands to execute on each worker node before joining the EKS cluster (before executing the `bootstrap.sh` script). For more info, see https://kubedex.com/90-days-of-aws-eks-in-production | `list(string)` | `[]` | no | | [block\_device\_map](#input\_block\_device\_map) | Map of block device name specification, see [launch\_template.block-devices](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/launch_template#block-devices). |
map(object({
no_device = optional(bool, null)
virtual_name = optional(string, null)
ebs = optional(object({
delete_on_termination = optional(bool, true)
encrypted = optional(bool, true)
iops = optional(number, null)
kms_key_id = optional(string, null)
snapshot_id = optional(string, null)
throughput = optional(number, null)
volume_size = optional(number, 20)
volume_type = optional(string, "gp3")
}))
}))
|
{
"/dev/xvda": {
"ebs": {}
}
}
| no | | [block\_device\_mappings](#input\_block\_device\_mappings) | DEPRECATED: Use `block_device_map` instead.
List of block device mappings for the launch template.
Each list element is an object with a `device_name` key and
any keys supported by the `ebs` block of `launch_template`. | `list(any)` | `null` | no | -| [bootstrap\_additional\_options](#input\_bootstrap\_additional\_options) | Additional options to bootstrap.sh. DO NOT include `--kubelet-additional-args`, use `kubelet_additional_options` var instead. | `list(string)` | `[]` | no | +| [bootstrap\_additional\_options](#input\_bootstrap\_additional\_options) | Additional options to bootstrap.sh. DO NOT include `--kubelet-additional-args`, use `kubelet_additional_options` var instead. Not used with AL2023 AMI types. | `list(string)` | `[]` | no | | [capacity\_type](#input\_capacity\_type) | Type of capacity associated with the EKS Node Group. Valid values: "ON\_DEMAND", "SPOT", or `null`.
Terraform will only perform drift detection if a configuration value is provided. | `string` | `null` | no | -| [cluster\_autoscaler\_enabled](#input\_cluster\_autoscaler\_enabled) | Set `true` to label the node group so that the [Kubernetes Cluster Autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#auto-discovery-setup) will discover and autoscale it.
Note that even when `false`, EKS will set the `k8s.io/cluster-autoscaler/enabled` label to `true` on the node group. | `bool` | `false` | no | +| [cluster\_autoscaler\_enabled](#input\_cluster\_autoscaler\_enabled) | OBSOLETE. Used to add support for the Kubernetes Cluster Autoscaler, but additional support is no longer needed. | `bool` | `null` | no | | [cluster\_name](#input\_cluster\_name) | The name of the EKS cluster | `string` | n/a | yes | | [context](#input\_context) | Single object for setting entire context at once.
See description of individual variables for details.
Leave string and numeric variables as `null` to use default value.
Individual variable settings (non-null) override settings in context object,
except for attributes, tags, and additional\_tag\_map, which are merged. | `any` |
{
"additional_tag_map": {},
"attributes": [],
"delimiter": null,
"descriptor_formats": {},
"enabled": true,
"environment": null,
"id_length_limit": null,
"label_key_case": null,
"label_order": [],
"label_value_case": null,
"labels_as_tags": [
"unset"
],
"name": null,
"namespace": null,
"regex_replace_chars": null,
"stage": null,
"tags": {},
"tenant": null
}
| no | | [cpu\_options](#input\_cpu\_options) | Configuration for the [`cpu_options` Configuration Block](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/launch_template#cpu_options) of the launch template.
Leave list empty for defaults. Pass list with single object with attributes matching the `cpu_options` block to configure it.
Note that this configures the launch template only. Some elements will be ignored by the Auto Scaling Group
that actually launches instances. Consult AWS documentation for details. | `list(any)` | `[]` | no | @@ -79,7 +80,7 @@ | [instance\_types](#input\_instance\_types) | Instance types to use for this node group (up to 20). Defaults to ["t3.medium"].
Must be empty if the launch template configured by `launch_template_id` specifies an instance type. | `list(string)` |
[
"t3.medium"
]
| no | | [kubelet\_additional\_options](#input\_kubelet\_additional\_options) | Additional flags to pass to kubelet.
DO NOT include `--node-labels` or `--node-taints`,
use `kubernetes_labels` and `kubernetes_taints` to specify those. | `list(string)` | `[]` | no | | [kubernetes\_labels](#input\_kubernetes\_labels) | Key-value mapping of Kubernetes labels. Only labels that are applied with the EKS API are managed by this argument.
Other Kubernetes labels applied to the EKS Node Group will not be managed. | `map(string)` | `{}` | no | -| [kubernetes\_taints](#input\_kubernetes\_taints) | List of `key`, `value`, `effect` objects representing Kubernetes taints.
`effect` must be one of `NO_SCHEDULE`, `NO_EXECUTE`, or `PREFER_NO_SCHEDULE`.
`key` and `effect` are required, `value` may be null. |
list(object({
key = string
value = string
effect = string
}))
| `[]` | no | +| [kubernetes\_taints](#input\_kubernetes\_taints) | List of `key`, `value`, `effect` objects representing Kubernetes taints.
`effect` must be one of `NO_SCHEDULE`, `NO_EXECUTE`, or `PREFER_NO_SCHEDULE`.
`key` and `effect` are required, `value` may be null. |
list(object({
key = string
value = optional(string)
effect = string
}))
| `[]` | no | | [kubernetes\_version](#input\_kubernetes\_version) | Kubernetes version. Defaults to EKS Cluster Kubernetes version. Terraform will only perform drift detection if a configuration value is provided | `list(string)` | `[]` | no | | [label\_key\_case](#input\_label\_key\_case) | Controls the letter case of the `tags` keys (label names) for tags generated by this module.
Does not affect keys of tags passed in via the `tags` input.
Possible values: `lower`, `title`, `upper`.
Default value: `title`. | `string` | `null` | no | | [label\_order](#input\_label\_order) | The order in which the labels (ID elements) appear in the `id`.
Defaults to ["namespace", "environment", "stage", "name", "attributes"].
You can omit any of the 6 labels ("tenant" is the 6th), but at least one must be present. | `list(string)` | `null` | no | @@ -95,7 +96,7 @@ | [module\_depends\_on](#input\_module\_depends\_on) | Can be any value desired. Module will wait for this value to be computed before creating node group. | `any` | `null` | no | | [name](#input\_name) | ID element. Usually the component or solution name, e.g. 'app' or 'jenkins'.
This is the only ID element not also included as a `tag`.
The "name" tag is set to the full `id` string. There is no tag with the value of the `name` input. | `string` | `null` | no | | [namespace](#input\_namespace) | ID element. Usually an abbreviation of your organization name, e.g. 'eg' or 'cp', to help ensure generated IDs are globally unique | `string` | `null` | no | -| [node\_group\_terraform\_timeouts](#input\_node\_group\_terraform\_timeouts) | Configuration for the Terraform [`timeouts` Configuration Block](https://www.terraform.io/docs/language/resources/syntax.html#operation-timeouts) of the node group resource.
Leave list empty for defaults. Pass list with single object with attributes matching the `timeouts` block to configure it.
Leave attribute values `null` to preserve individual defaults while setting others. |
list(object({
create = string
update = string
delete = string
}))
| `[]` | no | +| [node\_group\_terraform\_timeouts](#input\_node\_group\_terraform\_timeouts) | Configuration for the Terraform [`timeouts` Configuration Block](https://www.terraform.io/docs/language/resources/syntax.html#operation-timeouts) of the node group resource.
Leave list empty for defaults. Pass list with single object with attributes matching the `timeouts` block to configure it.
Leave attribute values `null` to preserve individual defaults while setting others. |
list(object({
create = optional(string)
update = optional(string)
delete = optional(string)
}))
| `[]` | no | | [node\_role\_arn](#input\_node\_role\_arn) | If provided, assign workers the given role, which this module will not modify | `list(string)` | `[]` | no | | [node\_role\_cni\_policy\_enabled](#input\_node\_role\_cni\_policy\_enabled) | When true, the `AmazonEKS_CNI_Policy` will be attached to the node IAM role.
This used to be required, but it is [now recommended](https://docs.aws.amazon.com/eks/latest/userguide/create-node-role.html) that this policy be
attached only to the `aws-node` Kubernetes service account. However, that
is difficult to do with Terraform, so this module defaults to the old pattern. | `bool` | `true` | no | | [node\_role\_permissions\_boundary](#input\_node\_role\_permissions\_boundary) | If provided, all IAM roles will be created with this permissions boundary attached. | `string` | `null` | no | @@ -110,12 +111,15 @@ | [tags](#input\_tags) | Additional tags (e.g. `{'BusinessUnit': 'XYZ'}`).
Neither the tag keys nor the tag values will be modified by this module. | `map(string)` | `{}` | no | | [tenant](#input\_tenant) | ID element \_(Rarely used, not included by default)\_. A customer identifier, indicating who this instance of a resource is for | `string` | `null` | no | | [update\_config](#input\_update\_config) | Configuration for the `eks_node_group` [`update_config` Configuration Block](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_node_group#update_config-configuration-block).
Specify exactly one of `max_unavailable` (node count) or `max_unavailable_percentage` (percentage of nodes). | `list(map(number))` | `[]` | no | -| [userdata\_override\_base64](#input\_userdata\_override\_base64) | Many features of this module rely on the `bootstrap.sh` provided with Amazon Linux, and this module
may generate "user data" that expects to find that script. If you want to use an AMI that is not
compatible with the Amazon Linux `bootstrap.sh` initialization, then use `userdata_override_base64` to provide
your own (Base64 encoded) user data. Use "" to prevent any user data from being set.

Setting `userdata_override_base64` disables `kubernetes_taints`, `kubelet_additional_options`,
`before_cluster_joining_userdata`, `after_cluster_joining_userdata`, and `bootstrap_additional_options`. | `list(string)` | `[]` | no | +| [userdata\_override\_base64](#input\_userdata\_override\_base64) | Many features of this module rely on the `bootstrap.sh` provided with Amazon Linux, and this module
may generate "user data" that expects to find that script. If you want to use an AMI that is not
compatible with the userdata generated by this module, then use `userdata_override_base64` to provide
your own (Base64 encoded) user data. Use "" to prevent any user data from being set.

Setting `userdata_override_base64` disables `kubernetes_taints`, `kubelet_additional_options`,
`before_cluster_joining_userdata`, `after_cluster_joining_userdata`, and `bootstrap_additional_options`. | `list(string)` | `[]` | no | ## Outputs | Name | Description | |------|-------------| +| [WARNING\_cluster\_autoscaler\_enabled](#output\_WARNING\_cluster\_autoscaler\_enabled) | n/a | +| [ami\_ids](#output\_ami\_ids) | n/a | +| [eks\_node\_group\_ami\_id](#output\_eks\_node\_group\_ami\_id) | The ID of the AMI used for the worker nodes, if specified | | [eks\_node\_group\_arn](#output\_eks\_node\_group\_arn) | Amazon Resource Name (ARN) of the EKS Node Group | | [eks\_node\_group\_cbd\_pet\_name](#output\_eks\_node\_group\_cbd\_pet\_name) | The pet name of this node group, if this module generated one | | [eks\_node\_group\_id](#output\_eks\_node\_group\_id) | EKS Cluster name and EKS Node Group name separated by a colon | @@ -127,5 +131,4 @@ | [eks\_node\_group\_role\_name](#output\_eks\_node\_group\_role\_name) | Name of the worker nodes IAM role | | [eks\_node\_group\_status](#output\_eks\_node\_group\_status) | Status of the EKS Node Group | | [eks\_node\_group\_tags\_all](#output\_eks\_node\_group\_tags\_all) | A map of tags assigned to the resource, including those inherited from the provider default\_tags configuration block. 
| -| [eks\_node\_group\_windows\_note](#output\_eks\_node\_group\_windows\_note) | Instructions on changes a user needs to follow or script for a windows node group in the event of a custom ami | diff --git a/examples/complete/fixtures.us-east-2.tfvars b/examples/complete/fixtures.us-east-2.tfvars index 673c0b5..ea68471 100644 --- a/examples/complete/fixtures.us-east-2.tfvars +++ b/examples/complete/fixtures.us-east-2.tfvars @@ -30,15 +30,21 @@ kubernetes_labels = { terratest = "true" } -before_cluster_joining_userdata = <<-EOT - printf "\n\n###\nExample output from before_cluster_joining_userdata\n###\n\n" - EOT +before_cluster_joining_userdata = [ + "echo 1", + "echo 2", + "echo \"###\"", + "printf \"Example output from before_cluster_joining_userdata\n###\n\n\"", +] + +kubelet_additional_options = ["--kube-reserved cpu=100m,memory=600Mi,ephemeral-storage=1Gi --system-reserved cpu=100m,memory=200Mi,ephemeral-storage=1Gi --eviction-hard memory.available<200Mi,nodefs.available<10%,imagefs.available<15%"] update_config = [{ max_unavailable = 2 }] kubernetes_taints = [ { key = "test" - value = null effect = "PREFER_NO_SCHEDULE" }] + +ami_type = "AL2023_x86_64_STANDARD" diff --git a/examples/complete/main.tf b/examples/complete/main.tf index 76443b4..a92e495 100644 --- a/examples/complete/main.tf +++ b/examples/complete/main.tf @@ -121,7 +121,7 @@ module "https_sg" { module "eks_cluster" { source = "cloudposse/eks-cluster/aws" - version = "4.1.0" + version = "4.2.0" region = var.region subnet_ids = module.subnets.public_subnet_ids kubernetes_version = var.kubernetes_version @@ -169,14 +169,12 @@ module "eks_node_group" { after_cluster_joining_userdata = var.after_cluster_joining_userdata - ami_type = var.ami_type - ami_release_version = var.ami_release_version + ami_type = var.ami_type + ami_specifier = var.ami_specifier - before_cluster_joining_userdata = [var.before_cluster_joining_userdata] + before_cluster_joining_userdata = var.before_cluster_joining_userdata - # 
Ensure ordering of resource creation to eliminate the race conditions when applying the Kubernetes Auth ConfigMap. - # Do not create Node Group before the EKS cluster is created and the `aws-auth` Kubernetes ConfigMap is applied. - depends_on = [module.eks_cluster, module.eks_cluster.kubernetes_config_map_id] + kubelet_additional_options = var.kubelet_additional_options create_before_destroy = true @@ -184,8 +182,7 @@ module "eks_node_group" { replace_node_group_on_version_update = var.replace_node_group_on_version_update node_group_terraform_timeouts = [{ - create = "40m" - update = null + create = "25m" delete = "20m" }] diff --git a/examples/complete/outputs.tf b/examples/complete/outputs.tf index 06a7716..dddd973 100644 --- a/examples/complete/outputs.tf +++ b/examples/complete/outputs.tf @@ -81,3 +81,8 @@ output "eks_node_group_cbd_pet_name" { output "eks_node_group_launch_template_id" { value = module.eks_node_group.eks_node_group_launch_template_id } + +output "eks_node_group_ami_id" { + description = "The ID of the AMI used for the worker nodes, if specified" + value = module.eks_node_group.eks_node_group_ami_id +} diff --git a/examples/complete/variables.tf b/examples/complete/variables.tf index acc1d3f..ff4fb66 100644 --- a/examples/complete/variables.tf +++ b/examples/complete/variables.tf @@ -67,11 +67,7 @@ variable "kubernetes_labels" { } variable "kubernetes_taints" { - type = list(object({ - key = string - value = string - effect = string - })) + type = list(any) description = <<-EOT List of `key`, `value`, `effect` objects representing Kubernetes taints. `effect` must be one of `NO_SCHEDULE`, `NO_EXECUTE`, or `PREFER_NO_SCHEDULE`. 
@@ -80,6 +76,11 @@ variable "kubernetes_taints" { default = [] } +variable "kubelet_additional_options" { + type = list(string) + description = "Command-line flags to pass to kubelet" +} + variable "desired_size" { type = number description = "Desired number of worker nodes" @@ -96,8 +97,8 @@ variable "min_size" { } variable "before_cluster_joining_userdata" { - type = string - default = "" + type = list(string) + default = [] description = "Additional commands to execute on each worker node before joining the EKS cluster (before executing the `bootstrap.sh` script). For more info, see https://kubedex.com/90-days-of-aws-eks-in-production" } @@ -117,27 +118,30 @@ variable "ami_type" { type = string description = <<-EOT Type of Amazon Machine Image (AMI) associated with the EKS Node Group. - Defaults to `AL2_x86_64`. Valid values: `AL2_x86_64, AL2_x86_64_GPU, AL2_ARM_64, CUSTOM, BOTTLEROCKET_ARM_64, BOTTLEROCKET_x86_64, BOTTLEROCKET_ARM_64_NVIDIA, BOTTLEROCKET_x86_64_NVIDIA, WINDOWS_CORE_2019_x86_64, WINDOWS_FULL_2019_x86_64, WINDOWS_CORE_2022_x86_64, WINDOWS_FULL_2022_x86_64`. + Defaults to `AL2_x86_64`. Valid values: `AL2_x86_64, AL2_x86_64_GPU, AL2_ARM_64, CUSTOM, BOTTLEROCKET_ARM_64, BOTTLEROCKET_x86_64, BOTTLEROCKET_ARM_64_NVIDIA, BOTTLEROCKET_x86_64_NVIDIA, WINDOWS_CORE_2019_x86_64, WINDOWS_FULL_2019_x86_64, WINDOWS_CORE_2022_x86_64, WINDOWS_FULL_2022_x86_64, AL2023_x86_64_STANDARD, AL2023_ARM_64_STANDARD`. 
EOT default = "AL2_x86_64" validation { condition = ( - contains(["AL2_x86_64", "AL2_x86_64_GPU", "AL2_ARM_64", "CUSTOM", "BOTTLEROCKET_ARM_64", "BOTTLEROCKET_x86_64", "BOTTLEROCKET_ARM_64_NVIDIA", "BOTTLEROCKET_x86_64_NVIDIA", "WINDOWS_CORE_2019_x86_64", "WINDOWS_FULL_2019_x86_64", "WINDOWS_CORE_2022_x86_64", "WINDOWS_FULL_2022_x86_64"], var.ami_type) + contains(["AL2_x86_64", "AL2_x86_64_GPU", "AL2_ARM_64", "CUSTOM", "BOTTLEROCKET_ARM_64", "BOTTLEROCKET_x86_64", "BOTTLEROCKET_ARM_64_NVIDIA", "BOTTLEROCKET_x86_64_NVIDIA", "WINDOWS_CORE_2019_x86_64", "WINDOWS_FULL_2019_x86_64", "WINDOWS_CORE_2022_x86_64", "WINDOWS_FULL_2022_x86_64", "AL2023_x86_64_STANDARD", "AL2023_ARM_64_STANDARD"], var.ami_type) ) - error_message = "Var ami_type must be one of \"AL2_x86_64\",\"AL2_x86_64_GPU\",\"AL2_ARM_64\",\"BOTTLEROCKET_ARM_64\",\"BOTTLEROCKET_x86_64\",\"BOTTLEROCKET_ARM_64_NVIDIA\",\"BOTTLEROCKET_x86_64_NVIDIA\",\"WINDOWS_CORE_2019_x86_64\",\"WINDOWS_FULL_2019_x86_64\",\"WINDOWS_CORE_2022_x86_64\",\"WINDOWS_FULL_2022_x86_64\", or \"CUSTOM\"." + error_message = "Var ami_type must be one of \"AL2_x86_64\",\"AL2_x86_64_GPU\",\"AL2_ARM_64\",\"BOTTLEROCKET_ARM_64\",\"BOTTLEROCKET_x86_64\",\"BOTTLEROCKET_ARM_64_NVIDIA\",\"BOTTLEROCKET_x86_64_NVIDIA\",\"WINDOWS_CORE_2019_x86_64\",\"WINDOWS_FULL_2019_x86_64\",\"WINDOWS_CORE_2022_x86_64\",\"WINDOWS_FULL_2022_x86_64\", \"AL2023_x86_64_STANDARD\", \"AL2023_ARM_64_STANDARD\", or \"CUSTOM\"." } } -variable "ami_release_version" { - type = list(string) - default = [] - description = "EKS AMI version to use, e.g. \"1.16.13-20200821\" (no \"v\"). Defaults to latest version for Kubernetes version." - validation { - condition = ( - length(var.ami_release_version) == 0 ? true : length(regexall("^\\d+\\.\\d+\\.\\d+-[\\da-z]+$", var.ami_release_version[0])) == 1 - ) - error_message = "Var ami_release_version, if supplied, must be like \"1.16.13-20200821\" (no \"v\")." 
- } +variable "ami_specifier" { + type = string + description = <<-EOT + OS-dependent specifier for one of the several AMIs that match OS, architecture, and Kubernetes 1.xx version. + If not specified the recommended/latest AMI for the given Kubernetes version will be used. + Examples: + AL2: amazon-eks-node-1.29-v20240117 + AL2023: amazon-eks-node-al2023-x86_64-standard-1.29-v20240605 + Bottlerocket: 1.20.1-7c3e9198 + Windows: + EOT + default = "recommended" + nullable = false } variable "after_cluster_joining_userdata" { diff --git a/examples/complete/versions.tf b/examples/complete/versions.tf index c3b7cb3..aeb1ff8 100644 --- a/examples/complete/versions.tf +++ b/examples/complete/versions.tf @@ -3,24 +3,16 @@ terraform { required_providers { aws = { + # retrieve launch template by ID starts at 3.21.0 + # update_config starts at 3.56 + # Windows support starts at 4.48 https://github.com/hashicorp/terraform-provider-aws/blob/main/CHANGELOG.md#4480-december-19-2022 + # SSM parameter `insecure_value` starts at 5.8 source = "hashicorp/aws" - version = ">= 5.34" - } - template = { - source = "cloudposse/template" - version = ">= 2.2" - } - local = { - source = "hashicorp/local" - version = ">= 1.3" + version = ">= 5.8" } random = { source = "hashicorp/random" version = ">= 2.0" } - null = { - source = "hashicorp/null" - version = ">= 2.0" - } } } diff --git a/launch-template.tf b/launch-template.tf index d30e1b2..8b459b4 100644 --- a/launch-template.tf +++ b/launch-template.tf @@ -31,7 +31,7 @@ locals { local.fetch_launch_template ? data.aws_launch_template.this[0].latest_version : aws_launch_template.default[0].latest_version )) : null - launch_template_ami = length(var.ami_image_id) == 0 ? (local.features_require_ami ? data.aws_ami.selected[0].image_id : "") : var.ami_image_id[0] + launch_template_ami = length(var.ami_image_id) == 0 ? (local.features_require_ami ? 
data.aws_ssm_parameter.ami_id[0].insecure_value : "") : var.ami_image_id[0] associate_cluster_security_group = local.enabled && var.associate_cluster_security_group launch_template_vpc_security_group_ids = sort(compact(concat( @@ -148,6 +148,16 @@ resource "aws_launch_template" "default" { enabled = var.detailed_monitoring_enabled } + lifecycle { + precondition { + condition = length(local.userdata_vars.bootstrap_extra_args) == 0 || local.ami_os != "AL2023" + error_message = "The input `bootstrap_additional_options` is not supported for AL2023." + } + precondition { + condition = length(local.userdata_vars.after_cluster_joining_userdata) == 0 || local.ami_os != "AL2023" + error_message = "The input `after_cluster_joining_userdata` is not supported for AL2023." + } + } } data "aws_launch_template" "this" { diff --git a/main.tf b/main.tf index 4edc93d..c6cad07 100644 --- a/main.tf +++ b/main.tf @@ -11,23 +11,30 @@ locals { need_ssh_access_sg = local.enabled && (local.have_ssh_key || length(var.ssh_access_security_group_ids) > 0) && local.generate_launch_template - get_cluster_data = local.enabled ? (local.need_cluster_kubernetes_version || local.need_bootstrap || local.need_ssh_access_sg || length(var.associated_security_group_ids) > 0) : false + get_cluster_data = local.enabled ? ( + local.need_cluster_kubernetes_version || + local.need_bootstrap || + local.need_ssh_access_sg || + length(var.associated_security_group_ids) > 0 || + (length(local.kubelet_extra_args) > 0 && local.ami_os == "AL2023") + ) : false - taint_effect_map = { - NO_SCHEDULE = "NoSchedule" - NO_EXECUTE = "NoExecute" - PREFER_NO_SCHEDULE = "PreferNoSchedule" - } # At the moment, the autoscaler tags are not needed. # We leave them here for when they can be applied to the autoscaling group. 
- - autoscaler_enabled = var.cluster_autoscaler_enabled + /* # # Set up tags for autoscaler and other resources # https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#auto-discovery-setup # + + taint_effect_map = { + NO_SCHEDULE = "NoSchedule" + NO_EXECUTE = "NoExecute" + PREFER_NO_SCHEDULE = "PreferNoSchedule" + } + autoscaler_enabled_tags = { "k8s.io/cluster-autoscaler/${var.cluster_name}" = "owned" "k8s.io/cluster-autoscaler/enabled" = "true" @@ -39,7 +46,7 @@ locals { for taint in var.kubernetes_taints : format("k8s.io/cluster-autoscaler/node-template/taint/%v", taint.key) => "${taint.value == null ? "" : taint.value}:${local.taint_effect_map[taint.effect]}" } - autoscaler_tags = merge(local.autoscaler_enabled_tags, local.autoscaler_kubernetes_label_tags, local.autoscaler_kubernetes_taints_tags) + */ node_tags = merge( module.label.tags, @@ -53,7 +60,7 @@ locals { # TODO: # Replace: node_group_tags = merge(local.node_tags, local.autoscaler_enabled ? local.autoscaler_tags : null) # with: node_group_tags = local.node_tags - node_group_tags = merge(local.node_tags, local.autoscaler_enabled ? local.autoscaler_tags : null) + node_group_tags = local.node_tags } module "label" { @@ -72,7 +79,6 @@ data "aws_eks_cluster" "this" { # Support keeping 2 node groups in sync by extracting common variable settings locals { - is_windows = can(regex("WINDOWS", var.ami_type)) ng = { cluster_name = var.cluster_name node_role_arn = local.create_role ? join("", aws_iam_role.default[*].arn) : try(var.node_role_arn[0], null) diff --git a/outputs.tf b/outputs.tf index 242ee1c..d16d764 100644 --- a/outputs.tf +++ b/outputs.tf @@ -53,12 +53,7 @@ output "eks_node_group_tags_all" { value = local.enabled ? (var.create_before_destroy ? 
aws_eks_node_group.cbd[0].tags_all : aws_eks_node_group.default[0].tags_all) : {} } -output "eks_node_group_windows_note" { - description = "Instructions on changes a user needs to follow or script for a windows node group in the event of a custom ami" - value = (local.enabled && local.is_windows && local.need_bootstrap ? <<-EOT - When specifying a custom AMI ID for Windows managed node groups, - add eks:kube-proxy-windows to your AWS IAM Authenticator configuration map. - For more information, see [Limits and conditions when specifying an AMI ID](https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html) - EOT - : null) +output "eks_node_group_ami_id" { + description = "The ID of the AMI used for the worker nodes, if specified" + value = local.launch_template_ami } diff --git a/userdata.tf b/userdata.tf index 0a3a076..d76550e 100644 --- a/userdata.tf +++ b/userdata.tf @@ -1,13 +1,27 @@ # The userdata is built from the `userdata.tpl` file. It is limited to ~16k bytes, # so comments about the userdata (~1k bytes) are here, not in the tpl file. # +# We use '>-' to handle quoting and escaping values in the YAML. +# # userdata for EKS worker nodes to configure Kubernetes applications on EC2 instances # In multipart MIME format so EKS can append to it. See: # https://docs.aws.amazon.com/eks/latest/userguide/launch-templates.html#launch-template-user-data # https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html -# If you just provide a #!/bin/bash script like you can do when you provide the entire userdata you get +# If you just provide a #!/bin/bash script like you can do when you provide the entire userdata you get # an error at deploy time: Ec2LaunchTemplateInvalidConfiguration: User data was not in the MIME multipart format # +# We use a small boundary ("/:/+++") to save space. 
+# The format is +# --boundary +# Mime Type +# +# +# +# --boundary +# ## repeat +# ## end with +# --boundary-- + # See also: # https://aws.amazon.com/premiumsupport/knowledge-center/execute-user-data-ec2/ # https://docs.aws.amazon.com/eks/latest/userguide/launch-workers.html @@ -17,31 +31,49 @@ locals { + ami_os = split("_", var.ami_type)[0] + userdata_template_file = { + AL2 = "${path.module}/userdata.tpl" + AL2023 = "${path.module}/userdata_al2023.tpl" + BOTTLEROCKET = "${path.module}/userdata.tpl" + WINDOWS = "${path.module}/userdata_nt.tpl" + } + kubelet_extra_args = join(" ", var.kubelet_additional_options) + # We use '>-' to handle quoting and escaping values in the YAML. + + kubelet_extra_args_yaml = replace(local.kubelet_extra_args, "--", "\n - >-\n --") + userdata_vars = { - before_cluster_joining_userdata = length(var.before_cluster_joining_userdata) == 0 ? "" : var.before_cluster_joining_userdata[0] + before_cluster_joining_userdata = length(var.before_cluster_joining_userdata) == 0 ? "" : join("\n", var.before_cluster_joining_userdata) kubelet_extra_args = local.kubelet_extra_args - bootstrap_extra_args = length(var.bootstrap_additional_options) == 0 ? "" : var.bootstrap_additional_options[0] - after_cluster_joining_userdata = length(var.after_cluster_joining_userdata) == 0 ? "" : var.after_cluster_joining_userdata[0] - } + kubelet_extra_args_yaml = local.kubelet_extra_args_yaml + bootstrap_extra_args = length(var.bootstrap_additional_options) == 0 ? "" : join(" ", var.bootstrap_additional_options) + after_cluster_joining_userdata = length(var.after_cluster_joining_userdata) == 0 ? "" : join("\n", var.after_cluster_joining_userdata) - cluster_data = { cluster_endpoint = local.get_cluster_data ? data.aws_eks_cluster.this[0].endpoint : null certificate_authority_data = local.get_cluster_data ? data.aws_eks_cluster.this[0].certificate_authority[0].data : null cluster_name = local.get_cluster_data ? 
data.aws_eks_cluster.this[0].name : null + cluster_cidr = local.get_cluster_data ? coalesce(concat( + # prefer ipv4 address in dual stack + [for net in data.aws_eks_cluster.this[0].kubernetes_network_config : net.service_ipv4_cidr if net.ip_family == "ipv4"], + [for net in data.aws_eks_cluster.this[0].kubernetes_network_config : net.service_ipv6_cidr if net.ip_family == "ipv6"] + )...) : null } need_bootstrap = local.enabled ? length(concat(var.kubelet_additional_options, var.bootstrap_additional_options, var.after_cluster_joining_userdata )) > 0 : false - # If var.userdata_override_base64[0] = "" then we explicitly set userdata to "" + # If var.userdata_override_base64[0] is present then we use it rather than generating userdata need_userdata = local.enabled && length(var.userdata_override_base64) == 0 ? ( (length(var.before_cluster_joining_userdata) > 0) || local.need_bootstrap) : false userdata = local.need_userdata ? ( - base64encode(templatefile(local.is_windows ? "${path.module}/userdata_nt.tpl" : "${path.module}/userdata.tpl", merge(local.userdata_vars, local.cluster_data)))) : ( + base64encode( + templatefile(local.userdata_template_file[local.ami_os], local.userdata_vars)) + ) : ( try(var.userdata_override_base64[0], null) ) } diff --git a/userdata.tpl b/userdata.tpl index 181eabb..426eec4 100644 --- a/userdata.tpl +++ b/userdata.tpl @@ -3,6 +3,7 @@ Content-Type: multipart/mixed; boundary="/:/+++" --/:/+++ Content-Type: text/x-shellscript; charset="us-ascii" + #!/bin/bash # In multipart MIME format to support EKS appending to it diff --git a/userdata_al2023.tpl b/userdata_al2023.tpl new file mode 100644 index 0000000..f177f68 --- /dev/null +++ b/userdata_al2023.tpl @@ -0,0 +1,31 @@ +MIME-Version: 1.0 +Content-Type: multipart/mixed; boundary="/:/+++" + +%{ if length(before_cluster_joining_userdata) > 0 ~} +--/:/+++ +Content-Type: text/x-shellscript; charset="us-ascii" + +#!/bin/bash + +${before_cluster_joining_userdata} + +%{ endif ~} +%{~ if 
length(kubelet_extra_args_yaml) > 0 } +--/:/+++ +Content-Type: application/node.eks.aws + +--- +apiVersion: node.eks.aws/v1alpha1 +kind: NodeConfig +spec: + cluster: + name: ${cluster_name} + apiServerEndpoint: ${cluster_endpoint} + certificateAuthority: ${certificate_authority_data} + cidr: ${cluster_cidr} + kubelet: + flags: ${kubelet_extra_args_yaml} + +%{~ endif } + +--/:/+++-- diff --git a/variables-deprecated.tf b/variables-deprecated.tf index a001563..58617ef 100644 --- a/variables-deprecated.tf +++ b/variables-deprecated.tf @@ -1,3 +1,29 @@ +variable "ami_release_version" { + type = list(string) + description = <<-EOT + OBSOLETE: Use `ami_specifier` instead. Note that it has a different format. + Historical description: EKS AMI version to use, e.g. For AL2 \"1.16.13-20200821\" or for bottlerocket \"1.2.0-ccf1b754\" (no \"v\") or for Windows \"2023.02.14\". For AL2, bottlerocket and Windows, it defaults to latest version for Kubernetes version." + EOT + default = [] + nullable = false + validation { + condition = length(var.ami_release_version) == 0 + error_message = "variable `ami_release_version` is obsolete. Use `ami_specifier` instead." + } +} + +variable "cluster_autoscaler_enabled" { + type = bool + description = <<-EOT + OBSOLETE. Used to add support for the Kubernetes Cluster Autoscaler, but additional support is no longer needed. + EOT + default = null +} + +output "WARNING_cluster_autoscaler_enabled" { + value = var.cluster_autoscaler_enabled == null ? null : "WARNING: variable `cluster_autoscaler_enabled` is obsolete and has been ignored." +} + variable "block_device_mappings" { type = list(any) description = <<-EOT diff --git a/variables.tf b/variables.tf index 3aee2c4..296d984 100644 --- a/variables.tf +++ b/variables.tf @@ -5,27 +5,20 @@ variable "cluster_name" { variable "create_before_destroy" { type = bool - default = false description = <<-EOT Set true in order to create the new node group before destroying the old one. 
If false, the old node group will be destroyed first, causing downtime. Changing this setting will always cause node group to be replaced. EOT -} - -variable "cluster_autoscaler_enabled" { - type = bool - description = <<-EOT - Set `true` to label the node group so that the [Kubernetes Cluster Autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#auto-discovery-setup) will discover and autoscale it. - Note that even when `false`, EKS will set the `k8s.io/cluster-autoscaler/enabled` label to `true` on the node group. - EOT default = false + nullable = false } variable "ec2_ssh_key_name" { type = list(string) - default = [] description = "SSH key pair name to use to access the worker nodes" + default = [] + nullable = false validation { condition = ( length(var.ec2_ssh_key_name) < 2 @@ -36,8 +29,9 @@ variable "ec2_ssh_key_name" { variable "ssh_access_security_group_ids" { type = list(string) - default = [] description = "Set of EC2 Security Group IDs to allow SSH access (port 22) to the worker nodes. If you specify `ec2_ssh_key`, but do not specify this configuration when you create an EKS Node Group, port 22 on the worker nodes is opened to the Internet (0.0.0.0/0)" + default = [] + nullable = false } variable "desired_size" { @@ -56,8 +50,8 @@ variable "min_size" { } variable "subnet_ids" { - description = "A list of subnet IDs to launch resources in" type = list(string) + description = "A list of subnet IDs to launch resources in" validation { condition = ( length(var.subnet_ids) > 0 @@ -68,39 +62,43 @@ variable "subnet_ids" { variable "associate_cluster_security_group" { type = bool - default = true description = <<-EOT When true, associate the default cluster security group to the nodes. If disabled the EKS managed security group will not be associated to the nodes and you will need to provide another security group that allows the nodes to communicate with the EKS control plane. 
Be aware that if no `associated_security_group_ids` or `ssh_access_security_group_ids` are provided, then the nodes will have no inbound or outbound rules. EOT + default = true + nullable = false } variable "associated_security_group_ids" { type = list(string) - default = [] description = <<-EOT A list of IDs of Security Groups to associate the node group with, in addition to the EKS' created security group. These security groups will not be modified. EOT + default = [] + nullable = false } variable "node_role_cni_policy_enabled" { type = bool - default = true description = <<-EOT When true, the `AmazonEKS_CNI_Policy` will be attached to the node IAM role. This used to be required, but it is [now recommended](https://docs.aws.amazon.com/eks/latest/userguide/create-node-role.html) that this policy be attached only to the `aws-node` Kubernetes service account. However, that is difficult to do with Terraform, so this module defaults to the old pattern. EOT + default = true + nullable = false } variable "node_role_arn" { type = list(string) - default = [] description = "If provided, assign workers the given role, which this module will not modify" + default = [] + nullable = false validation { condition = ( length(var.node_role_arn) < 2 @@ -111,8 +109,9 @@ variable "node_role_arn" { variable "node_role_policy_arns" { type = list(string) - default = [] description = "List of policy ARNs to attach to the worker role this module creates in addition to the default ones" + default = [] + nullable = false } variable "node_role_permissions_boundary" { @@ -125,43 +124,64 @@ variable "ami_type" { type = string description = <<-EOT Type of Amazon Machine Image (AMI) associated with the EKS Node Group. - Defaults to `AL2_x86_64`. 
Valid values: `AL2_x86_64, AL2_x86_64_GPU, AL2_ARM_64, CUSTOM, BOTTLEROCKET_ARM_64, BOTTLEROCKET_x86_64, BOTTLEROCKET_ARM_64_NVIDIA, BOTTLEROCKET_x86_64_NVIDIA, WINDOWS_CORE_2019_x86_64, WINDOWS_FULL_2019_x86_64, WINDOWS_CORE_2022_x86_64, WINDOWS_FULL_2022_x86_64`. + Defaults to `AL2_x86_64`. Valid values: `AL2_x86_64, AL2_x86_64_GPU, AL2_ARM_64, CUSTOM, BOTTLEROCKET_ARM_64, BOTTLEROCKET_x86_64, BOTTLEROCKET_ARM_64_NVIDIA, BOTTLEROCKET_x86_64_NVIDIA, WINDOWS_CORE_2019_x86_64, WINDOWS_FULL_2019_x86_64, WINDOWS_CORE_2022_x86_64, WINDOWS_FULL_2022_x86_64, AL2023_x86_64_STANDARD, AL2023_ARM_64_STANDARD`. EOT default = "AL2_x86_64" + nullable = false + validation { + condition = ( + contains(["AL2_x86_64", "AL2_x86_64_GPU", "AL2_ARM_64", "CUSTOM", "BOTTLEROCKET_ARM_64", "BOTTLEROCKET_x86_64", "BOTTLEROCKET_ARM_64_NVIDIA", "BOTTLEROCKET_x86_64_NVIDIA", "WINDOWS_CORE_2019_x86_64", "WINDOWS_FULL_2019_x86_64", "WINDOWS_CORE_2022_x86_64", "WINDOWS_FULL_2022_x86_64", "AL2023_x86_64_STANDARD", "AL2023_ARM_64_STANDARD"], var.ami_type) + ) + error_message = "Var ami_type must be one of \"AL2_x86_64\",\"AL2_x86_64_GPU\",\"AL2_ARM_64\",\"BOTTLEROCKET_ARM_64\",\"BOTTLEROCKET_x86_64\",\"BOTTLEROCKET_ARM_64_NVIDIA\",\"BOTTLEROCKET_x86_64_NVIDIA\",\"WINDOWS_CORE_2019_x86_64\",\"WINDOWS_FULL_2019_x86_64\",\"WINDOWS_CORE_2022_x86_64\",\"WINDOWS_FULL_2022_x86_64\", \"AL2023_x86_64_STANDARD\", \"AL2023_ARM_64_STANDARD\", or \"CUSTOM\"." + } +} + +variable "ami_image_id" { + type = list(string) + description = "AMI to use, overriding other AMI specifications, but must match `ami_type`. Ignored if `launch_template_id` is supplied." 
+ default = [] + nullable = false validation { condition = ( - contains(["AL2_x86_64", "AL2_x86_64_GPU", "AL2_ARM_64", "CUSTOM", "BOTTLEROCKET_ARM_64", "BOTTLEROCKET_x86_64", "BOTTLEROCKET_ARM_64_NVIDIA", "BOTTLEROCKET_x86_64_NVIDIA", "WINDOWS_CORE_2019_x86_64", "WINDOWS_FULL_2019_x86_64", "WINDOWS_CORE_2022_x86_64", "WINDOWS_FULL_2022_x86_64"], var.ami_type) + length(var.ami_image_id) < 2 ) - error_message = "Var ami_type must be one of \"AL2_x86_64\",\"AL2_x86_64_GPU\",\"AL2_ARM_64\",\"BOTTLEROCKET_ARM_64\",\"BOTTLEROCKET_x86_64\",\"BOTTLEROCKET_ARM_64_NVIDIA\",\"BOTTLEROCKET_x86_64_NVIDIA\",\"WINDOWS_CORE_2019_x86_64\",\"WINDOWS_FULL_2019_x86_64\",\"WINDOWS_CORE_2022_x86_64\",\"WINDOWS_FULL_2022_x86_64\", or \"CUSTOM\"." + error_message = "You may not specify more than one `ami_image_id`." } } +variable "ami_specifier" { + type = string + description = <<-EOT + OS-dependent specifier for one of the several AMIs that match OS, architecture, and Kubernetes 1.xx version. + If not specified the recommended/latest AMI for the given Kubernetes version will be used. + Examples: + AL2: amazon-eks-node-1.29-v20240117 + AL2023: amazon-eks-node-al2023-x86_64-standard-1.29-v20240605 + Bottlerocket: 1.20.1-7c3e9198 + Windows: + EOT + default = "recommended" + nullable = false +} + + variable "instance_types" { type = list(string) - default = ["t3.medium"] description = <<-EOT Instance types to use for this node group (up to 20). Defaults to ["t3.medium"]. Must be empty if the launch template configured by `launch_template_id` specifies an instance type. EOT - validation { - condition = ( - length(var.instance_types) <= 20 - ) - error_message = "Per the EKS API, no more than 20 instance types may be specified." - } + default = ["t3.medium"] + nullable = false } variable "capacity_type" { type = string - default = null description = <<-EOT Type of capacity associated with the EKS Node Group. Valid values: "ON_DEMAND", "SPOT", or `null`. 
Terraform will only perform drift detection if a configuration value is provided. EOT - validation { - condition = var.capacity_type == null ? true : contains(["ON_DEMAND", "SPOT"], var.capacity_type) - error_message = "Capacity type must be either `null`, \"ON_DEMAND\", or \"SPOT\"." - } + default = null } variable "block_device_map" { @@ -184,16 +204,18 @@ variable "block_device_map" { Map of block device name specification, see [launch_template.block-devices](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/launch_template#block-devices). EOT # See https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/launch_template#ebs - default = { "/dev/xvda" = { ebs = {} } } + default = { "/dev/xvda" = { ebs = {} } } + nullable = false } variable "update_config" { type = list(map(number)) - default = [] description = <<-EOT Configuration for the `eks_node_group` [`update_config` Configuration Block](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_node_group#update_config-configuration-block). Specify exactly one of `max_unavailable` (node count) or `max_unavailable_percentage` (percentage of nodes). EOT + default = [] + nullable = false } variable "kubernetes_labels" { @@ -203,12 +225,13 @@ variable "kubernetes_labels" { Other Kubernetes labels applied to the EKS Node Group will not be managed. EOT default = {} + nullable = false } variable "kubernetes_taints" { type = list(object({ key = string - value = string + value = optional(string) effect = string })) description = <<-EOT @@ -217,6 +240,7 @@ variable "kubernetes_taints" { `key` and `effect` are required, `value` may be null. EOT default = [] + nullable = false } variable "kubelet_additional_options" { @@ -226,7 +250,6 @@ variable "kubelet_additional_options" { DO NOT include `--node-labels` or `--node-taints`, use `kubernetes_labels` and `kubernetes_taints` to specify those." 
EOT - default = [] validation { condition = (length(compact(var.kubelet_additional_options)) == 0 ? true : length(regexall("--node-labels", join(" ", var.kubelet_additional_options))) == 0 && @@ -234,35 +257,12 @@ variable "kubelet_additional_options" { ) error_message = "Var kubelet_additional_options must not contain \"--node-labels\" or \"--node-taints\". Use `kubernetes_labels` and `kubernetes_taints` to specify labels and taints." } -} - -variable "ami_image_id" { - type = list(string) - default = [] - description = "AMI to use. Ignored if `launch_template_id` is supplied." - validation { - condition = ( - length(var.ami_image_id) < 2 - ) - error_message = "You may not specify more than one `ami_image_id`." - } -} - -variable "ami_release_version" { - type = list(string) - default = [] - description = "EKS AMI version to use, e.g. For AL2 \"1.16.13-20200821\" or for bottlerocket \"1.2.0-ccf1b754\" (no \"v\") or for Windows \"2023.02.14\". For AL2, bottlerocket and Windows, it defaults to latest version for Kubernetes version." - validation { - condition = ( - length(var.ami_release_version) == 0 ? true : length(regexall("(^\\d+\\.\\d+\\.\\d+-[\\da-z]+$)|(^\\d+\\.\\d+\\.\\d+$)", var.ami_release_version[0])) == 1 - ) - error_message = "Var ami_release_version, if supplied, must be like for AL2 \"1.16.13-20200821\" or for bottlerocket \"1.2.0-ccf1b754\" (no \"v\") or for Windows \"2023.02.14\"." - } + default = [] + nullable = false } variable "kubernetes_version" { type = list(string) - default = [] description = "Kubernetes version. Defaults to EKS Cluster Kubernetes version. Terraform will only perform drift detection if a configuration value is provided" validation { condition = ( @@ -270,6 +270,8 @@ variable "kubernetes_version" { ) error_message = "Var kubernetes_version, if supplied, must be like \"1.16\" (no patch level)." 
} + default = [] + nullable = false } variable "module_depends_on" { @@ -286,7 +288,6 @@ variable "ebs_optimized" { variable "launch_template_id" { type = list(string) - default = [] description = "The ID (not name) of a custom launch template to use for the EKS node group. If provided, it must specify the AMI image ID." validation { condition = ( @@ -294,11 +295,12 @@ variable "launch_template_id" { ) error_message = "You may not specify more than one `launch_template_id`." } + default = [] + nullable = false } variable "launch_template_version" { type = list(string) - default = [] description = "The version of the specified launch template to use. Defaults to latest version." validation { condition = ( @@ -306,68 +308,51 @@ variable "launch_template_version" { ) error_message = "You may not specify more than one `launch_template_version`." } + default = [] + nullable = false } variable "resources_to_tag" { type = list(string) description = "List of auto-launched resource types to tag. Valid types are \"instance\", \"volume\", \"elastic-gpu\", \"spot-instances-request\", \"network-interface\"." default = ["instance", "volume", "network-interface"] - validation { - condition = ( - length(compact([for r in var.resources_to_tag : r if !contains(["instance", "volume", "elastic-gpu", "spot-instances-request", "network-interface"], r)])) == 0 - ) - error_message = "Invalid resource type in `resources_to_tag`. Valid types are \"instance\", \"volume\", \"elastic-gpu\", \"spot-instances-request\", \"network-interface\"." - } + nullable = false } variable "before_cluster_joining_userdata" { type = list(string) - default = [] description = "Additional `bash` commands to execute on each worker node before joining the EKS cluster (before executing the `bootstrap.sh` script). 
For more info, see https://kubedex.com/90-days-of-aws-eks-in-production" - validation { - condition = ( - length(var.before_cluster_joining_userdata) < 2 - ) - error_message = "You may not specify more than one `before_cluster_joining_userdata`." - } + default = [] + nullable = false } variable "after_cluster_joining_userdata" { type = list(string) - default = [] description = "Additional `bash` commands to execute on each worker node after joining the EKS cluster (after executing the `bootstrap.sh` script). For more info, see https://kubedex.com/90-days-of-aws-eks-in-production" - validation { - condition = ( - length(var.after_cluster_joining_userdata) < 2 - ) - error_message = "You may not specify more than one `after_cluster_joining_userdata`." - } + default = [] + nullable = false } variable "bootstrap_additional_options" { type = list(string) + description = "Additional options to bootstrap.sh. DO NOT include `--kubelet-additional-args`, use `kubelet_additional_options` var instead. Not used with AL2023 AMI types." default = [] - description = "Additional options to bootstrap.sh. DO NOT include `--kubelet-additional-args`, use `kubelet_additional_options` var instead." - validation { - condition = ( - length(var.bootstrap_additional_options) < 2 - ) - error_message = "You may not specify more than one `bootstrap_additional_options`." - } + nullable = false } variable "userdata_override_base64" { type = list(string) - default = [] description = <<-EOT Many features of this module rely on the `bootstrap.sh` provided with Amazon Linux, and this module may generate "user data" that expects to find that script. If you want to use an AMI that is not - compatible with the Amazon Linux `bootstrap.sh` initialization, then use `userdata_override_base64` to provide + compatible with the userdata generated by this module, then use `userdata_override_base64` to provide your own (Base64 encoded) user data. Use "" to prevent any user data from being set. 
Setting `userdata_override_base64` disables `kubernetes_taints`, `kubelet_additional_options`, `before_cluster_joining_userdata`, `after_cluster_joining_userdata`, and `bootstrap_additional_options`. EOT + default = [] + nullable = false validation { condition = ( length(var.userdata_override_base64) < 2 @@ -378,17 +363,19 @@ variable "userdata_override_base64" { variable "metadata_http_endpoint_enabled" { type = bool - default = true description = "Set false to disable the Instance Metadata Service." + default = true + nullable = false } variable "metadata_http_put_response_hop_limit" { type = number - default = 2 description = <<-EOT The desired HTTP PUT response hop limit (between 1 and 64) for Instance Metadata Service requests. The default is `2` to allows containerized workloads assuming the instance profile, but it's not really recomended. You should use OIDC service accounts instead. EOT + default = 2 + nullable = false validation { condition = ( var.metadata_http_put_response_hop_limit >= 1 @@ -399,66 +386,74 @@ variable "metadata_http_put_response_hop_limit" { variable "metadata_http_tokens_required" { type = bool - default = true description = "Set true to require IMDS session tokens, disabling Instance Metadata Service Version 1." + default = true + nullable = false } variable "placement" { type = list(any) - default = [] description = <<-EOT Configuration for the [`placement` Configuration Block](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/launch_template#placement) of the launch template. Leave list empty for defaults. Pass list with single object with attributes matching the `placement` block to configure it. Note that this configures the launch template only. Some elements will be ignored by the Auto Scaling Group that actually launches instances. Consult AWS documentation for details. 
EOT + default = [] + nullable = false } variable "cpu_options" { type = list(any) - default = [] description = <<-EOT Configuration for the [`cpu_options` Configuration Block](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/launch_template#cpu_options) of the launch template. Leave list empty for defaults. Pass list with single object with attributes matching the `cpu_options` block to configure it. Note that this configures the launch template only. Some elements will be ignored by the Auto Scaling Group that actually launches instances. Consult AWS documentation for details. EOT + default = [] + nullable = false } variable "enclave_enabled" { type = bool - default = false description = "Set to `true` to enable Nitro Enclaves on the instance." + default = false + nullable = false } variable "node_group_terraform_timeouts" { type = list(object({ - create = string - update = string - delete = string + create = optional(string) + update = optional(string) + delete = optional(string) })) - default = [] description = <<-EOT Configuration for the Terraform [`timeouts` Configuration Block](https://www.terraform.io/docs/language/resources/syntax.html#operation-timeouts) of the node group resource. Leave list empty for defaults. Pass list with single object with attributes matching the `timeouts` block to configure it. Leave attribute values `null` to preserve individual defaults while setting others. EOT + default = [] + nullable = false } variable "detailed_monitoring_enabled" { type = bool - default = false description = "The launched EC2 instance will have detailed monitoring enabled. 
Defaults to false" + default = false + nullable = false } variable "force_update_version" { type = bool - default = false description = "When updating the Kubernetes version, force Pods to be removed even if PodDisruptionBudget or taint/toleration issues would otherwise prevent them from being removed (and cause the update to fail)" + default = false + nullable = false } variable "replace_node_group_on_version_update" { type = bool - default = false description = "Force Node Group replacement when updating to a new Kubernetes version. If set to `false` (the default), the Node Groups will be updated in-place" + default = false + nullable = false } diff --git a/versions.tf b/versions.tf index 1f8c739..aeb1ff8 100644 --- a/versions.tf +++ b/versions.tf @@ -6,8 +6,9 @@ terraform { # retrieve launch template by ID starts at 3.21.0 # update_config starts at 3.56 # Windows support starts at 4.48 https://github.com/hashicorp/terraform-provider-aws/blob/main/CHANGELOG.md#4480-december-19-2022 + # SSM parameter `insecure_value` starts at 5.8 source = "hashicorp/aws" - version = ">= 4.48" + version = ">= 5.8" } random = { source = "hashicorp/random" From 4ee809d9d50ed7eed0bc8326c6beaaca41111472 Mon Sep 17 00:00:00 2001 From: Nuru Date: Mon, 10 Jun 2024 15:20:25 -0700 Subject: [PATCH 2/5] Enable immediate update on launch template changes --- README.md | 141 +++++++++++++++++++++++++--------------- README.yaml | 62 ++++++++++++++---- ami.tf | 14 ++-- docs/terraform.md | 8 ++- launch-template.tf | 82 +++++++++++++++++------ main.tf | 34 +++++----- outputs.tf | 2 +- security-group.tf | 3 +- variables-deprecated.tf | 5 ++ variables.tf | 38 +++++++++-- 10 files changed, 261 insertions(+), 128 deletions(-) diff --git a/README.md b/README.md index 34a4cca..60108bf 100644 --- a/README.md +++ b/README.md @@ -27,11 +27,11 @@ --> -Terraform module to provision an EKS Node Group for [Elastic Container Service for Kubernetes](https://aws.amazon.com/eks/). 
+Terraform module to provision an EKS Managed Node Group for [Elastic Kubernetes Service](https://aws.amazon.com/eks/). -Instantiate it multiple times to create many EKS node groups with specific settings such as GPUs, EC2 instance types, or autoscale parameters. +Instantiate it multiple times to create EKS Managed Node Groups with specific settings such as GPUs, EC2 instance types, or autoscale parameters. -**IMPORTANT:** This module provisions an `EKS Node Group` nodes globally accessible by SSH (22) port. Normally, AWS recommends that no security group allows unrestricted ingress access to port 22 . +**IMPORTANT:** When SSH access is enabled without specifying a source security group, this module provisions `EKS Node Group` nodes that are globally accessible by SSH (22) port. Normally, AWS recommends that no security group allows unrestricted ingress access to port 22 . > [!TIP] @@ -48,9 +48,81 @@ Instantiate it multiple times to create many EKS node groups with specific setti ## Introduction +This module creates an [EKS Managed Node Group](https://docs.aws.amazon.com/eks/latest/userguide/managed-node-groups.html) +for an [EKS](https://aws.amazon.com/eks/) cluster. +It assumes you have already created an EKS cluster, but you can create the cluster and the node group in the +same Terraform configuration. See our +[full-featured root module (a.k.a. component) `eks/cluster`](https://github.com/cloudposse/terraform-aws-components/tree/main/modules/eks/cluster) +for an example of how to do that. +### Launch Templates +This module always uses a [launch template](https://docs.aws.amazon.com/autoscaling/ec2/userguide/launch-templates.html) +to create the node group. You can create your own launch template and +pass in its ID, or else this module will create one for you. +The AWS default for EKS is that if the launch template is updated, the existing nodes will not be affected. 
Only +new instances added to the node group would get the changes specified in the new launch template. In contrast, +when the launch template changes, this module can immediately create a new node group from the new launch template +to replace the old one. + +See the inputs `create_before_destroy` and `immediately_apply_lt_changes` for details about how to control this behavior. + +### Operating system differences + +Currently, EKS supports 4 Operating Systems: Amazon Linux 2, Amazon Linux 2023, Bottlerocket, and Windows Server. +This module supports all 4 OSes, but support for detailed configuration of the nodes varies by OS. The 4 inputs: + +1. `before_cluster_joining_userdata` +2. `kubelet_additional_options` +3. `bootstrap_additional_options` +4. `after_cluster_joining_userdata` + +are fully supported for Amazon Linux 2 and Windows, and take advantage of the [bootstrap.sh](https://github.com/awslabs/amazon-eks-ami/blob/main/templates/al2/runtime/bootstrap.sh) +supplied on those AMIs. **NONE** of these inputs are supported on Bottlerocket. On AL2023, only the first 2 are supported. + +Note that for all OSes, you can supply the complete `userdata` contents, which will be untouched by this module, via `userdata_override_base64`. + + +> [!TIP] +> #### Use Terraform Reference Architectures for AWS +> +> Use Cloud Posse's ready-to-go [terraform architecture blueprints](https://cloudposse.com/reference-architecture/) for AWS to get up and running quickly. +> +> ✅ We build it together with your team.
+> ✅ Your team owns everything.
+> ✅ 100% Open Source and backed by fanatical support.
+> +> Request Quote +>
📚 Learn More +> +>
+> +> Cloud Posse is the leading [**DevOps Accelerator**](https://cpco.io/commercial-support?utm_source=github&utm_medium=readme&utm_campaign=cloudposse/terraform-aws-eks-node-group&utm_content=commercial_support) for funded startups and enterprises. +> +> *Your team can operate like a pro today.* +> +> Ensure that your team succeeds by using Cloud Posse's proven process and turnkey blueprints. Plus, we stick around until you succeed. +> #### Day-0: Your Foundation for Success +> - **Reference Architecture.** You'll get everything you need from the ground up built using 100% infrastructure as code. +> - **Deployment Strategy.** Adopt a proven deployment strategy with GitHub Actions, enabling automated, repeatable, and reliable software releases. +> - **Site Reliability Engineering.** Gain total visibility into your applications and services with Datadog, ensuring high availability and performance. +> - **Security Baseline.** Establish a secure environment from the start, with built-in governance, accountability, and comprehensive audit logs, safeguarding your operations. +> - **GitOps.** Empower your team to manage infrastructure changes confidently and efficiently through Pull Requests, leveraging the full power of GitHub Actions. +> +> Request Quote +> +> #### Day-2: Your Operational Mastery +> - **Training.** Equip your team with the knowledge and skills to confidently manage the infrastructure, ensuring long-term success and self-sufficiency. +> - **Support.** Benefit from a seamless communication over Slack with our experts, ensuring you have the support you need, whenever you need it. +> - **Troubleshooting.** Access expert assistance to quickly resolve any operational challenges, minimizing downtime and maintaining business continuity. +> - **Code Reviews.** Enhance your team’s code quality with our expert feedback, fostering continuous improvement and collaboration. 
+> - **Bug Fixes.** Rely on our team to troubleshoot and resolve any issues, ensuring your systems run smoothly. +> - **Migration Assistance.** Accelerate your migration process with our dedicated support, minimizing disruption and speeding up time-to-value. +> - **Customer Workshops.** Engage with our team in weekly workshops, gaining insights and strategies to continuously improve and innovate. +> +> Request Quote +>
## Usage @@ -58,6 +130,11 @@ Instantiate it multiple times to create many EKS node groups with specific setti ### Major Changes (breaking and otherwise) +With the v3.0.0 release of this module, support for Amazon Linux 2023 (AL2023) has +been added, and some breaking changes have been made. Please see the +[release notes](https://github.com/cloudposse/terraform-aws-eks-node-group/releases/tag/3.0.0) +for details. + With the v2.0.0 (a.k.a. v0.25.0) release of this module, it has undergone major breaking changes and added new features. Please see the [migration](docs/migration-v1-v2.md) document for details. @@ -68,13 +145,6 @@ For a complete example, see [examples/complete](examples/complete). For automated tests of the complete example using [bats](https://github.com/bats-core/bats-core) and [Terratest](https://github.com/gruntwork-io/terratest) (which tests and deploys the example on AWS), see [test](test). -### Terraform Version - -Terraform version 1.0 is out. Before that, there was Terraform version 0.15, 0.14, 0.13 and so on. -The v2.0.0 release of this module drops support for Terraform 0.13. That version is old and has lots of known issues. -There are hardly any breaking changes between Terraform 0.13 and 1.0, so please upgrade to -the latest Terraform version before raising any issues about this module. 
- ### Sources of Information - The code examples below are manually updated and have a tendency to fall out of sync with actual code, @@ -144,7 +214,7 @@ module "subnets" { module "eks_cluster" { source = "cloudposse/eks-cluster/aws" # Cloud Posse recommends pinning every module to a specific version - # version = "2.x.x" + # version = "4.x.x" vpc_id = module.vpc.vpc_id subnet_ids = module.subnets.public_subnet_ids @@ -158,7 +228,7 @@ module "eks_cluster" { module "eks_node_group" { source = "cloudposse/eks-node-group/aws" # Cloud Posse recommends pinning every module to a specific version - # version = "2.x.x" + # version = "3.x.x" instance_types = [var.instance_type] subnet_ids = module.subnets.public_subnet_ids @@ -294,7 +364,7 @@ https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html | [after\_cluster\_joining\_userdata](#input\_after\_cluster\_joining\_userdata) | Additional `bash` commands to execute on each worker node after joining the EKS cluster (after executing the `bootstrap.sh` script). For more info, see https://kubedex.com/90-days-of-aws-eks-in-production | `list(string)` | `[]` | no | | [ami\_image\_id](#input\_ami\_image\_id) | AMI to use, overriding other AMI specifications, but must match `ami_type`. Ignored if `launch_template_id` is supplied. | `list(string)` | `[]` | no | | [ami\_release\_version](#input\_ami\_release\_version) | OBSOLETE: Use `ami_specifier` instead. Note that it has a different format.
Historical description: EKS AMI version to use, e.g. For AL2 \"1.16.13-20200821\" or for bottlerocket \"1.2.0-ccf1b754\" (no \"v\") or for Windows \"2023.02.14\". For AL2, bottlerocket and Windows, it defaults to latest version for Kubernetes version." | `list(string)` | `[]` | no |
-| [ami\_specifier](#input\_ami\_specifier) | OS-dependent specifier for one of the several AMIs that match OS, architecture, and Kubernetes 1.xx version.<br/>If not specified the recommended/latest AMI for the given Kubernetes version will be used.<br/>Examples:<br/>AL2: amazon-eks-node-1.29-v20240117<br/>AL2023: amazon-eks-node-al2023-x86\_64-standard-1.29-v20240605<br/>Bottlerocket: 1.20.1-7c3e9198<br/>Windows: | `string` | `"recommended"` | no |
+| [ami\_specifier](#input\_ami\_specifier) | OS-dependent specifier for one of the several AMIs that match OS, architecture, and Kubernetes 1.xx version.<br/>If not specified the recommended/latest AMI for the given Kubernetes version will be used.<br/>Unfortunately, the format of this value varies by OS, and we have not found documentation for it.<br/>You can generally figure it out from the AMI name or description, and validate it by trying to retrieve<br/>the SSM Public Parameter for the AMI ID.<br/><br/>Examples:<br/>AL2: amazon-eks-node-1.29-v20240117<br/>AL2023: amazon-eks-node-al2023-x86\_64-standard-1.29-v20240605<br/>Bottlerocket: 1.20.1-7c3e9198 \_# Note: 1.20.1 is the Bottlerocket, not Kubernetes, version\_<br/>Windows: | `string` | `"recommended"` | no |
 | [ami\_type](#input\_ami\_type) | Type of Amazon Machine Image (AMI) associated with the EKS Node Group.<br/>Defaults to `AL2_x86_64`. Valid values: `AL2_x86_64, AL2_x86_64_GPU, AL2_ARM_64, CUSTOM, BOTTLEROCKET_ARM_64, BOTTLEROCKET_x86_64, BOTTLEROCKET_ARM_64_NVIDIA, BOTTLEROCKET_x86_64_NVIDIA, WINDOWS_CORE_2019_x86_64, WINDOWS_FULL_2019_x86_64, WINDOWS_CORE_2022_x86_64, WINDOWS_FULL_2022_x86_64, AL2023_x86_64_STANDARD, AL2023_ARM_64_STANDARD`. | `string` | `"AL2_x86_64"` | no |
 | [associate\_cluster\_security\_group](#input\_associate\_cluster\_security\_group) | When true, associate the default cluster security group to the nodes. If disabled the EKS managed security group will not<br/>be associated to the nodes and you will need to provide another security group that allows the nodes to communicate with<br/>the EKS control plane. Be aware that if no `associated_security_group_ids` or `ssh_access_security_group_ids` are provided,<br/>then the nodes will have no inbound or outbound rules. | `bool` | `true` | no |
 | [associated\_security\_group\_ids](#input\_associated\_security\_group\_ids) | A list of IDs of Security Groups to associate the node group with, in addition to the EKS' created security group.<br/>These security groups will not be modified. | `list(string)` | `[]` | no |
@@ -308,7 +378,7 @@ https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html
 | [cluster\_name](#input\_cluster\_name) | The name of the EKS cluster | `string` | n/a | yes |
 | [context](#input\_context) | Single object for setting entire context at once.<br/>
See description of individual variables for details.
Leave string and numeric variables as `null` to use default value.
Individual variable settings (non-null) override settings in context object,
except for attributes, tags, and additional\_tag\_map, which are merged. | `any` |
{
"additional_tag_map": {},
"attributes": [],
"delimiter": null,
"descriptor_formats": {},
"enabled": true,
"environment": null,
"id_length_limit": null,
"label_key_case": null,
"label_order": [],
"label_value_case": null,
"labels_as_tags": [
"unset"
],
"name": null,
"namespace": null,
"regex_replace_chars": null,
"stage": null,
"tags": {},
"tenant": null
}
| no | | [cpu\_options](#input\_cpu\_options) | Configuration for the [`cpu_options` Configuration Block](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/launch_template#cpu_options) of the launch template.
Leave list empty for defaults. Pass list with single object with attributes matching the `cpu_options` block to configure it.
Note that this configures the launch template only. Some elements will be ignored by the Auto Scaling Group
that actually launches instances. Consult AWS documentation for details. | `list(any)` | `[]` | no | -| [create\_before\_destroy](#input\_create\_before\_destroy) | Set true in order to create the new node group before destroying the old one.
If false, the old node group will be destroyed first, causing downtime.
Changing this setting will always cause node group to be replaced. | `bool` | `false` | no | +| [create\_before\_destroy](#input\_create\_before\_destroy) | If `true` (default), a new node group will be created before destroying the old one.
If `false`, the old node group will be destroyed first, causing downtime.
Changing this setting will always cause the node group to be replaced. | `bool` | `true` | no | | [delimiter](#input\_delimiter) | Delimiter to be used between ID elements.
Defaults to `-` (hyphen). Set to `""` to use no delimiter at all. | `string` | `null` | no | | [descriptor\_formats](#input\_descriptor\_formats) | Describe additional descriptors to be output in the `descriptors` output map.
Map of maps. Keys are names of descriptors. Values are maps of the form
`{
format = string
labels = list(string)
}`
(Type is `any` so the map values can later be enhanced to provide additional options.)
`format` is a Terraform format string to be passed to the `format()` function.
`labels` is a list of labels, in order, to pass to `format()` function.
Label values will be normalized before being passed to `format()` so they will be
identical to how they appear in `id`.
Default is `{}` (`descriptors` output will be empty). | `any` | `{}` | no | | [desired\_size](#input\_desired\_size) | Initial desired number of worker nodes (external changes ignored) | `number` | n/a | yes | @@ -320,6 +390,7 @@ https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html | [environment](#input\_environment) | ID element. Usually used for region e.g. 'uw2', 'us-west-2', OR role 'prod', 'staging', 'dev', 'UAT' | `string` | `null` | no | | [force\_update\_version](#input\_force\_update\_version) | When updating the Kubernetes version, force Pods to be removed even if PodDisruptionBudget or taint/toleration issues would otherwise prevent them from being removed (and cause the update to fail) | `bool` | `false` | no | | [id\_length\_limit](#input\_id\_length\_limit) | Limit `id` to this many characters (minimum 6).
Set to `0` for unlimited length.
Set to `null` to keep the existing setting, which defaults to `0`.
Does not affect `id_full`. | `number` | `null` | no | +| [immediately\_apply\_lt\_changes](#input\_immediately\_apply\_lt\_changes) | When `true`, any change to the launch template will be applied immediately.
When `false`, the changes will only affect new nodes when they are launched.
When `null` (default) this input takes the value of `create_before_destroy`.
**NOTE:** Setting this to `false` does not guarantee that other changes,
such as `ami_type`, will not cause changes to be applied immediately. | `bool` | `null` | no | | [instance\_types](#input\_instance\_types) | Instance types to use for this node group (up to 20). Defaults to ["t3.medium"].
Must be empty if the launch template configured by `launch_template_id` specifies an instance type. | `list(string)` |
[
"t3.medium"
]
| no | | [kubelet\_additional\_options](#input\_kubelet\_additional\_options) | Additional flags to pass to kubelet.
DO NOT include `--node-labels` or `--node-taints`,
use `kubernetes_labels` and `kubernetes_taints` to specify those. | `list(string)` | `[]` | no | | [kubernetes\_labels](#input\_kubernetes\_labels) | Key-value mapping of Kubernetes labels. Only labels that are applied with the EKS API are managed by this argument.
Other Kubernetes labels applied to the EKS Node Group will not be managed. | `map(string)` | `{}` | no | @@ -345,6 +416,7 @@ https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html | [node\_role\_permissions\_boundary](#input\_node\_role\_permissions\_boundary) | If provided, all IAM roles will be created with this permissions boundary attached. | `string` | `null` | no | | [node\_role\_policy\_arns](#input\_node\_role\_policy\_arns) | List of policy ARNs to attach to the worker role this module creates in addition to the default ones | `list(string)` | `[]` | no | | [placement](#input\_placement) | Configuration for the [`placement` Configuration Block](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/launch_template#placement) of the launch template.
Leave list empty for defaults. Pass list with single object with attributes matching the `placement` block to configure it.
Note that this configures the launch template only. Some elements will be ignored by the Auto Scaling Group
that actually launches instances. Consult AWS documentation for details. | `list(any)` | `[]` | no | +| [random\_pet\_length](#input\_random\_pet\_length) | In order to support "create before destroy" behavior, this module uses the `random_pet`
resource to generate a unique pet name for the node group, since the node group name
must be unique, meaning the new node group must have a different name than the old one.
This variable controls the length of the pet name, meaning the number of pet names
concatenated together. This module defaults to 1, but there are only 452 names available,
so users with large numbers of node groups may want to increase this value. | `number` | `1` | no | | [regex\_replace\_chars](#input\_regex\_replace\_chars) | Terraform regular expression (regex) string.
Characters matching the regex will be removed from the ID elements.
If not set, `"/[^a-zA-Z0-9-]/"` is used to remove all characters other than hyphens, letters and digits. | `string` | `null` | no | | [replace\_node\_group\_on\_version\_update](#input\_replace\_node\_group\_on\_version\_update) | Force Node Group replacement when updating to a new Kubernetes version. If set to `false` (the default), the Node Groups will be updated in-place | `bool` | `false` | no | | [resources\_to\_tag](#input\_resources\_to\_tag) | List of auto-launched resource types to tag. Valid types are "instance", "volume", "elastic-gpu", "spot-instances-request", "network-interface". | `list(string)` |
[
"instance",
"volume",
"network-interface"
]
| no | @@ -360,8 +432,8 @@ https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html | Name | Description | |------|-------------| +| [WARNING\_ami\_release\_version](#output\_WARNING\_ami\_release\_version) | Include the warning output message to quiet the linter about unused variables. | | [WARNING\_cluster\_autoscaler\_enabled](#output\_WARNING\_cluster\_autoscaler\_enabled) | n/a | -| [ami\_ids](#output\_ami\_ids) | n/a | | [eks\_node\_group\_ami\_id](#output\_eks\_node\_group\_ami\_id) | The ID of the AMI used for the worker nodes, if specified | | [eks\_node\_group\_arn](#output\_eks\_node\_group\_arn) | Amazon Resource Name (ARN) of the EKS Node Group | | [eks\_node\_group\_cbd\_pet\_name](#output\_eks\_node\_group\_cbd\_pet\_name) | The pet name of this node group, if this module generated one | @@ -394,45 +466,6 @@ Check out these related projects. - [terraform-aws-ec2-instance-group](https://github.com/cloudposse/terraform-aws-ec2-instance-group) - Terraform module for provisioning multiple general purpose EC2 hosts for stateful applications -> [!TIP] -> #### Use Terraform Reference Architectures for AWS -> -> Use Cloud Posse's ready-to-go [terraform architecture blueprints](https://cloudposse.com/reference-architecture/) for AWS to get up and running quickly. -> -> ✅ We build it together with your team.
-> ✅ Your team owns everything.
-> ✅ 100% Open Source and backed by fanatical support.
-> -> Request Quote ->
📚 Learn More -> ->
-> -> Cloud Posse is the leading [**DevOps Accelerator**](https://cpco.io/commercial-support?utm_source=github&utm_medium=readme&utm_campaign=cloudposse/terraform-aws-eks-node-group&utm_content=commercial_support) for funded startups and enterprises. -> -> *Your team can operate like a pro today.* -> -> Ensure that your team succeeds by using Cloud Posse's proven process and turnkey blueprints. Plus, we stick around until you succeed. -> #### Day-0: Your Foundation for Success -> - **Reference Architecture.** You'll get everything you need from the ground up built using 100% infrastructure as code. -> - **Deployment Strategy.** Adopt a proven deployment strategy with GitHub Actions, enabling automated, repeatable, and reliable software releases. -> - **Site Reliability Engineering.** Gain total visibility into your applications and services with Datadog, ensuring high availability and performance. -> - **Security Baseline.** Establish a secure environment from the start, with built-in governance, accountability, and comprehensive audit logs, safeguarding your operations. -> - **GitOps.** Empower your team to manage infrastructure changes confidently and efficiently through Pull Requests, leveraging the full power of GitHub Actions. -> -> Request Quote -> -> #### Day-2: Your Operational Mastery -> - **Training.** Equip your team with the knowledge and skills to confidently manage the infrastructure, ensuring long-term success and self-sufficiency. -> - **Support.** Benefit from a seamless communication over Slack with our experts, ensuring you have the support you need, whenever you need it. -> - **Troubleshooting.** Access expert assistance to quickly resolve any operational challenges, minimizing downtime and maintaining business continuity. -> - **Code Reviews.** Enhance your team’s code quality with our expert feedback, fostering continuous improvement and collaboration. 
-> - **Bug Fixes.** Rely on our team to troubleshoot and resolve any issues, ensuring your systems run smoothly. -> - **Migration Assistance.** Accelerate your migration process with our dedicated support, minimizing disruption and speeding up time-to-value. -> - **Customer Workshops.** Engage with our team in weekly workshops, gaining insights and strategies to continuously improve and innovate. -> -> Request Quote ->
## ✨ Contributing diff --git a/README.yaml b/README.yaml index d9310e9..5dc3799 100644 --- a/README.yaml +++ b/README.yaml @@ -61,18 +61,59 @@ related: url: "https://github.com/cloudposse/terraform-aws-ec2-instance-group" # Short description of this project description: |- - Terraform module to provision an EKS Node Group for [Elastic Container Service for Kubernetes](https://aws.amazon.com/eks/). + Terraform module to provision an EKS Managed Node Group for [Elastic Kubernetes Service](https://aws.amazon.com/eks/). - Instantiate it multiple times to create many EKS node groups with specific settings such as GPUs, EC2 instance types, or autoscale parameters. + Instantiate it multiple times to create EKS Managed Node Groups with specific settings such as GPUs, EC2 instance types, or autoscale parameters. + + **IMPORTANT:** When SSH access is enabled without specifying a source security group, this module provisions `EKS Node Group` nodes that are globally accessible by SSH (22) port. Normally, AWS recommends that no security group allows unrestricted ingress access to port 22 . + +introduction: |- + This module creates an [EKS Managed Node Group](https://docs.aws.amazon.com/eks/latest/userguide/managed-node-groups.html) + for an [EKS](https://aws.amazon.com/eks/) cluster. + It assumes you have already created an EKS cluster, but you can create the cluster and the node group in the + same Terraform configuration. See our + [full-featured root module (a.k.a. component) `eks/cluster`](https://github.com/cloudposse/terraform-aws-components/tree/main/modules/eks/cluster) + for an example of how to do that. + + ### Launch Templates + + This module always uses a [launch template](https://docs.aws.amazon.com/autoscaling/ec2/userguide/launch-templates.html) + to create the node group. You can create your own launch template and + pass in its ID, or else this module will create one for you. 
+ + The AWS default for EKS is that if the launch template is updated, the existing nodes will not be affected. Only + new instances added to the node group would get the changes specified in the new launch template. In contrast, + when the launch template changes, this module can immediately create a new node group from the new launch template + to replace the old one. + + See the inputs `create_before_destroy` and `immediately_apply_lt_changes` for details about how to control this behavior. + + ### Operating system differences + + Currently, EKS supports 4 Operating Systems: Amazon Linux 2, Amazon Linux 2023, Bottlerocket, and Windows Server. + This module supports all 4 OSes, but support for detailed configuration of the nodes varies by OS. The 4 inputs: + + 1. `before_cluster_joining_userdata` + 2. `kubelet_additional_options` + 3. `bootstrap_additional_options` + 4. `after_cluster_joining_userdata` + + are fully supported for Amazon Linux 2 and Windows, and take advantage of the [bootstrap.sh](https://github.com/awslabs/amazon-eks-ami/blob/main/templates/al2/runtime/bootstrap.sh) + supplied on those AMIs. **NONE** of these inputs are supported on Bottlerocket. On AL2023, only the first 2 are supported. + + Note that for all OSes, you can supply the complete `userdata` contents, which will be untouched by this module, via `userdata_override_base64`. - **IMPORTANT:** This module provisions an `EKS Node Group` nodes globally accessible by SSH (22) port. Normally, AWS recommends that no security group allows unrestricted ingress access to port 22 . -introduction: "" # How to use this project -usage: |2- +usage: |- ### Major Changes (breaking and otherwise) + With the v3.0.0 release of this module, support for Amazon Linux 2023 (AL2023) has + been added, and some breaking changes have been made. Please see the + [release notes](https://github.com/cloudposse/terraform-aws-eks-node-group/releases/tag/3.0.0) + for details. + With the v2.0.0 (a.k.a. 
v0.25.0) release of this module, it has undergone major breaking changes and added new features. Please see the [migration](docs/migration-v1-v2.md) document for details. @@ -83,13 +124,6 @@ usage: |2- For automated tests of the complete example using [bats](https://github.com/bats-core/bats-core) and [Terratest](https://github.com/gruntwork-io/terratest) (which tests and deploys the example on AWS), see [test](test). - ### Terraform Version - - Terraform version 1.0 is out. Before that, there was Terraform version 0.15, 0.14, 0.13 and so on. - The v2.0.0 release of this module drops support for Terraform 0.13. That version is old and has lots of known issues. - There are hardly any breaking changes between Terraform 0.13 and 1.0, so please upgrade to - the latest Terraform version before raising any issues about this module. - ### Sources of Information - The code examples below are manually updated and have a tendency to fall out of sync with actual code, @@ -159,7 +193,7 @@ usage: |2- module "eks_cluster" { source = "cloudposse/eks-cluster/aws" # Cloud Posse recommends pinning every module to a specific version - # version = "2.x.x" + # version = "4.x.x" vpc_id = module.vpc.vpc_id subnet_ids = module.subnets.public_subnet_ids @@ -173,7 +207,7 @@ usage: |2- module "eks_node_group" { source = "cloudposse/eks-node-group/aws" # Cloud Posse recommends pinning every module to a specific version - # version = "2.x.x" + # version = "3.x.x" instance_types = [var.instance_type] subnet_ids = module.subnets.public_subnet_ids diff --git a/ami.tf b/ami.tf index b350944..e483203 100644 --- a/ami.tf +++ b/ami.tf @@ -54,20 +54,14 @@ locals { # Kubernetes version priority (first one to be set wins) # 1. var.kubernetes_version # 2. data.eks_cluster.this.kubernetes_version - use_cluster_kubernetes_version = local.enabled ? 
local.need_ami_id && length(var.kubernetes_version) == 0 : false + use_cluster_kubernetes_version = local.enabled && length(var.kubernetes_version) == 0 need_cluster_kubernetes_version = local.use_cluster_kubernetes_version - ami_kubernetes_version = local.use_cluster_kubernetes_version ? data.aws_eks_cluster.this[0].version : var.kubernetes_version[0] + resolved_kubernetes_version = local.use_cluster_kubernetes_version ? data.aws_eks_cluster.this[0].version : var.kubernetes_version[0] } data "aws_ssm_parameter" "ami_id" { - count = 1 # local.enabled && local.need_ami_id ? 1 : 0 + count = local.enabled && local.need_ami_id ? 1 : 0 - name = format(local.ami_ssm_format[var.ami_type], local.ami_specifier, local.ami_kubernetes_version) -} - -output "ami_ids" { - value = { - for key, value in data.aws_ssm_parameter.ami_id : key => value.insecure_value - } + name = format(local.ami_ssm_format[var.ami_type], local.ami_specifier, local.resolved_kubernetes_version) } diff --git a/docs/terraform.md b/docs/terraform.md index 6ea9d1e..51ff082 100644 --- a/docs/terraform.md +++ b/docs/terraform.md @@ -51,7 +51,7 @@ | [after\_cluster\_joining\_userdata](#input\_after\_cluster\_joining\_userdata) | Additional `bash` commands to execute on each worker node after joining the EKS cluster (after executing the `bootstrap.sh` script). For more info, see https://kubedex.com/90-days-of-aws-eks-in-production | `list(string)` | `[]` | no | | [ami\_image\_id](#input\_ami\_image\_id) | AMI to use, overriding other AMI specifications, but must match `ami_type`. Ignored if `launch_template_id` is supplied. | `list(string)` | `[]` | no | | [ami\_release\_version](#input\_ami\_release\_version) | OBSOLETE: Use `ami_specifier` instead. Note that it has a different format.
Historical description: EKS AMI version to use, e.g. For AL2 \"1.16.13-20200821\" or for bottlerocket \"1.2.0-ccf1b754\" (no \"v\") or for Windows \"2023.02.14\". For AL2, bottlerocket and Windows, it defaults to latest version for Kubernetes version. | `list(string)` | `[]` | no | -| [ami\_specifier](#input\_ami\_specifier) | OS-dependent specifier for one of the several AMIs that match OS, architecture, and Kubernetes 1.xx version.
If not specified the recommended/latest AMI for the given Kubernetes version will be used.
Examples:
AL2: amazon-eks-node-1.29-v20240117
AL2023: amazon-eks-node-al2023-x86\_64-standard-1.29-v20240605
Bottlerocket: 1.20.1-7c3e9198
Windows: | `string` | `"recommended"` | no | +| [ami\_specifier](#input\_ami\_specifier) | OS-dependent specifier for one of the several AMIs that match OS, architecture, and Kubernetes 1.xx version.
If not specified the recommended/latest AMI for the given Kubernetes version will be used.
Unfortunately, the format of this value varies by OS, and we have not found documentation for it.
You can generally figure it out from the AMI name or description, and validate it by trying to retrieve
the SSM Public Parameter for the AMI ID.

Examples:
AL2: amazon-eks-node-1.29-v20240117
AL2023: amazon-eks-node-al2023-x86\_64-standard-1.29-v20240605
Bottlerocket: 1.20.1-7c3e9198 \_# Note: 1.20.1 is the Bottlerocket, not Kubernetes, version\_
Windows: | `string` | `"recommended"` | no | | [ami\_type](#input\_ami\_type) | Type of Amazon Machine Image (AMI) associated with the EKS Node Group.
Defaults to `AL2_x86_64`. Valid values: `AL2_x86_64, AL2_x86_64_GPU, AL2_ARM_64, CUSTOM, BOTTLEROCKET_ARM_64, BOTTLEROCKET_x86_64, BOTTLEROCKET_ARM_64_NVIDIA, BOTTLEROCKET_x86_64_NVIDIA, WINDOWS_CORE_2019_x86_64, WINDOWS_FULL_2019_x86_64, WINDOWS_CORE_2022_x86_64, WINDOWS_FULL_2022_x86_64, AL2023_x86_64_STANDARD, AL2023_ARM_64_STANDARD`. | `string` | `"AL2_x86_64"` | no | | [associate\_cluster\_security\_group](#input\_associate\_cluster\_security\_group) | When true, associate the default cluster security group to the nodes. If disabled the EKS managed security group will not
be associated to the nodes and you will need to provide another security group that allows the nodes to communicate with
the EKS control plane. Be aware that if no `associated_security_group_ids` or `ssh_access_security_group_ids` are provided,
then the nodes will have no inbound or outbound rules. | `bool` | `true` | no | | [associated\_security\_group\_ids](#input\_associated\_security\_group\_ids) | A list of IDs of Security Groups to associate the node group with, in addition to the EKS' created security group.
These security groups will not be modified. | `list(string)` | `[]` | no | @@ -65,7 +65,7 @@ | [cluster\_name](#input\_cluster\_name) | The name of the EKS cluster | `string` | n/a | yes | | [context](#input\_context) | Single object for setting entire context at once.
See description of individual variables for details.
Leave string and numeric variables as `null` to use default value.
Individual variable settings (non-null) override settings in context object,
except for attributes, tags, and additional\_tag\_map, which are merged. | `any` |
{
"additional_tag_map": {},
"attributes": [],
"delimiter": null,
"descriptor_formats": {},
"enabled": true,
"environment": null,
"id_length_limit": null,
"label_key_case": null,
"label_order": [],
"label_value_case": null,
"labels_as_tags": [
"unset"
],
"name": null,
"namespace": null,
"regex_replace_chars": null,
"stage": null,
"tags": {},
"tenant": null
}
| no | | [cpu\_options](#input\_cpu\_options) | Configuration for the [`cpu_options` Configuration Block](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/launch_template#cpu_options) of the launch template.
Leave list empty for defaults. Pass list with single object with attributes matching the `cpu_options` block to configure it.
Note that this configures the launch template only. Some elements will be ignored by the Auto Scaling Group
that actually launches instances. Consult AWS documentation for details. | `list(any)` | `[]` | no | -| [create\_before\_destroy](#input\_create\_before\_destroy) | Set true in order to create the new node group before destroying the old one.
If false, the old node group will be destroyed first, causing downtime.
Changing this setting will always cause node group to be replaced. | `bool` | `false` | no | +| [create\_before\_destroy](#input\_create\_before\_destroy) | If `true` (default), a new node group will be created before destroying the old one.
If `false`, the old node group will be destroyed first, causing downtime.
Changing this setting will always cause the node group to be replaced. | `bool` | `true` | no | | [delimiter](#input\_delimiter) | Delimiter to be used between ID elements.
Defaults to `-` (hyphen). Set to `""` to use no delimiter at all. | `string` | `null` | no | | [descriptor\_formats](#input\_descriptor\_formats) | Describe additional descriptors to be output in the `descriptors` output map.
Map of maps. Keys are names of descriptors. Values are maps of the form
`{
format = string
labels = list(string)
}`
(Type is `any` so the map values can later be enhanced to provide additional options.)
`format` is a Terraform format string to be passed to the `format()` function.
`labels` is a list of labels, in order, to pass to `format()` function.
Label values will be normalized before being passed to `format()` so they will be
identical to how they appear in `id`.
Default is `{}` (`descriptors` output will be empty). | `any` | `{}` | no | | [desired\_size](#input\_desired\_size) | Initial desired number of worker nodes (external changes ignored) | `number` | n/a | yes | @@ -77,6 +77,7 @@ | [environment](#input\_environment) | ID element. Usually used for region e.g. 'uw2', 'us-west-2', OR role 'prod', 'staging', 'dev', 'UAT' | `string` | `null` | no | | [force\_update\_version](#input\_force\_update\_version) | When updating the Kubernetes version, force Pods to be removed even if PodDisruptionBudget or taint/toleration issues would otherwise prevent them from being removed (and cause the update to fail) | `bool` | `false` | no | | [id\_length\_limit](#input\_id\_length\_limit) | Limit `id` to this many characters (minimum 6).
Set to `0` for unlimited length.
Set to `null` to keep the existing setting, which defaults to `0`.
Does not affect `id_full`. | `number` | `null` | no | +| [immediately\_apply\_lt\_changes](#input\_immediately\_apply\_lt\_changes) | When `true`, any change to the launch template will be applied immediately.
When `false`, the changes will only affect new nodes when they are launched.
When `null` (default) this input takes the value of `create_before_destroy`.
**NOTE:** Setting this to `false` does not guarantee that other changes,
such as `ami_type`, will not cause changes to be applied immediately. | `bool` | `null` | no | | [instance\_types](#input\_instance\_types) | Instance types to use for this node group (up to 20). Defaults to ["t3.medium"].
Must be empty if the launch template configured by `launch_template_id` specifies an instance type. | `list(string)` |
[
"t3.medium"
]
| no | | [kubelet\_additional\_options](#input\_kubelet\_additional\_options) | Additional flags to pass to kubelet.
DO NOT include `--node-labels` or `--node-taints`,
use `kubernetes_labels` and `kubernetes_taints` to specify those. | `list(string)` | `[]` | no | | [kubernetes\_labels](#input\_kubernetes\_labels) | Key-value mapping of Kubernetes labels. Only labels that are applied with the EKS API are managed by this argument.
Other Kubernetes labels applied to the EKS Node Group will not be managed. | `map(string)` | `{}` | no | @@ -102,6 +103,7 @@ | [node\_role\_permissions\_boundary](#input\_node\_role\_permissions\_boundary) | If provided, all IAM roles will be created with this permissions boundary attached. | `string` | `null` | no | | [node\_role\_policy\_arns](#input\_node\_role\_policy\_arns) | List of policy ARNs to attach to the worker role this module creates in addition to the default ones | `list(string)` | `[]` | no | | [placement](#input\_placement) | Configuration for the [`placement` Configuration Block](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/launch_template#placement) of the launch template.
Leave list empty for defaults. Pass list with single object with attributes matching the `placement` block to configure it.
Note that this configures the launch template only. Some elements will be ignored by the Auto Scaling Group
that actually launches instances. Consult AWS documentation for details. | `list(any)` | `[]` | no | +| [random\_pet\_length](#input\_random\_pet\_length) | In order to support "create before destroy" behavior, this module uses the `random_pet`
resource to generate a unique pet name for the node group, since the node group name
must be unique, meaning the new node group must have a different name than the old one.
This variable controls the length of the pet name, meaning the number of pet names
concatenated together. This module defaults to 1, but there are only 452 names available,
so users with large numbers of node groups may want to increase this value. | `number` | `1` | no | | [regex\_replace\_chars](#input\_regex\_replace\_chars) | Terraform regular expression (regex) string.
Characters matching the regex will be removed from the ID elements.
If not set, `"/[^a-zA-Z0-9-]/"` is used to remove all characters other than hyphens, letters and digits. | `string` | `null` | no | | [replace\_node\_group\_on\_version\_update](#input\_replace\_node\_group\_on\_version\_update) | Force Node Group replacement when updating to a new Kubernetes version. If set to `false` (the default), the Node Groups will be updated in-place | `bool` | `false` | no | | [resources\_to\_tag](#input\_resources\_to\_tag) | List of auto-launched resource types to tag. Valid types are "instance", "volume", "elastic-gpu", "spot-instances-request", "network-interface". | `list(string)` |
[
"instance",
"volume",
"network-interface"
]
| no | @@ -117,8 +119,8 @@ | Name | Description | |------|-------------| +| [WARNING\_ami\_release\_version](#output\_WARNING\_ami\_release\_version) | Include the warning output message to quiet the linter about unused variables. | | [WARNING\_cluster\_autoscaler\_enabled](#output\_WARNING\_cluster\_autoscaler\_enabled) | n/a | -| [ami\_ids](#output\_ami\_ids) | n/a | | [eks\_node\_group\_ami\_id](#output\_eks\_node\_group\_ami\_id) | The ID of the AMI used for the worker nodes, if specified | | [eks\_node\_group\_arn](#output\_eks\_node\_group\_arn) | Amazon Resource Name (ARN) of the EKS Node Group | | [eks\_node\_group\_cbd\_pet\_name](#output\_eks\_node\_group\_cbd\_pet\_name) | The pet name of this node group, if this module generated one | diff --git a/launch-template.tf b/launch-template.tf index 8b459b4..632800d 100644 --- a/launch-template.tf +++ b/launch-template.tf @@ -31,7 +31,7 @@ locals { local.fetch_launch_template ? data.aws_launch_template.this[0].latest_version : aws_launch_template.default[0].latest_version )) : null - launch_template_ami = length(var.ami_image_id) == 0 ? (local.features_require_ami ? data.aws_ssm_parameter.ami_id[0].insecure_value : "") : var.ami_image_id[0] + launch_template_ami = length(var.ami_image_id) == 0 ? (local.generate_launch_template ? 
data.aws_ssm_parameter.ami_id[0].insecure_value : "") : var.ami_image_id[0] associate_cluster_security_group = local.enabled && var.associate_cluster_security_group launch_template_vpc_security_group_ids = sort(compact(concat( @@ -39,10 +39,35 @@ locals { module.ssh_access[*].id, var.associated_security_group_ids ))) + + # Create a launch template configuration object to use for managing node group updates + launch_template_config = { + ebs_optimized = var.ebs_optimized + block_device_mappings = local.block_device_map + image_id = local.launch_template_ami + key_name = local.ec2_ssh_key_name + tag_specifications = var.resources_to_tag + metadata_options = { + # Despite being documented as "Optional", `http_endpoint` is required when `http_put_response_hop_limit` is set. + # We set it to the default setting of "enabled". + http_endpoint = var.metadata_http_endpoint_enabled ? "enabled" : "disabled" + http_put_response_hop_limit = var.metadata_http_put_response_hop_limit + http_tokens = var.metadata_http_tokens_required ? "required" : "optional" + } + vpc_security_group_ids = local.launch_template_vpc_security_group_ids + user_data = local.userdata + tags = local.node_group_tags + cpu_options = var.cpu_options + placement = var.placement + enclave_options = var.enclave_enabled ? ["true"] : [] + monitoring = { + enabled = var.detailed_monitoring_enabled + } + } } resource "aws_launch_template" "default" { - # We'll use this default if we aren't provided with a launch template during invocation. + # We'll use this if we aren't provided with a launch template during invocation. # We would like to generate a new launch template every time the security group list changes # so that we can detach the network interfaces from the security groups that we no # longer need, so that the security groups can then be deleted, but we cannot guarantee @@ -51,10 +76,10 @@ resource "aws_launch_template" "default" { count = local.generate_launch_template ? 
1 : 0 - ebs_optimized = var.ebs_optimized + ebs_optimized = local.launch_template_config.ebs_optimized dynamic "block_device_mappings" { - for_each = local.block_device_map + for_each = local.launch_template_config.block_device_mappings content { device_name = block_device_mappings.key @@ -83,11 +108,11 @@ resource "aws_launch_template" "default" { # Never include instance type in launch template because it is limited to just one # https://docs.aws.amazon.com/eks/latest/APIReference/API_CreateNodegroup.html#API_CreateNodegroup_RequestSyntax - image_id = local.launch_template_ami == "" ? null : local.launch_template_ami - key_name = local.ec2_ssh_key_name + image_id = local.launch_template_config.image_id + key_name = local.launch_template_config.key_name dynamic "tag_specifications" { - for_each = var.resources_to_tag + for_each = local.launch_template_config.tag_specifications content { resource_type = tag_specifications.value tags = local.node_tags @@ -103,17 +128,17 @@ resource "aws_launch_template" "default" { # Despite being documented as "Optional", `http_endpoint` is required when `http_put_response_hop_limit` is set. # We set it to the default setting of "enabled". - http_endpoint = var.metadata_http_endpoint_enabled ? "enabled" : "disabled" - http_put_response_hop_limit = var.metadata_http_put_response_hop_limit - http_tokens = var.metadata_http_tokens_required ? 
"required" : "optional" + http_endpoint = local.launch_template_config.metadata_options.http_endpoint + http_put_response_hop_limit = local.launch_template_config.metadata_options.http_put_response_hop_limit + http_tokens = local.launch_template_config.metadata_options.http_tokens } - vpc_security_group_ids = local.launch_template_vpc_security_group_ids - user_data = local.userdata - tags = local.node_group_tags + vpc_security_group_ids = local.launch_template_config.vpc_security_group_ids + user_data = local.launch_template_config.user_data + tags = local.launch_template_config.tags dynamic "cpu_options" { - for_each = var.cpu_options + for_each = local.launch_template_config.cpu_options content { core_count = lookup(cpu_options.value, "core_count", null) @@ -122,7 +147,7 @@ resource "aws_launch_template" "default" { } dynamic "placement" { - for_each = var.placement + for_each = local.launch_template_config.placement content { affinity = lookup(placement.value, "affinity", null) @@ -137,7 +162,7 @@ resource "aws_launch_template" "default" { } dynamic "enclave_options" { - for_each = var.enclave_enabled ? ["true"] : [] + for_each = local.launch_template_config.enclave_options content { enabled = true @@ -145,17 +170,32 @@ resource "aws_launch_template" "default" { } monitoring { - enabled = var.detailed_monitoring_enabled + enabled = local.launch_template_config.monitoring.enabled } lifecycle { + # See userdata.tf for authoritative details. This is here because it has to be on a resource, and no resources are defined in userdata.tf + # + # Supported OSes: AL2, AL2023, BOTTLEROCKET, WINDOWS + # Userdata inputs: before_cluster_joining_userdata, kubelet_additional_options, bootstrap_additional_options, after_cluster_joining_userdata + # We test local.userdata_vars because they have been massaged and perhaps augmented, and we want to + # test the final form, even if it means giving a confusing error message at times. 
+ # We list supported OSes explicitly to catch any new ones that are added. + precondition { + condition = contains(["AL2", "AL2023", "WINDOWS"], local.ami_os) || length(local.userdata_vars.before_cluster_joining_userdata) == 0 || (local.ami_os == "AL2" || local.ami_os == "WINDOWS") + error_message = format("The input `before_cluster_joining_userdata` is not supported for %v.", title(lower(local.ami_os))) + } + precondition { + condition = contains(["AL2", "WINDOWS"], local.ami_os) || length(local.userdata_vars.bootstrap_extra_args) == 0 + error_message = format("The input `bootstrap_additional_options` is not supported for %v.", title(lower(local.ami_os))) + } precondition { - condition = length(local.userdata_vars.bootstrap_extra_args) == 0 || local.ami_os != "AL2023" - error_message = "The input `bootstrap_additional_options` is not supported for AL2023." + condition = contains(["AL2", "AL2023", "WINDOWS"], local.ami_os) || length(local.userdata_vars.kubelet_extra_args) == 0 || (local.ami_os == "AL2" || local.ami_os == "WINDOWS") + error_message = format("The input `kubelet_additional_options` is not supported for %v.", title(lower(local.ami_os))) } precondition { - condition = length(local.userdata_vars.after_cluster_joining_userdata) == 0 || local.ami_os != "AL2023" - error_message = "The input `after_cluster_joining_userdata` is not supported for AL2023." 
+ condition = contains(["AL2", "WINDOWS"], local.ami_os) || length(local.userdata_vars.after_cluster_joining_userdata) == 0 || (local.ami_os == "AL2" || local.ami_os == "WINDOWS") + error_message = format("The input `after_cluster_joining_userdata` is not supported for %v.", title(lower(local.ami_os))) } } } diff --git a/main.tf b/main.tf index c6cad07..ee4357e 100644 --- a/main.tf +++ b/main.tf @@ -1,10 +1,9 @@ locals { enabled = module.this.enabled - # See https://aws.amazon.com/blogs/containers/introducing-launch-template-and-custom-ami-support-in-amazon-eks-managed-node-groups/ - features_require_ami = local.enabled && local.need_bootstrap - need_ami_id = local.enabled ? local.features_require_ami && length(var.ami_image_id) == 0 : false - # features_require_launch_template = local.enabled ? length(var.resources_to_tag) > 0 || local.need_userdata || local.features_require_ami || local.need_imds_settings : false + immediately_apply_lt_changes = coalesce(var.immediately_apply_lt_changes, var.create_before_destroy) + + need_ami_id = local.enabled && length(var.ami_image_id) == 0 have_ssh_key = local.enabled && length(var.ec2_ssh_key_name) == 1 ec2_ssh_key_name = local.have_ssh_key ? var.ec2_ssh_key_name[0] : null @@ -57,9 +56,6 @@ locals { ) # It does not help to add the autoscaler tags to the node group tags, # because they only matter when applied to the autoscaling group. - # TODO: - # Replace: node_group_tags = merge(local.node_tags, local.autoscaler_enabled ? local.autoscaler_tags : null) - # with: node_group_tags = local.node_tags node_group_tags = local.node_tags } @@ -92,9 +88,8 @@ locals { capacity_type = var.capacity_type labels = var.kubernetes_labels == null ? {} : var.kubernetes_labels - taints = var.kubernetes_taints - release_version = local.launch_template_ami == "" ? try(var.ami_release_version[0], null) : null - version = length(compact(concat([local.launch_template_ami], var.ami_release_version))) == 0 ? 
try(var.kubernetes_version[0], null) : null + taints = var.kubernetes_taints + version = local.resolved_kubernetes_version tags = local.node_group_tags @@ -112,16 +107,19 @@ resource "random_pet" "cbd" { count = local.enabled && var.create_before_destroy ? 1 : 0 separator = module.label.delimiter - length = 1 + length = var.random_pet_length keepers = merge( { - node_role_arn = local.ng.node_role_arn - subnet_ids = join(",", local.ng.subnet_ids) - instance_types = join(",", local.ng.instance_types) - ami_type = local.ng.ami_type - capacity_type = local.ng.capacity_type - launch_template_id = local.launch_template_id + node_role_arn = local.ng.node_role_arn + subnet_ids = join(",", local.ng.subnet_ids) + instance_types = join(",", local.ng.instance_types) + ami_type = local.ng.ami_type + capacity_type = local.ng.capacity_type + launch_template_id = local.launch_template_configured || !local.immediately_apply_lt_changes ? local.launch_template_id : ( + # If we want changes to the generated launch template to be applied immediately, keep the settings + jsonencode(local.launch_template_config) + ) }, # If `var.replace_node_group_on_version_update` is set to `true`, the Node Groups will be replaced instead of updated in-place var.replace_node_group_on_version_update && local.ng.version != null ? 
@@ -154,7 +152,6 @@ resource "aws_eks_node_group" "default" { instance_types = local.ng.instance_types ami_type = local.ng.ami_type labels = local.ng.labels - release_version = local.ng.release_version version = local.ng.version force_update_version = local.ng.force_update_version @@ -234,7 +231,6 @@ resource "aws_eks_node_group" "cbd" { instance_types = local.ng.instance_types ami_type = local.ng.ami_type labels = local.ng.labels - release_version = local.ng.release_version version = local.ng.version force_update_version = local.ng.force_update_version diff --git a/outputs.tf b/outputs.tf index d16d764..fd59220 100644 --- a/outputs.tf +++ b/outputs.tf @@ -20,7 +20,7 @@ output "eks_node_group_arn" { output "eks_node_group_resources" { description = "List of objects containing information about underlying resources of the EKS Node Group" - value = local.enabled ? (var.create_before_destroy ? aws_eks_node_group.cbd[*].resources : aws_eks_node_group.default[*].resources) : [] + value = local.enabled ? try(var.create_before_destroy ? aws_eks_node_group.cbd[0].resources : aws_eks_node_group.default[0].resources, []) : [] } output "eks_node_group_status" { diff --git a/security-group.tf b/security-group.tf index e0c3ed9..4100799 100644 --- a/security-group.tf +++ b/security-group.tf @@ -18,8 +18,7 @@ module "ssh_access" { rule_matrix = [{ key = "ssh" source_security_group_ids = var.ssh_access_security_group_ids - #bridgecrew:skip=BC_AWS_NETWORKING_1:Skipping `Port Security 0.0.0.0:0 to 22` check because we want to allow SSH access to all nodes in the nodeGroup - cidr_blocks = length(var.ssh_access_security_group_ids) == 0 ? ["0.0.0.0/0"] : [] + cidr_blocks = length(var.ssh_access_security_group_ids) == 0 ? 
["0.0.0.0/0"] : [] rules = [{ key = "ssh" type = "ingress" diff --git a/variables-deprecated.tf b/variables-deprecated.tf index 58617ef..4ab48b7 100644 --- a/variables-deprecated.tf +++ b/variables-deprecated.tf @@ -12,6 +12,11 @@ variable "ami_release_version" { } } +# Include the warning output message to quiet the linter about unused variables. +output "WARNING_ami_release_version" { + value = length(var.ami_release_version) == 0 ? null : "WARNING: variable `ami_release_version` is obsolete and has been ignored." +} + variable "cluster_autoscaler_enabled" { type = bool description = <<-EOT diff --git a/variables.tf b/variables.tf index 296d984..fe6b285 100644 --- a/variables.tf +++ b/variables.tf @@ -6,14 +6,40 @@ variable "cluster_name" { variable "create_before_destroy" { type = bool description = <<-EOT - Set true in order to create the new node group before destroying the old one. - If false, the old node group will be destroyed first, causing downtime. + If `true` (default), a new node group will be created before destroying the old one. + If `false`, the old node group will be destroyed first, causing downtime. Changing this setting will always cause node group to be replaced. EOT - default = false + default = true + nullable = false +} + +variable "random_pet_length" { + type = number + description = <<-EOT + In order to support "create before destroy" behavior, this module uses the `random_pet` + resource to generate a unique pet name for the node group, since the node group name + must be unique, meaning the new node group must have a different name than the old one. + This variable controls the length of the pet name, meaning the number of pet names + concatenated together. This module defaults to 1, but there are only 452 names available, + so users with large numbers of node groups may want to increase this value. 
+ EOT + default = 1 nullable = false } +variable "immediately_apply_lt_changes" { + type = bool + description = <<-EOT + When `true`, any change to the launch template will be applied immediately. + When `false`, the changes will only affect new nodes when they are launched. + When `null` (default) this input takes the value of `create_before_destroy`. + **NOTE:** Setting this to `false` does not guarantee that other changes, + such as `ami_type`, will not cause changes to be applied immediately. + EOT + default = null +} + variable "ec2_ssh_key_name" { type = list(string) description = "SSH key pair name to use to access the worker nodes" @@ -154,10 +180,14 @@ variable "ami_specifier" { description = <<-EOT OS-dependent specifier for one of the several AMIs that match OS, architecture, and Kubernetes 1.xx version. If not specified the recommended/latest AMI for the given Kubernetes version will be used. + Unfortunately, the format of this value varies by OS, and we have not found documentation for it. + You can generally figure it out from the AMI name or description, and validate it by trying to retrieve + the SSM Public Parameter for the AMI ID. + Examples: AL2: amazon-eks-node-1.29-v20240117 AL2023: amazon-eks-node-al2023-x86_64-standard-1.29-v20240605 - Bottlerocket: 1.20.1-7c3e9198 + Bottlerocket: 1.20.1-7c3e9198 _# Note: 1.20.1 is the Bottlerocket, not Kubernetes, version_ Windows: EOT default = "recommended" From 3b961867b3676a350a4780b71f5e7445e0385f68 Mon Sep 17 00:00:00 2001 From: Nuru Date: Mon, 10 Jun 2024 16:46:23 -0700 Subject: [PATCH 3/5] Keep configured version for keeper and AMI, exclude from node group --- main.tf | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/main.tf b/main.tf index ee4357e..a72488a 100644 --- a/main.tf +++ b/main.tf @@ -88,8 +88,7 @@ locals { capacity_type = var.capacity_type labels = var.kubernetes_labels == null ? 
{} : var.kubernetes_labels - taints = var.kubernetes_taints - version = local.resolved_kubernetes_version + taints = var.kubernetes_taints tags = local.node_group_tags @@ -122,9 +121,9 @@ resource "random_pet" "cbd" { ) }, # If `var.replace_node_group_on_version_update` is set to `true`, the Node Groups will be replaced instead of updated in-place - var.replace_node_group_on_version_update && local.ng.version != null ? + var.replace_node_group_on_version_update ? { - version = local.ng.version + version = var.kubernetes_version } : {} ) } @@ -152,7 +151,7 @@ resource "aws_eks_node_group" "default" { instance_types = local.ng.instance_types ami_type = local.ng.ami_type labels = local.ng.labels - version = local.ng.version + version = null # derived from AMI force_update_version = local.ng.force_update_version capacity_type = local.ng.capacity_type @@ -231,7 +230,7 @@ resource "aws_eks_node_group" "cbd" { instance_types = local.ng.instance_types ami_type = local.ng.ami_type labels = local.ng.labels - version = local.ng.version + version = null # derived from AMI force_update_version = local.ng.force_update_version capacity_type = local.ng.capacity_type From 53e3f485815bf4d9a99850dd7cecb7a032dc67fb Mon Sep 17 00:00:00 2001 From: Nuru Date: Tue, 11 Jun 2024 14:59:49 -0700 Subject: [PATCH 4/5] Add security group accidentally removed --- ami.tf | 11 ++++++++--- examples/complete/main.tf | 5 +++-- main.tf | 2 +- 3 files changed, 12 insertions(+), 6 deletions(-) diff --git a/ami.tf b/ami.tf index e483203..b3391e1 100644 --- a/ami.tf +++ b/ami.tf @@ -39,16 +39,21 @@ locals { WINDOWS_FULL_2022_x86_64 = "/aws/service/ami-windows-latest/Windows_Server-2022-English-Full-EKS_Optimized-%[2]v/image_id" } - # AMI specifiers + # AMI specifiers? 
+ # Specifiers for AL2 and AL2023 are AMI Name from https://github.com/awslabs/amazon-eks-ami/releases # AL2 # AMI name: amazon-eks-node-1.29-v20240117 # AMI SSM param: /aws/service/eks/optimized-ami/1.29/amazon-linux-2/amazon-eks-node-1.29-v20240117/image_id # AL2023 # AMI name: amazon-eks-node-al2023-arm64-standard-1.29-v20240605 # AMI SSM param: /aws/service/eks/optimized-ami/1.29/amazon-linux-2023/x86_64/standard/amazon-eks-node-al2023-x86_64-standard-1.29-v20240605/image_id + # Specifiers for Bottlerocket are the bare release version (e.g. `1.20.0`) or + # the release version and the commit hash (e.g. `1.20.0-7c3e9198`) # Bottlerocket: - # AMI name: bottlerocket-aws-k8s-1.24-nvidia-x86_64-v1.20.1-7c3e9198 - # AMI SSM param: bottlerocket/aws-k8s-1.24-nvidia/x86_64/1.20.1-7c3e9198/image_id # No "v" + # AMI name: bottlerocket-aws-k8s-1.26-nvidia-x86_64-v1.17.0-53f322c2 + # AMI SSM param: /aws/service/bottlerocket/aws-k8s-1.26-nvidia/x86_64/1.17.0/image_id # No "v" + # /aws/service/bottlerocket/aws-k8s-1.26-nvidia/x86_64/1.17.0--53f322c2/image_id + # Windows does not allow a specifier. ami_specifier = var.ami_specifier == "recommended" && startswith(var.ami_type, "BOTTLEROCKET") ? 
"latest" : var.ami_specifier # Kubernetes version priority (first one to be set wins) diff --git a/examples/complete/main.tf b/examples/complete/main.tf index a92e495..506a5e6 100644 --- a/examples/complete/main.tf +++ b/examples/complete/main.tf @@ -167,14 +167,15 @@ module "eks_node_group" { node_role_policy_arns = [local.extra_policy_arn] update_config = var.update_config - after_cluster_joining_userdata = var.after_cluster_joining_userdata ami_type = var.ami_type ami_specifier = var.ami_specifier + /* before_cluster_joining_userdata = var.before_cluster_joining_userdata - kubelet_additional_options = var.kubelet_additional_options + after_cluster_joining_userdata = var.after_cluster_joining_userdata + */ create_before_destroy = true diff --git a/main.tf b/main.tf index a72488a..0575a1f 100644 --- a/main.tf +++ b/main.tf @@ -12,9 +12,9 @@ locals { get_cluster_data = local.enabled ? ( local.need_cluster_kubernetes_version || + local.associate_cluster_security_group || local.need_bootstrap || local.need_ssh_access_sg || - length(var.associated_security_group_ids) > 0 || (length(local.kubelet_extra_args) > 0 && local.ami_os == "AL2023") ) : false From 0ae466a31f7ae32497cc2316fc13a179b2d3af6e Mon Sep 17 00:00:00 2001 From: Nuru Date: Sun, 16 Jun 2024 15:55:57 -0700 Subject: [PATCH 5/5] Fix release version, userdata, use of AMI ID in Launch Template --- README.md | 5 +- ami.tf | 84 ++++++++--- docs/terraform.md | 5 +- examples/complete/fixtures.us-east-2.tfvars | 14 +- examples/complete/main.tf | 40 ++++- examples/complete/variables.tf | 20 +-- launch-template.tf | 2 +- main.tf | 55 ++++--- test/src/default_test.go | 32 ++++ test/src/examples_complete_test.go | 159 ++++++++------------ test/src/framework_test.go | 130 ++++++++++++++++ test/src/go.mod | 2 +- userdata.tf | 48 ++++-- userdata_al2023.tpl | 13 +- variables-deprecated.tf | 19 --- variables.tf | 43 ++++-- 16 files changed, 453 insertions(+), 218 deletions(-) create mode 100644 test/src/default_test.go 
create mode 100644 test/src/framework_test.go diff --git a/README.md b/README.md index 60108bf..cd5a3b1 100644 --- a/README.md +++ b/README.md @@ -349,6 +349,7 @@ https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html | [aws_iam_role_policy_attachment.ipv6_eks_cni_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy_attachment) | resource | | [aws_launch_template.default](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/launch_template) | resource | | [random_pet.cbd](https://registry.terraform.io/providers/hashicorp/random/latest/docs/resources/pet) | resource | +| [aws_ami.windows_ami](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/ami) | data source | | [aws_eks_cluster.this](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/eks_cluster) | data source | | [aws_iam_policy_document.assume_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source | | [aws_iam_policy_document.ipv6_eks_cni_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source | @@ -363,8 +364,7 @@ https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html | [additional\_tag\_map](#input\_additional\_tag\_map) | Additional key-value pairs to add to each map in `tags_as_list_of_maps`. Not added to `tags` or `id`.
This is for some rare cases where resources want additional configuration of tags
and therefore take a list of maps with tag key, value, and additional configuration. | `map(string)` | `{}` | no | | [after\_cluster\_joining\_userdata](#input\_after\_cluster\_joining\_userdata) | Additional `bash` commands to execute on each worker node after joining the EKS cluster (after executing the `bootstrap.sh` script). For more info, see https://kubedex.com/90-days-of-aws-eks-in-production | `list(string)` | `[]` | no | | [ami\_image\_id](#input\_ami\_image\_id) | AMI to use, overriding other AMI specifications, but must match `ami_type`. Ignored if `launch_template_id` is supplied. | `list(string)` | `[]` | no | -| [ami\_release\_version](#input\_ami\_release\_version) | OBSOLETE: Use `ami_specifier` instead. Note that it has a different format.
Historical description: EKS AMI version to use, e.g. For AL2 \"1.16.13-20200821\" or for bottlerocket \"1.2.0-ccf1b754\" (no \"v\") or for Windows \"2023.02.14\". For AL2, bottlerocket and Windows, it defaults to latest version for Kubernetes version." | `list(string)` | `[]` | no | -| [ami\_specifier](#input\_ami\_specifier) | OS-dependent specifier for one of the several AMIs that match OS, architecture, and Kubernetes 1.xx version.
If not specified the recommended/latest AMI for the given Kubernetes version will be used.
Unfortunately, the format of this value varies by OS, and we have not found documentation for it.
You can generally figure it out from the AMI name or description, and validate it by trying to retrieve
the SSM Public Parameter for the AMI ID.

Examples:
AL2: amazon-eks-node-1.29-v20240117
AL2023: amazon-eks-node-al2023-x86\_64-standard-1.29-v20240605
Bottlerocket: 1.20.1-7c3e9198 \_# Note: 1.20.1 is the Bottlerocket, not Kubernetes, version\_
Windows: | `string` | `"recommended"` | no | +| [ami\_release\_version](#input\_ami\_release\_version) | The EKS AMI "release version" to use. Defaults to the latest recommended version.
For Amazon Linux, it is the "Release version" from [Amazon AMI Releases](https://github.com/awslabs/amazon-eks-ami/releases).
For Bottlerocket, it is the release tag from [Bottlerocket Releases](https://github.com/bottlerocket-os/bottlerocket/releases) without the "v" prefix.
For Windows, it is the "AMI version" from [AWS docs](https://docs.aws.amazon.com/eks/latest/userguide/eks-ami-versions-windows.html).
Note that unlike AMI names, release versions never include the "v" prefix.
Examples:
AL2: 1.29.3-20240531
Bottlerocket: 1.2.0 or 1.2.0-ccf1b754
Windows: 1.29-2024.04.09 | `list(string)` | `[]` | no | | [ami\_type](#input\_ami\_type) | Type of Amazon Machine Image (AMI) associated with the EKS Node Group.
Defaults to `AL2_x86_64`. Valid values: `AL2_x86_64, AL2_x86_64_GPU, AL2_ARM_64, CUSTOM, BOTTLEROCKET_ARM_64, BOTTLEROCKET_x86_64, BOTTLEROCKET_ARM_64_NVIDIA, BOTTLEROCKET_x86_64_NVIDIA, WINDOWS_CORE_2019_x86_64, WINDOWS_FULL_2019_x86_64, WINDOWS_CORE_2022_x86_64, WINDOWS_FULL_2022_x86_64, AL2023_x86_64_STANDARD, AL2023_ARM_64_STANDARD`. | `string` | `"AL2_x86_64"` | no | | [associate\_cluster\_security\_group](#input\_associate\_cluster\_security\_group) | When true, associate the default cluster security group to the nodes. If disabled the EKS managed security group will not
be associated to the nodes and you will need to provide another security group that allows the nodes to communicate with
the EKS control plane. Be aware that if no `associated_security_group_ids` or `ssh_access_security_group_ids` are provided,
then the nodes will have no inbound or outbound rules. | `bool` | `true` | no | | [associated\_security\_group\_ids](#input\_associated\_security\_group\_ids) | A list of IDs of Security Groups to associate the node group with, in addition to the EKS' created security group.
These security groups will not be modified. | `list(string)` | `[]` | no | @@ -432,7 +432,6 @@ https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html | Name | Description | |------|-------------| -| [WARNING\_ami\_release\_version](#output\_WARNING\_ami\_release\_version) | Include the warning output message to quiet the linter about unused variables. | | [WARNING\_cluster\_autoscaler\_enabled](#output\_WARNING\_cluster\_autoscaler\_enabled) | n/a | | [eks\_node\_group\_ami\_id](#output\_eks\_node\_group\_ami\_id) | The ID of the AMI used for the worker nodes, if specified | | [eks\_node\_group\_arn](#output\_eks\_node\_group\_arn) | Amazon Resource Name (ARN) of the EKS Node Group | diff --git a/ami.tf b/ami.tf index b3391e1..1daad84 100644 --- a/ami.tf +++ b/ami.tf @@ -18,8 +18,12 @@ locals { + given_ami_id = length(var.ami_image_id) > 0 + # Public SSM parameters all start with /aws/service/ + ami_os = split("_", var.ami_type)[0] + # format string that makes # format(fmt, specifier, k8s_version) the SSM parameter name to retrieve @@ -39,34 +43,74 @@ locals { WINDOWS_FULL_2022_x86_64 = "/aws/service/ami-windows-latest/Windows_Server-2022-English-Full-EKS_Optimized-%[2]v/image_id" } - # AMI specifiers? - # Specifiers for AL2 and AL2023 are AMI Name from https://github.com/awslabs/amazon-eks-ami/releases + release_version_parts = concat(split("-", try(var.ami_release_version[0], "")), ["", ""]) + amazon_linux_ami_name_release_part = try(join(".", slice(split(".", local.release_version_parts[0]), 0, 2)), "") + # AMI Public SSM Parameter specifiers? 
+ # Release versions for AL2 and AL2023 are from https://github.com/awslabs/amazon-eks-ami/releases + # Amazon Linux Release Version: 1.29.0-20240213 # AL2 # AMI name: amazon-eks-node-1.29-v20240117 # AMI SSM param: /aws/service/eks/optimized-ami/1.29/amazon-linux-2/amazon-eks-node-1.29-v20240117/image_id # AL2023 - # AMI name: amazon-eks-node-al2023-arm64-standard-1.29-v20240605 - # AMI SSM param: /aws/service/eks/optimized-ami/1.29/amazon-linux-2023/x86_64/standard/amazon-eks-node-al2023-x86_64-standard-1.29-v20240605/image_id - # Specifiers for Bottlerocket are the bare release version (e.g. `1.20.0`) or - # the release version and the commit hash (e.g. `1.20.0-7c3e9198`) + # AMI name: amazon-eks-node-al2023-x86_64-standard-1.29-v20240213 + # AMI SSM param: /aws/service/eks/optimized-ami/1.29/amazon-linux-2023/x86_64/standard/amazon-eks-node-al2023-x86_64-standard-1.29-v20240213/image_id + # Specifiers for Bottlerocket are the bare release version (e.g. `1.18.0`) or + # the release version and the first 8 characters of the commit hash (e.g. `1.18.0-7452c37e`). NOTE: GitHub commit hash abbreviations are only 7 characters. + # From: # Bottlerocket: - # AMI name: bottlerocket-aws-k8s-1.26-nvidia-x86_64-v1.17.0-53f322c2 - # AMI SSM param: /aws/service/bottlerocket/aws-k8s-1.26-nvidia/x86_64/1.17.0/image_id # No "v" - # /aws/service/bottlerocket/aws-k8s-1.26-nvidia/x86_64/1.17.0--53f322c2/image_id - # Windows does not allow a specifier. - ami_specifier = var.ami_specifier == "recommended" && startswith(var.ami_type, "BOTTLEROCKET") ? "latest" : var.ami_specifier - - # Kubernetes version priority (first one to be set wins) - # 1. var.kubernetes_version - # 2. data.eks_cluster.this.kubernetes_version - use_cluster_kubernetes_version = local.enabled && length(var.kubernetes_version) == 0 - need_cluster_kubernetes_version = local.use_cluster_kubernetes_version - - resolved_kubernetes_version = local.use_cluster_kubernetes_version ? 
data.aws_eks_cluster.this[0].version : var.kubernetes_version[0] + # AMI name: bottlerocket-aws-k8s-1.29-nvidia-x86_64-v1.18.0-7452c37e + # AMI SSM param: /aws/service/bottlerocket/aws-k8s-1.26-nvidia/x86_64/1.18.0/image_id # No "v" + # /aws/service/bottlerocket/aws-k8s-1.26-nvidia/x86_64/1.18.0-7452c37e/image_id + # Windows does not allow a specifier for SSM parameters, they only have the latest AMI ID + ami_specifier_amazon_linux = { + AL2_x86_64 = format("amazon-eks-node-%v-v%v", local.amazon_linux_ami_name_release_part, local.release_version_parts[1]) + AL2_x86_64_GPU = format("amazon-eks-gpu-node-%v-v%v", local.amazon_linux_ami_name_release_part, local.release_version_parts[1]) + AL2_ARM_64 = format("amazon-eks-arm64-node-%v-v%v", local.amazon_linux_ami_name_release_part, local.release_version_parts[1]) + AL2023_x86_64_STANDARD = format("amazon-eks-node-al2023-x86_64-standard-%v-v%v", local.amazon_linux_ami_name_release_part, local.release_version_parts[1]) + AL2023_ARM_64_STANDARD = format("amazon-eks-node-al2023-arm64-standard-%v-v%v", local.amazon_linux_ami_name_release_part, local.release_version_parts[1]) + } + + ami_specifier = length(var.ami_release_version) == 0 ? (local.ami_os == "BOTTLEROCKET" ? "latest" : "recommended") : ( + lookup(local.ami_specifier_amazon_linux, var.ami_type, var.ami_release_version[0]) + ) + + # As usual, Windows is difficult. + is_window_version = local.ami_os == "WINDOWS" && local.ami_specifier != "recommended" + + windows_name_base = { + WINDOWS_CORE_2019_x86_64 = "Windows_Server-2019-English-Core-EKS_Optimized" + WINDOWS_FULL_2019_x86_64 = "Windows_Server-2019-English-Full-EKS_Optimized" + WINDOWS_CORE_2022_x86_64 = "Windows_Server-2022-English-Core-EKS_Optimized" + WINDOWS_FULL_2022_x86_64 = "Windows_Server-2022-English-Full-EKS_Optimized" + } + + # We do not really need to compute all the names, but it makes debugging easier if we do. 
+ ami_name_windows = { for k, v in local.windows_name_base : k => format("%s-%s", v, try(var.ami_release_version[0], "")) } + + fetched_ami_id = try(local.is_window_version ? data.aws_ami.windows_ami[0].image_id : data.aws_ssm_parameter.ami_id[0].insecure_value, "") + ami_id = local.given_ami_id ? var.ami_image_id[0] : local.fetched_ami_id } data "aws_ssm_parameter" "ami_id" { - count = local.enabled && local.need_ami_id ? 1 : 0 + count = local.need_to_get_ami_id && !local.is_window_version ? 1 : 0 name = format(local.ami_ssm_format[var.ami_type], local.ami_specifier, local.resolved_kubernetes_version) + + lifecycle { + precondition { + condition = var.ami_type != "CUSTOM" + error_message = "The AMI ID must be supplied when AMI type is \"CUSTOM\"." + } + } +} + +data "aws_ami" "windows_ami" { + count = local.need_to_get_ami_id && local.is_window_version ? 1 : 0 + + owners = ["amazon"] + filter { + name = "name" + values = [local.ami_name_windows[var.ami_type]] + } } + diff --git a/docs/terraform.md b/docs/terraform.md index 51ff082..2f2e415 100644 --- a/docs/terraform.md +++ b/docs/terraform.md @@ -36,6 +36,7 @@ | [aws_iam_role_policy_attachment.ipv6_eks_cni_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy_attachment) | resource | | [aws_launch_template.default](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/launch_template) | resource | | [random_pet.cbd](https://registry.terraform.io/providers/hashicorp/random/latest/docs/resources/pet) | resource | +| [aws_ami.windows_ami](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/ami) | data source | | [aws_eks_cluster.this](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/eks_cluster) | data source | | [aws_iam_policy_document.assume_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source | | 
[aws_iam_policy_document.ipv6_eks_cni_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source | @@ -50,8 +51,7 @@ | [additional\_tag\_map](#input\_additional\_tag\_map) | Additional key-value pairs to add to each map in `tags_as_list_of_maps`. Not added to `tags` or `id`.
This is for some rare cases where resources want additional configuration of tags
and therefore take a list of maps with tag key, value, and additional configuration. | `map(string)` | `{}` | no | | [after\_cluster\_joining\_userdata](#input\_after\_cluster\_joining\_userdata) | Additional `bash` commands to execute on each worker node after joining the EKS cluster (after executing the `bootstrap.sh` script). For more info, see https://kubedex.com/90-days-of-aws-eks-in-production | `list(string)` | `[]` | no | | [ami\_image\_id](#input\_ami\_image\_id) | AMI to use, overriding other AMI specifications, but must match `ami_type`. Ignored if `launch_template_id` is supplied. | `list(string)` | `[]` | no | -| [ami\_release\_version](#input\_ami\_release\_version) | OBSOLETE: Use `ami_specifier` instead. Note that it has a different format.
Historical description: EKS AMI version to use, e.g. For AL2 \"1.16.13-20200821\" or for bottlerocket \"1.2.0-ccf1b754\" (no \"v\") or for Windows \"2023.02.14\". For AL2, bottlerocket and Windows, it defaults to latest version for Kubernetes version." | `list(string)` | `[]` | no | -| [ami\_specifier](#input\_ami\_specifier) | OS-dependent specifier for one of the several AMIs that match OS, architecture, and Kubernetes 1.xx version.
If not specified the recommended/latest AMI for the given Kubernetes version will be used.
Unfortunately, the format of this value varies by OS, and we have not found documentation for it.
You can generally figure it out from the AMI name or description, and validate it by trying to retrieve
the SSM Public Parameter for the AMI ID.

Examples:
AL2: amazon-eks-node-1.29-v20240117
AL2023: amazon-eks-node-al2023-x86\_64-standard-1.29-v20240605
Bottlerocket: 1.20.1-7c3e9198 \_# Note: 1.20.1 is the Bottlerocket, not Kubernetes, version\_
Windows: | `string` | `"recommended"` | no | +| [ami\_release\_version](#input\_ami\_release\_version) | The EKS AMI "release version" to use. Defaults to the latest recommended version.
For Amazon Linux, it is the "Release version" from [Amazon AMI Releases](https://github.com/awslabs/amazon-eks-ami/releases)
For Bottlerocket, it is the release tag from [Bottlerocket Releases](https://github.com/bottlerocket-os/bottlerocket/releases) without the "v" prefix.
For Windows, it is "AMI version" from [AWS docs](https://docs.aws.amazon.com/eks/latest/userguide/eks-ami-versions-windows.html).
Note that unlike AMI names, release versions never include the "v" prefix.
Examples:
AL2: 1.29.3-20240531
Bottlerocket: 1.2.0 or 1.2.0-ccf1b754
Windows: 1.29-2024.04.09 | `list(string)` | `[]` | no | | [ami\_type](#input\_ami\_type) | Type of Amazon Machine Image (AMI) associated with the EKS Node Group.
Defaults to `AL2_x86_64`. Valid values: `AL2_x86_64, AL2_x86_64_GPU, AL2_ARM_64, CUSTOM, BOTTLEROCKET_ARM_64, BOTTLEROCKET_x86_64, BOTTLEROCKET_ARM_64_NVIDIA, BOTTLEROCKET_x86_64_NVIDIA, WINDOWS_CORE_2019_x86_64, WINDOWS_FULL_2019_x86_64, WINDOWS_CORE_2022_x86_64, WINDOWS_FULL_2022_x86_64, AL2023_x86_64_STANDARD, AL2023_ARM_64_STANDARD`. | `string` | `"AL2_x86_64"` | no | | [associate\_cluster\_security\_group](#input\_associate\_cluster\_security\_group) | When true, associate the default cluster security group to the nodes. If disabled the EKS managed security group will not
be associated to the nodes and you will need to provide another security group that allows the nodes to communicate with
the EKS control plane. Be aware that if no `associated_security_group_ids` or `ssh_access_security_group_ids` are provided,
then the nodes will have no inbound or outbound rules. | `bool` | `true` | no | | [associated\_security\_group\_ids](#input\_associated\_security\_group\_ids) | A list of IDs of Security Groups to associate the node group with, in addition to the EKS' created security group.
These security groups will not be modified. | `list(string)` | `[]` | no | @@ -119,7 +119,6 @@ | Name | Description | |------|-------------| -| [WARNING\_ami\_release\_version](#output\_WARNING\_ami\_release\_version) | Include the warning output message to quite the linter about unused variables. | | [WARNING\_cluster\_autoscaler\_enabled](#output\_WARNING\_cluster\_autoscaler\_enabled) | n/a | | [eks\_node\_group\_ami\_id](#output\_eks\_node\_group\_ami\_id) | The ID of the AMI used for the worker nodes, if specified | | [eks\_node\_group\_arn](#output\_eks\_node\_group\_arn) | Amazon Resource Name (ARN) of the EKS Node Group | diff --git a/examples/complete/fixtures.us-east-2.tfvars b/examples/complete/fixtures.us-east-2.tfvars index ea68471..e6c5eb9 100644 --- a/examples/complete/fixtures.us-east-2.tfvars +++ b/examples/complete/fixtures.us-east-2.tfvars @@ -10,7 +10,17 @@ stage = "test" name = "eks-node-group" +# Keep Kubernetes version in sync with k8s.io packages in test/src/go.mod kubernetes_version = "1.29" +# Keep the AMI release version in sync with the Kubernetes version +# Get Release Version from https://github.com/awslabs/amazon-eks-ami/releases +# but DO NOT USE THE LATEST VERSION. Use the one before that. 
+ami_release_version = ["1.29.3-20240531"] + +# Use the same architecture for the instance type and the AMI type +instance_types = ["t4g.small"] +ami_type = "AL2023_ARM_64_STANDARD" + oidc_provider_enabled = true @@ -18,11 +28,10 @@ enabled_cluster_log_types = ["audit"] cluster_log_retention_period = 7 -instance_types = ["t3.small"] desired_size = 2 -max_size = 3 +max_size = 2 min_size = 2 @@ -47,4 +56,3 @@ kubernetes_taints = [ effect = "PREFER_NO_SCHEDULE" }] -ami_type = "AL2023_x86_64_STANDARD" diff --git a/examples/complete/main.tf b/examples/complete/main.tf index 506a5e6..44bb700 100644 --- a/examples/complete/main.tf +++ b/examples/complete/main.tf @@ -167,15 +167,13 @@ module "eks_node_group" { node_role_policy_arns = [local.extra_policy_arn] update_config = var.update_config + ami_type = var.ami_type + ami_release_version = var.ami_release_version - ami_type = var.ami_type - ami_specifier = var.ami_specifier - - /* before_cluster_joining_userdata = var.before_cluster_joining_userdata - kubelet_additional_options = var.kubelet_additional_options - after_cluster_joining_userdata = var.after_cluster_joining_userdata - */ + kubelet_additional_options = var.kubelet_additional_options + after_cluster_joining_userdata = var.after_cluster_joining_userdata + create_before_destroy = true @@ -189,3 +187,31 @@ module "eks_node_group" { context = module.this.context } + +module "eks_node_group_minimal" { + source = "../../" + + # We need to do something to avoid a name clash with the Node Role. + # Easiest thing to do is reuse the node role created by the other node group. + node_role_arn = [module.eks_node_group.eks_node_group_role_arn] + + subnet_ids = module.this.enabled ? module.subnets.public_subnet_ids : ["filler_string_for_enabled_is_false"] + cluster_name = module.this.enabled ? 
module.eks_cluster.eks_cluster_id : "disabled" + instance_types = var.instance_types + desired_size = var.desired_size + min_size = var.min_size + max_size = var.max_size + kubernetes_version = [var.kubernetes_version] + + ami_type = var.ami_type + ami_release_version = var.ami_release_version + + node_group_terraform_timeouts = [{ + create = "15m" + delete = "20m" + }] + + context = module.this.context +} + + diff --git a/examples/complete/variables.tf b/examples/complete/variables.tf index ff4fb66..974d0a3 100644 --- a/examples/complete/variables.tf +++ b/examples/complete/variables.tf @@ -129,18 +129,20 @@ variable "ami_type" { } } -variable "ami_specifier" { - type = string +variable "ami_release_version" { + type = list(string) description = <<-EOT - OS-dependent specifier for one of the several AMIs that match OS, architecture, and Kubernetes 1.xx version. - If not specified the recommended/latest AMI for the given Kubernetes version will be used. + The EKS AMI "release version" to use. Defaults to the latest recommended version. + For Amazon Linux, get it from [Amazon AMI Releases](https://github.com/awslabs/amazon-eks-ami/releases) + For Bottlerocket, get it from [Bottlerocket Releases](https://github.com/bottlerocket-os/bottlerocket/releases). + For Windows, get it from [AWS docs](https://docs.aws.amazon.com/eks/latest/userguide/eks-ami-versions-windows.html). + Note that unlike AMI names, release versions never include the "v" prefix. Examples: - AL2: amazon-eks-node-1.29-v20240117 - AL2023: amazon-eks-node-al2023-x86_64-standard-1.29-v20240605 - Bottlerocket: 1.20.1-7c3e9198 - Windows: + AL2: 1.29.3-20240531 + Bottlerocket: 1.2.0 or 1.2.0-ccf1b754 + Windows: 1.29-2024.04.09 EOT - default = "recommended" + default = [] nullable = false } diff --git a/launch-template.tf b/launch-template.tf index 632800d..3af389b 100644 --- a/launch-template.tf +++ b/launch-template.tf @@ -31,7 +31,7 @@ locals { local.fetch_launch_template ? 
data.aws_launch_template.this[0].latest_version : aws_launch_template.default[0].latest_version )) : null - launch_template_ami = length(var.ami_image_id) == 0 ? (local.generate_launch_template ? data.aws_ssm_parameter.ami_id[0].insecure_value : "") : var.ami_image_id[0] + launch_template_ami = local.ami_id associate_cluster_security_group = local.enabled && var.associate_cluster_security_group launch_template_vpc_security_group_ids = sort(compact(concat( diff --git a/main.tf b/main.tf index 0575a1f..6d5c0d2 100644 --- a/main.tf +++ b/main.tf @@ -1,9 +1,19 @@ locals { enabled = module.this.enabled + # Kubernetes version priority (first one to be set wins) + # 1. var.kubernetes_version + # 2. data.aws_eks_cluster.this[0].version + use_cluster_kubernetes_version = local.enabled && length(var.kubernetes_version) == 0 + need_cluster_kubernetes_version = local.use_cluster_kubernetes_version + resolved_kubernetes_version = local.use_cluster_kubernetes_version ? data.aws_eks_cluster.this[0].version : var.kubernetes_version[0] + + # By default (var.immediately_apply_lt_changes is null), apply changes immediately only if create_before_destroy is true. immediately_apply_lt_changes = coalesce(var.immediately_apply_lt_changes, var.create_before_destroy) - need_ami_id = local.enabled && length(var.ami_image_id) == 0 + # See https://aws.amazon.com/blogs/containers/introducing-launch-template-and-custom-ami-support-in-amazon-eks-managed-node-groups/ + features_require_ami = local.enabled && local.suppress_bootstrap + need_to_get_ami_id = local.enabled && local.features_require_ami && !local.given_ami_id have_ssh_key = local.enabled && length(var.ec2_ssh_key_name) == 1 ec2_ssh_key_name = local.have_ssh_key ? var.ec2_ssh_key_name[0] : null @@ -13,27 +23,22 @@ locals { get_cluster_data = local.enabled ?
( local.need_cluster_kubernetes_version || local.associate_cluster_security_group || - local.need_bootstrap || - local.need_ssh_access_sg || - (length(local.kubelet_extra_args) > 0 && local.ami_os == "AL2023") + local.need_ssh_access_sg ) : false - - - # At the moment, the autoscaler tags are not needed. - # We leave them here for when they can be applied to the autoscaling group. - /* - # - # Set up tags for autoscaler and other resources - # https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#auto-discovery-setup - # - taint_effect_map = { NO_SCHEDULE = "NoSchedule" NO_EXECUTE = "NoExecute" PREFER_NO_SCHEDULE = "PreferNoSchedule" } + # + # Set up tags for autoscaler and other resources + # https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#auto-discovery-setup + # + # At the moment, the autoscaler tags are not needed. + # We leave them here for when they can be applied to the autoscaling group. + /* autoscaler_enabled_tags = { "k8s.io/cluster-autoscaler/${var.cluster_name}" = "owned" "k8s.io/cluster-autoscaler/enabled" = "true" @@ -45,7 +50,7 @@ locals { for taint in var.kubernetes_taints : format("k8s.io/cluster-autoscaler/node-template/taint/%v", taint.key) => "${taint.value == null ? "" : taint.value}:${local.taint_effect_map[taint.effect]}" } - */ + node_tags = merge( module.label.tags, @@ -54,9 +59,13 @@ locals { "kubernetes.io/cluster/${var.cluster_name}" = "owned" } ) + # It does not help to add the autoscaler tags to the node group tags, # because they only matter when applied to the autoscaling group. node_group_tags = local.node_tags + */ + node_tags = module.label.tags + node_group_tags = module.label.tags } module "label" { @@ -83,10 +92,12 @@ locals { # Always supply instance types via the node group, not the launch template, # because node group supports up to 20 types but launch template does not. 
# See https://docs.aws.amazon.com/eks/latest/APIReference/API_CreateNodegroup.html#API_CreateNodegroup_RequestSyntax - instance_types = var.instance_types - ami_type = local.launch_template_ami == "" ? var.ami_type : null - capacity_type = var.capacity_type - labels = var.kubernetes_labels == null ? {} : var.kubernetes_labels + instance_types = var.instance_types + ami_type = local.launch_template_ami == "" ? var.ami_type : null + version = local.launch_template_ami == "" ? local.resolved_kubernetes_version : null + release_version = local.launch_template_ami == "" && length(var.ami_release_version) > 0 ? var.ami_release_version[0] : null + capacity_type = var.capacity_type + labels = var.kubernetes_labels == null ? {} : var.kubernetes_labels taints = var.kubernetes_taints @@ -151,7 +162,8 @@ resource "aws_eks_node_group" "default" { instance_types = local.ng.instance_types ami_type = local.ng.ami_type labels = local.ng.labels - version = null # derived from AMI + version = local.ng.version + release_version = local.ng.release_version force_update_version = local.ng.force_update_version capacity_type = local.ng.capacity_type @@ -230,7 +242,8 @@ resource "aws_eks_node_group" "cbd" { instance_types = local.ng.instance_types ami_type = local.ng.ami_type labels = local.ng.labels - version = null # derived from AMI + version = local.ng.version + release_version = local.ng.release_version force_update_version = local.ng.force_update_version capacity_type = local.ng.capacity_type diff --git a/test/src/default_test.go b/test/src/default_test.go new file mode 100644 index 0000000..a000334 --- /dev/null +++ b/test/src/default_test.go @@ -0,0 +1,32 @@ +package test + +import ( + "github.com/gruntwork-io/terratest/modules/terraform" + "github.com/stretchr/testify/assert" + "regexp" + "testing" +) + +// Test the Terraform module in examples/complete using Terratest. 
+func TestExamplesComplete(t *testing.T) { + t.Parallel() + + testRunner(t, nil, testExamplesComplete) +} + +func TestExamplesCompleteDisabled(t *testing.T) { + t.Parallel() + + vars := &map[string]interface{}{ + "enabled": false, + } + testRunner(t, vars, testExamplesCompleteDisabled) +} + +func testExamplesCompleteDisabled(t *testing.T, terraformOptions *terraform.Options, randID string, results string) { + // Should complete successfully without creating or changing any resources. + // Extract the "Resources:" section of the output to make the error message more readable. + re := regexp.MustCompile(`Resources: [^.]+\.`) + match := re.FindString(results) + assert.Equal(t, "Resources: 0 added, 0 changed, 0 destroyed.", match, "Re-applying the same configuration should not change any resources") +} diff --git a/test/src/examples_complete_test.go b/test/src/examples_complete_test.go index 3c75e04..06a0ea8 100644 --- a/test/src/examples_complete_test.go +++ b/test/src/examples_complete_test.go @@ -1,96 +1,27 @@ package test import ( - "encoding/base64" "fmt" + "github.com/gruntwork-io/terratest/modules/logger" "os" - "regexp" "strings" + "sync" "sync/atomic" "testing" "time" - "github.com/gruntwork-io/terratest/modules/random" "github.com/gruntwork-io/terratest/modules/terraform" - testStructure "github.com/gruntwork-io/terratest/modules/test-structure" "github.com/stretchr/testify/assert" - corev1 "k8s.io/api/core/v1" - "k8s.io/client-go/informers" - "k8s.io/client-go/kubernetes" - "k8s.io/client-go/rest" - "k8s.io/client-go/tools/cache" - "sigs.k8s.io/aws-iam-authenticator/pkg/token" - "github.com/aws/aws-sdk-go/aws" "github.com/aws/aws-sdk-go/aws/session" "github.com/aws/aws-sdk-go/service/eks" + corev1 "k8s.io/api/core/v1" + "k8s.io/client-go/informers" + "k8s.io/client-go/tools/cache" ) -func newClientset(cluster *eks.Cluster) (*kubernetes.Clientset, error) { - gen, err := token.NewGenerator(true, false) - if err != nil { - return nil, err - } - opts := 
&token.GetTokenOptions{ - ClusterID: aws.StringValue(cluster.Name), - } - tok, err := gen.GetWithOptions(opts) - if err != nil { - return nil, err - } - ca, err := base64.StdEncoding.DecodeString(aws.StringValue(cluster.CertificateAuthority.Data)) - if err != nil { - return nil, err - } - clientset, err := kubernetes.NewForConfig( - &rest.Config{ - Host: aws.StringValue(cluster.Endpoint), - BearerToken: tok.Token, - TLSClientConfig: rest.TLSClientConfig{ - CAData: ca, - }, - }, - ) - if err != nil { - return nil, err - } - return clientset, nil -} - -func cleanup(t *testing.T, terraformOptions *terraform.Options, tempTestFolder string) { - terraform.Destroy(t, terraformOptions) - _ = os.RemoveAll(tempTestFolder) -} - -// Test the Terraform module in examples/complete using Terratest. -func TestExamplesComplete(t *testing.T) { - t.Parallel() - randID := strings.ToLower(random.UniqueId()) - attributes := []string{randID} - - rootFolder := "../../" - terraformFolderRelativeToRoot := "examples/complete" - varFiles := []string{"fixtures.us-east-2.tfvars"} - - tempTestFolder := testStructure.CopyTerraformFolderToTemp(t, rootFolder, terraformFolderRelativeToRoot) - - terraformOptions := &terraform.Options{ - // The path to where our Terraform code is located - TerraformDir: tempTestFolder, - Upgrade: true, - // Variables to pass to our Terraform code using -var-file options - VarFiles: varFiles, - Vars: map[string]interface{}{ - "attributes": attributes, - }, - } - - // At the end of the test, run `terraform destroy` to clean up any resources that were created - defer cleanup(t, terraformOptions, tempTestFolder) - - // This will run `terraform init` and `terraform apply` and fail the test if there are any errors - terraform.InitAndApply(t, terraformOptions) +func testExamplesComplete(t *testing.T, terraformOptions *terraform.Options, randID string, _ string) { // Run `terraform output` to get the value of an output variable vpcCidr := terraform.Output(t, terraformOptions, 
"vpc_cidr") @@ -151,28 +82,44 @@ func TestExamplesComplete(t *testing.T) { } result, err := eksSvc.DescribeCluster(input) - assert.NoError(t, err) + if !assert.NoError(t, err) { + t.Fatal("Unable to find the EKS cluster, skipping any further tests") + } clientset, err := newClientset(result.Cluster) - assert.NoError(t, err) + if !assert.NoError(t, err) { + t.Fatal("Unable to create a client for the EKS cluster, skipping any further tests") + } factory := informers.NewSharedInformerFactory(clientset, 0) informer := factory.Core().V1().Nodes().Informer() stopChannel := make(chan struct{}) var countOfWorkerNodes uint64 = 0 + var expectedCountOfWorkerNodes uint64 = 4 + var allWorkerNodesJoined bool = false + if !assert.NotNil(t, informer, "Unable to create a node informer") { + t.Fatal("Unable to create a node informer, skipping any further tests") + } informer.AddEventHandler(cache.ResourceEventHandlerFuncs{ AddFunc: func(obj interface{}) { node := obj.(*corev1.Node) fmt.Printf("Worker Node %s has joined the EKS cluster at %s\n", node.Name, node.CreationTimestamp) atomic.AddUint64(&countOfWorkerNodes, 1) - if countOfWorkerNodes > 1 { + if atomic.LoadUint64(&countOfWorkerNodes) == expectedCountOfWorkerNodes { + allWorkerNodesJoined = true close(stopChannel) } }, }) - go informer.Run(stopChannel) + var wg sync.WaitGroup + wg.Add(1) // We're waiting for one goroutine (the informer) + + go func() { + informer.Run(stopChannel) + wg.Done() // Call Done on the WaitGroup when the informer is finished + }() select { case <-stopChannel: @@ -183,40 +130,56 @@ func TestExamplesComplete(t *testing.T) { fmt.Println(msg) assert.Fail(t, msg) } + + wg.Wait() // Wait for all goroutines to finish + + if !allWorkerNodesJoined { + return + } + + hasLabel := checkSomeNodeHasLabel(clientset, "terratest", "true") + assert.True(t, hasLabel, "No node with label terratest=true found in the cluster") + + hasLabel = checkSomeNodeHasLabel(clientset, "attributes", randID) + assert.True(t, hasLabel, "No node with label
attributes=%s found in the cluster", randID) + + hasTaint := checkSomeNodeHasTaint(clientset, "test", "", corev1.TaintEffectPreferNoSchedule) + assert.True(t, hasTaint, "No node with taint test=:PreferNoSchedule found in the cluster") + } -func TestExamplesCompleteDisabled(t *testing.T) { - t.Parallel() - randID := strings.ToLower(random.UniqueId()) +// To speed up debugging, allow running the tests on an existing cluster, +// without creating and destroying one. +// Run this manually by creating a cluster in examples/complete with: +// +// export EXISTING_CLUSTER_ATTRIBUTE="" +// terraform apply -var-file fixtures.us-east-2.tfvars -var "attributes=[\"$EXISTING_CLUSTER_ATTRIBUTE\"]" +func Test_ExistingCluster(t *testing.T) { + randID := strings.ToLower(os.Getenv("EXISTING_CLUSTER_ATTRIBUTE")) + if randID == "" { + t.Skip("(This is normal): EXISTING_CLUSTER_ATTRIBUTE is not set, skipping...") + return + } + attributes := []string{randID} - rootFolder := "../../" - terraformFolderRelativeToRoot := "examples/complete" varFiles := []string{"fixtures.us-east-2.tfvars"} - tempTestFolder := testStructure.CopyTerraformFolderToTemp(t, rootFolder, terraformFolderRelativeToRoot) - terraformOptions := &terraform.Options{ // The path to where our Terraform code is located - TerraformDir: tempTestFolder, + TerraformDir: "../../examples/complete", Upgrade: true, // Variables to pass to our Terraform code using -var-file options VarFiles: varFiles, Vars: map[string]interface{}{ "attributes": attributes, - "enabled": false, }, } - // At the end of the test, run `terraform destroy` to clean up any resources that were created - defer cleanup(t, terraformOptions, tempTestFolder) - - // This will run `terraform init` and `terraform apply` and fail the test if there are any errors - results := terraform.InitAndApply(t, terraformOptions) + // Keep the output quiet + if !testing.Verbose() { + terraformOptions.Logger = logger.Discard + } - // Should complete successfully without creating 
or changing any resources. - // Extract the "Resources:" section of the output to make the error message more readable. - re := regexp.MustCompile(`Resources: [^.]+\.`) - match := re.FindString(results) - assert.Equal(t, "Resources: 0 added, 0 changed, 0 destroyed.", match, "Re-applying the same configuration should not change any resources") + testExamplesComplete(t, terraformOptions, randID, "") } diff --git a/test/src/framework_test.go b/test/src/framework_test.go new file mode 100644 index 0000000..f81b111 --- /dev/null +++ b/test/src/framework_test.go @@ -0,0 +1,130 @@ +package test + +import ( + "context" + "encoding/base64" + "github.com/gruntwork-io/terratest/modules/logger" + "github.com/gruntwork-io/terratest/modules/random" + "github.com/gruntwork-io/terratest/modules/terraform" + testStructure "github.com/gruntwork-io/terratest/modules/test-structure" + corev1 "k8s.io/api/core/v1" + metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" + "k8s.io/client-go/kubernetes" + "k8s.io/client-go/rest" + "os" + "sigs.k8s.io/aws-iam-authenticator/pkg/token" + "strings" + "testing" + + "github.com/aws/aws-sdk-go/aws" + "github.com/aws/aws-sdk-go/service/eks" +) + +// Test the Terraform module in examples/complete using Terratest. 
+func testRunner(t *testing.T, vars *map[string]interface{}, testFunc func(t *testing.T, terraformOptions *terraform.Options, randID string, results string)) { + randID := strings.ToLower(random.UniqueId()) + attributes := []string{randID} + + rootFolder := "../../" + terraformFolderRelativeToRoot := "examples/complete" + varFiles := []string{"fixtures.us-east-2.tfvars"} + + tempTestFolder := testStructure.CopyTerraformFolderToTemp(t, rootFolder, terraformFolderRelativeToRoot) + + if vars == nil { + vars = &map[string]interface{}{ + "attributes": attributes, + } + } else { + (*vars)["attributes"] = attributes + } + + terraformOptions := &terraform.Options{ + // The path to where our Terraform code is located + TerraformDir: tempTestFolder, + Upgrade: true, + // Variables to pass to our Terraform code using -var-file options + VarFiles: varFiles, + Vars: *vars, + } + + // Keep the output quiet + if !testing.Verbose() { + terraformOptions.Logger = logger.Discard + } + + // At the end of the test, run `terraform destroy` to clean up any resources that were created + defer cleanup(t, terraformOptions, tempTestFolder) + + // This will run `terraform init` and `terraform apply` and fail the test if there are any errors + results := terraform.InitAndApply(t, terraformOptions) + + testFunc(t, terraformOptions, randID, results) +} + +// EKS support +func newClientset(cluster *eks.Cluster) (*kubernetes.Clientset, error) { + gen, err := token.NewGenerator(true, false) + if err != nil { + return nil, err + } + opts := &token.GetTokenOptions{ + ClusterID: aws.StringValue(cluster.Name), + } + tok, err := gen.GetWithOptions(opts) + if err != nil { + return nil, err + } + ca, err := base64.StdEncoding.DecodeString(aws.StringValue(cluster.CertificateAuthority.Data)) + if err != nil { + return nil, err + } + clientset, err := kubernetes.NewForConfig( + &rest.Config{ + Host: aws.StringValue(cluster.Endpoint), + BearerToken: tok.Token, + TLSClientConfig: rest.TLSClientConfig{ + 
CAData: ca, + }, + }, + ) + if err != nil { + return nil, err + } + return clientset, nil +} + +// Check that at least one Node has the given label +func checkSomeNodeHasLabel(clientset *kubernetes.Clientset, labelKey string, labelValue string) bool { + nodes, err := clientset.CoreV1().Nodes().List(context.Background(), metav1.ListOptions{}) + if err != nil { + panic(err.Error()) + } + for _, node := range nodes.Items { + if value, ok := node.Labels[labelKey]; ok && value == labelValue { + return true + } + } + return false +} + +// Check that at least one Node has the given taint +func checkSomeNodeHasTaint(clientset *kubernetes.Clientset, taintKey string, taintValue string, taintEffect corev1.TaintEffect) bool { + nodes, err := clientset.CoreV1().Nodes().List(context.Background(), metav1.ListOptions{}) + if err != nil { + panic(err.Error()) + } + for _, node := range nodes.Items { + for _, taint := range node.Spec.Taints { + if taint.Key == taintKey && taint.Value == taintValue && taint.Effect == taintEffect { + return true + } + } + } + return false +} + +func cleanup(t *testing.T, terraformOptions *terraform.Options, tempTestFolder string) { + terraform.Destroy(t, terraformOptions) + _ = os.RemoveAll(tempTestFolder) +} diff --git a/test/src/go.mod b/test/src/go.mod index 7e82346..41de163 100644 --- a/test/src/go.mod +++ b/test/src/go.mod @@ -9,6 +9,7 @@ require ( github.com/gruntwork-io/terratest v0.42.0 github.com/stretchr/testify v1.8.4 k8s.io/api v0.29.4 + k8s.io/apimachinery v0.29.4 k8s.io/client-go v0.29.4 sigs.k8s.io/aws-iam-authenticator v0.6.7 ) @@ -103,7 +104,6 @@ require ( gopkg.in/inf.v0 v0.9.1 // indirect gopkg.in/yaml.v2 v2.4.0 // indirect gopkg.in/yaml.v3 v3.0.1 // indirect - k8s.io/apimachinery v0.29.4 // indirect k8s.io/klog/v2 v2.110.1 // indirect k8s.io/kube-openapi v0.0.0-20231010175941-2dd684a91f00 // indirect k8s.io/utils v0.0.0-20230726121419-3b25d923346b // indirect diff --git a/userdata.tf b/userdata.tf index d76550e..6d971eb 100644 --- 
a/userdata.tf +++ b/userdata.tf @@ -30,8 +30,16 @@ # locals { + # We need to suppress the EKS-supplied bootstrap if and only if we are running bootstrap.sh ourselves. + # We need to run bootstrap.sh ourselves if: + # - We are running Amazon Linux 2 or Windows (the other OSes do not use bootstrap.sh) and either: + # - We explicitly are given extra args for bootstrap via bootstrap_additional_options or + # - We are given extra args for kubelet via kubelet_additional_options, which are passed to bootstrap.sh + + suppress_bootstrap = local.enabled && (local.ami_os == "AL2" || local.ami_os == "WINDOWS") ? ( + length(var.bootstrap_additional_options) > 0 || length(var.kubelet_additional_options) > 0 + ) : false - ami_os = split("_", var.ami_type)[0] userdata_template_file = { AL2 = "${path.module}/userdata.tpl" AL2023 = "${path.module}/userdata_al2023.tpl" @@ -39,10 +47,24 @@ locals { WINDOWS = "${path.module}/userdata_nt.tpl" } - kubelet_extra_args = join(" ", var.kubelet_additional_options) + + # When suppressing EKS bootstrap, add --register-with-taints to kubelet_extra_args, + # e.g. --register-with-taints=test=:PreferNoSchedule + kubernetes_taint_argv = [ + for taint in var.kubernetes_taints : + "${taint.key}=${taint.value == null ? "" : taint.value}:${local.taint_effect_map[taint.effect]}" + ] + kubernetes_taint_arg = (local.suppress_bootstrap && length(var.kubernetes_taints) > 0 && + # Do not add to or override --register-with-taints if it is already set + !strcontains(local.kubelet_explicit_extra_args, "--register-with-taints=")) ? ( + " --register-with-taints=${join(",", local.kubernetes_taint_argv)}" + ) : "" # We use '>-' to handle quoting and escaping values in the YAML. 
+  kubelet_explicit_extra_args = join(" ", var.kubelet_additional_options)
+  kubelet_extra_args = "${local.kubelet_explicit_extra_args}${local.kubernetes_taint_arg}"
+  kubelet_extra_args_yaml = replace(local.kubelet_extra_args, "--", "\n - >-\n --")

   userdata_vars = {
@@ -52,6 +74,13 @@ locals {
     bootstrap_extra_args = length(var.bootstrap_additional_options) == 0 ? "" : join(" ", var.bootstrap_additional_options)
     after_cluster_joining_userdata = length(var.after_cluster_joining_userdata) == 0 ? "" : join("\n", var.after_cluster_joining_userdata)

+    /* It turns out we never need this, because we only use it when we are suppressing the EKS bootstrap,
+     * and we only suppress the EKS bootstrap when we are running bootstrap.sh ourselves, which only happens
+     * when we are running Amazon Linux 2 or Windows. This would only be used by a CUSTOM AMI,
+     * but for a CUSTOM AMI, the user must provide the full userdata.
+     *
+     * Keeping this here for now in case we need it in the future.
+
     cluster_endpoint = local.get_cluster_data ? data.aws_eks_cluster.this[0].endpoint : null
     certificate_authority_data = local.get_cluster_data ? data.aws_eks_cluster.this[0].certificate_authority[0].data : null
     cluster_name = local.get_cluster_data ? data.aws_eks_cluster.this[0].name : null
@@ -60,17 +89,18 @@ locals {
       [for net in data.aws_eks_cluster.this[0].kubernetes_network_config : net.service_ipv4_cidr if net.ip_family == "ipv4"],
       [for net in data.aws_eks_cluster.this[0].kubernetes_network_config : net.service_ipv6_cidr if net.ip_family == "ipv6"]
     )...) : null
+    */
   }

-  need_bootstrap = local.enabled ? length(concat(var.kubelet_additional_options,
-    var.bootstrap_additional_options, var.after_cluster_joining_userdata
-  )) > 0 : false
-
   # If var.userdata_override_base64[0] is present then we use it rather than generating userdata
-  need_userdata = local.enabled && length(var.userdata_override_base64) == 0 ? (
-  (length(var.before_cluster_joining_userdata) > 0) || local.need_bootstrap) : false
+  generate_userdata = local.enabled && length(var.userdata_override_base64) == 0 ? (
+    length(var.before_cluster_joining_userdata) > 0 ||
+    length(var.kubelet_additional_options) > 0 ||
+    length(var.bootstrap_additional_options) > 0 ||
+    length(var.after_cluster_joining_userdata) > 0
+  ) : false

-  userdata = local.need_userdata ? (
+  userdata = local.generate_userdata ? (
     base64encode(
     templatefile(local.userdata_template_file[local.ami_os], local.userdata_vars))
   ) : (
diff --git a/userdata_al2023.tpl b/userdata_al2023.tpl
index f177f68..8841d1e 100644
--- a/userdata_al2023.tpl
+++ b/userdata_al2023.tpl
@@ -1,8 +1,8 @@
 MIME-Version: 1.0
-Content-Type: multipart/mixed; boundary="/:/+++"
+Content-Type: multipart/mixed; boundary="//"

 %{ if length(before_cluster_joining_userdata) > 0 ~}
---/:/+++
+--//
 Content-Type: text/x-shellscript; charset="us-ascii"

 #!/bin/bash
@@ -11,21 +11,16 @@ ${before_cluster_joining_userdata}
 %{ endif ~}

 %{~ if length(kubelet_extra_args_yaml) > 0 }
---/:/+++
+--//
 Content-Type: application/node.eks.aws

 ---
 apiVersion: node.eks.aws/v1alpha1
 kind: NodeConfig
 spec:
-  cluster:
-    name: ${cluster_name}
-    apiServerEndpoint: ${cluster_endpoint}
-    certificateAuthority: ${certificate_authority_data}
-    cidr: ${cluster_cidr}
   kubelet:
     flags: ${kubelet_extra_args_yaml}

 %{~ endif }

---/:/+++--
+--//--
diff --git a/variables-deprecated.tf b/variables-deprecated.tf
index 4ab48b7..54354e7 100644
--- a/variables-deprecated.tf
+++ b/variables-deprecated.tf
@@ -1,22 +1,3 @@
-variable "ami_release_version" {
-  type = list(string)
-  description = <<-EOT
-    OBSOLETE: Use `ami_specifier` instead. Note that it has a different format.
-    Historical description: EKS AMI version to use, e.g. For AL2 \"1.16.13-20200821\" or for bottlerocket \"1.2.0-ccf1b754\" (no \"v\") or for Windows \"2023.02.14\". For AL2, bottlerocket and Windows, it defaults to latest version for Kubernetes version."
-    EOT
-  default = []
-  nullable = false
-  validation {
-    condition = length(var.ami_release_version) == 0
-    error_message = "variable `ami_release_version` is obsolete. Use `ami_specifier` instead."
-  }
-}
-
-# Include the warning output message to quite the linter about unused variables.
-output "WARNING_ami_release_version" {
-  value = length(var.ami_release_version) == 0 ? null : "WARNING: variable `ami_release_version` is obsolete and has been ignored."
-}
-
 variable "cluster_autoscaler_enabled" {
   type = bool
   description = <<-EOT
diff --git a/variables.tf b/variables.tf
index fe6b285..32a9366 100644
--- a/variables.tf
+++ b/variables.tf
@@ -175,26 +175,39 @@ variable "ami_image_id" {
   }
 }

-variable "ami_specifier" {
-  type = string
+variable "ami_release_version" {
+  type = list(string)
   description = <<-EOT
-    OS-dependent specifier for one of the several AMIs that match OS, architecture, and Kubernetes 1.xx version.
-    If not specified the recommended/latest AMI for the given Kubernetes version will be used.
-    Unfortunately, the format of this value varies by OS, and we have not found documentation for it.
-    You can generally figure it out from the AMI name or description, and validate it by trying to retrieve
-    the SSM Public Parameter for the AMI ID.
-
+    The EKS AMI "release version" to use. Defaults to the latest recommended version.
+    For Amazon Linux, it is the "Release version" from [Amazon AMI Releases](https://github.com/awslabs/amazon-eks-ami/releases)
+    For Bottlerocket, it is the release tag from [Bottlerocket Releases](https://github.com/bottlerocket-os/bottlerocket/releases) without the "v" prefix.
+    For Windows, it is "AMI version" from [AWS docs](https://docs.aws.amazon.com/eks/latest/userguide/eks-ami-versions-windows.html).
+    Note that unlike AMI names, release versions never include the "v" prefix.
     Examples:
-      AL2: amazon-eks-node-1.29-v20240117
-      AL2023: amazon-eks-node-al2023-x86_64-standard-1.29-v20240605
-      Bottlerocket: 1.20.1-7c3e9198 _# Note: 1.20.1 is the Bottlerocket, not Kubernetes, version_
-      Windows:
+      AL2: 1.29.3-20240531
+      Bottlerocket: 1.2.0 or 1.2.0-ccf1b754
+      Windows: 1.29-2024.04.09
     EOT
-  default = "recommended"
-  nullable = false
+  # Normally we would not validate this input and instead allow the AWS API to validate it,
+  # but in this case, our AMI selection logic depends on it being in a format we expect,
+  # so even if AWS adds options in the future, we need to ensure it is in a format we can handle.
+  validation {
+    condition = (
+      length(var.ami_release_version) == 0 ? true : length(
+        # 1.2.3 with optional -20240531 or -7452c37e or 1.2.3 or 1.2-2024.04.09
+      regexall("(^\\d+\\.\\d+\\.\\d+(-[\\da-f]{8})?$)|(^\\d+\\.\\d+\\.\\d+$)|(^\\d+\\.\\d+-\\d+\\.\\d+\\.\\d+$)", var.ami_release_version[0])) == 1
+    )
+    error_message = <<-EOT
+      Var ami_release_version, if supplied, must be like
+        Amazon Linux 2 or 2023: 1.29.3-20240531
+        Bottlerocket: 1.18.0 or 1.18.0-7452c37e # note commit hash prefix is 8 characters, not GitHub's default 7
+        Windows: 1.29-2024.04.09
+      EOT
+  }
+  default = []
+  nullable = false
 }

-
 variable "instance_types" {
   type = list(string)
   description = <<-EOT