Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add config option to enable the encryption of AWS EKS secrets #2788

Merged
merged 20 commits into from
Nov 5, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
e76058b
Add config option (amazon_web_services.eks_kms_arn) to specify KMS-ke…
joneszc Oct 22, 2024
319f947
Merge branch 'nebari-dev:main' into eks-cluster-encryption
joneszc Oct 23, 2024
dcfd836
Merge branch 'nebari-dev:main' into eks-cluster-encryption
joneszc Oct 23, 2024
60d2a96
Merge branch 'nebari-dev:main' into eks-cluster-encryption
joneszc Oct 24, 2024
36518be
add aws kms_key_arns mock values to the pytest fixture
joneszc Oct 24, 2024
567a05d
add comments linking boto3 docs for kms_key_arns function
joneszc Oct 24, 2024
8374ada
Merge branch 'nebari-dev:main' into eks-cluster-encryption
joneszc Oct 25, 2024
a4d89cb
make kms_key_arns function more readable
joneszc Oct 25, 2024
c3cc413
refactor validation conditionals of aws kms key for eks and add check…
joneszc Oct 28, 2024
b204146
edit value error message on added check for customer-managed aws kms key
joneszc Oct 28, 2024
a2ff67c
Merge branch 'nebari-dev:main' into eks-cluster-encryption
joneszc Oct 28, 2024
3c309d5
Merge branch 'nebari-dev:main' into eks-cluster-encryption
joneszc Oct 30, 2024
b70adaa
Merge branch 'nebari-dev:main' into eks-cluster-encryption
joneszc Oct 31, 2024
8569dbe
replace dict with dataclass schema in aws kms_key_arns function
joneszc Oct 31, 2024
422f59d
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] Oct 31, 2024
6f69cf9
Merge branch 'nebari-dev:main' into eks-cluster-encryption
joneszc Oct 31, 2024
243e764
fix aws kms_key_arns validation
joneszc Oct 31, 2024
095c6e0
Merge branch 'eks-cluster-encryption' of https://github.com/joneszc/n…
joneszc Oct 31, 2024
f297a99
Merge branch 'nebari-dev:main' into eks-cluster-encryption
joneszc Nov 4, 2024
5f39265
Merge branch 'nebari-dev:main' into eks-cluster-encryption
joneszc Nov 5, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 30 additions & 0 deletions src/_nebari/provider/cloud/amazon_web_services.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
import os
import re
import time
from dataclasses import dataclass
from typing import Dict, List, Optional

import boto3
Expand Down Expand Up @@ -121,6 +122,35 @@ def instances(region: str) -> Dict[str, str]:
return {t: t for t in instance_types}


@dataclass
class Kms_Key_Info:
Arn: str
KeyUsage: str
KeySpec: str
KeyManager: str


@functools.lru_cache()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is stylistic, so it is up to you whether to adopt this. I would probably add a Kms_Key_Info class or something.

@dataclass
class Kms_Key_Info:
    Arn: str
    KeyUsage: str
    KeySpec: str
    KeyManager: str

The advantage here is more clarity in your type annotations. Rather than returning Dict[str, dict] you can return Dict[str, Kms_Key_Info] which is clearer what is actually returned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dcmcand
Updated, per your suggestion

def kms_key_arns(region: str) -> Dict[str, Kms_Key_Info]:
"""Return dict of available/enabled KMS key IDs and associated KeyMetadata for the AWS region."""
session = aws_session(region=region)
client = session.client("kms")
kms_keys = {}
# https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/kms/client/list_keys.html
for key in client.list_keys().get("Keys"):
key_id = key["KeyId"]
# https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/kms/client/describe_key.html#:~:text=Response%20Structure
key_data = client.describe_key(KeyId=key_id).get("KeyMetadata")
if key_data.get("Enabled"):
kms_keys[key_id] = Kms_Key_Info(
Arn=key_data.get("Arn"),
KeyUsage=key_data.get("KeyUsage"),
KeySpec=key_data.get("KeySpec"),
KeyManager=key_data.get("KeyManager"),
)
return kms_keys


def aws_get_vpc_id(name: str, namespace: str, region: str) -> Optional[str]:
"""Return VPC ID for the EKS cluster namedd `{name}-{namespace}`."""
cluster_name = f"{name}-{namespace}"
Expand Down
39 changes: 39 additions & 0 deletions src/_nebari/stages/infrastructure/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -174,6 +174,7 @@ class AWSInputVars(schema.Base):
eks_endpoint_access: Optional[
Literal["private", "public", "public_and_private"]
] = "public"
eks_kms_arn: Optional[str] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this field should be immutable to prevent clusters from being deleted.

See src/_nebari/stages/kubernetes_services/init.py:66 for an example of an immutable field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So @joneszc am I reading that correctly that enabling this option will destroy and replace your cluster? We should probably go ahead and make this field immutable then. We definitely don't want anyone accidentally destroying their deploy. The docs should reflect that this should only be used on fresh deploys too.

Hello @dcmcand
Enabling the eks_kms_arn option, including doing so as a re-deploy on an existing cluster, is not a problem. The issue arises when one attempts to disable the cluster encryption on an existing cluster, which causes a failed attempt to destroy the cluster. I've added a warning for this behavior in the docs update. Making the field immutable will block users from being able to set cluster encryption on an existing cluster, which is functional. Is there a mechanism for allowing the field to be set, if not set already, but immutable if altered from a set state? The risk of destroying the cluster is ineluctably countered by the error:
Error: deleting EKS Cluster (nebari-test-dev): operation error EKS: DeleteCluster, https response error StatusCode: 409, RequestID: de8f18ba-0abe-42ae-961f-86d8865fbcf3, ResourceInUseException: Cluster has nodegroups attached

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so:
Fresh install with this enabled = no problem
Fresh install with this disabled = no problem
Enabling on existing install where it wasn't previously enabled = no problem
Disabling on existing install where it was previously enabled = tries to destroy cluster, but fails

What if someone tries to disable encryption, it fails, and then the reenable it in the config and try to deploy again? Does that succeed without error? If so then I think we are good to go.

It would be nice to have one way gates as an option on immutable fields for situations like this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joneszc can you test the above scenario and update with the latest main?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dcmcand
Yes, that is the correct logic flow. I'm testing now. I need to test how the components behave as well after enabling encryption and then actually encrypting the secrets, as the docs mention this important step, which might have additional repercussions on Nebari components besides the Cluster functionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dcmcand
Following a failed attempt to disable encryption, by setting a previously configured amazon_web_services.eks_kms_arn value to null or removing the field altogether, a follow-up re-deploy to reenable the original encryption setting does succeed without any issues. Also, an attempt to switch KMS keys by replacing a previously set KMS key arn with a new arn will succeed in a re-deploy, but the KMS key is not actually changed on the EKS Cluster. Essentially, no change is made and the cluster functions as before. I'll make a note of this in the docs PR.

Copy link
Contributor

@viniciusdc viniciusdc Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good to hear, @joneszc. If I understood correctly if by any chance the user removes the encryption key from the config and redeploys, the user would first face an error, but if -- what I assume in a few minutes -- the user re-attempts the deployment, it will succeed?

If so, please add this as an admonition to the docs under a warning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @viniciusdc

That is correct.
I've added the admonitions to the docs PR#537, per your recommendation.

node_groups: List[AWSNodeGroupInputVars]
availability_zones: List[str]
vpc_cidr_block: str
Expand Down Expand Up @@ -490,6 +491,7 @@ class AmazonWebServicesProvider(schema.Base):
eks_endpoint_access: Optional[
Literal["private", "public", "public_and_private"]
] = "public"
eks_kms_arn: Optional[str] = None
existing_subnet_ids: Optional[List[str]] = None
existing_security_group_id: Optional[str] = None
vpc_cidr_block: str = "10.10.0.0/16"
Expand Down Expand Up @@ -546,6 +548,42 @@ def _check_input(cls, data: Any) -> Any:
f"Amazon Web Services instance {node_group.instance} not one of available instance types={available_instances}"
)

# check if kms key is valid
available_kms_keys = amazon_web_services.kms_key_arns(data["region"])
if "eks_kms_arn" in data and data["eks_kms_arn"] is not None:
key_id = [
id for id in available_kms_keys.keys() if id in data["eks_kms_arn"]
]
# Raise error if key_id is not found in available_kms_keys
if (
len(key_id) != 1
or available_kms_keys[key_id[0]].Arn != data["eks_kms_arn"]
):
raise ValueError(
f"Amazon Web Services KMS Key with ARN {data['eks_kms_arn']} not one of available/enabled keys={[v.Arn for v in available_kms_keys.values() if v.KeyManager=='CUSTOMER' and v.KeySpec=='SYMMETRIC_DEFAULT']}"
)
key_id = key_id[0]
# Raise error if key is not a customer managed key
if available_kms_keys[key_id].KeyManager != "CUSTOMER":
raise ValueError(
f"Amazon Web Services KMS Key with ID {key_id} is not a customer managed key"
)
# Symmetric KMS keys with Encrypt and decrypt key-usage have the SYMMETRIC_DEFAULT key-spec
# EKS cluster encryption requires a Symmetric key that is set to encrypt and decrypt data
if available_kms_keys[key_id].KeySpec != "SYMMETRIC_DEFAULT":
if available_kms_keys[key_id].KeyUsage == "GENERATE_VERIFY_MAC":
raise ValueError(
f"Amazon Web Services KMS Key with ID {key_id} does not have KeyUsage set to 'Encrypt and decrypt' data"
)
elif available_kms_keys[key_id].KeyUsage != "ENCRYPT_DECRYPT":
raise ValueError(
f"Amazon Web Services KMS Key with ID {key_id} is not of type Symmetric, and KeyUsage not set to 'Encrypt and decrypt' data"
)
else:
raise ValueError(
f"Amazon Web Services KMS Key with ID {key_id} is not of type Symmetric"
)

return data


Expand Down Expand Up @@ -835,6 +873,7 @@ def input_vars(self, stage_outputs: Dict[str, Dict[str, Any]]):
name=self.config.escaped_project_name,
environment=self.config.namespace,
eks_endpoint_access=self.config.amazon_web_services.eks_endpoint_access,
eks_kms_arn=self.config.amazon_web_services.eks_kms_arn,
existing_subnet_ids=self.config.amazon_web_services.existing_subnet_ids,
existing_security_group_id=self.config.amazon_web_services.existing_security_group_id,
region=self.config.amazon_web_services.region,
Expand Down
1 change: 1 addition & 0 deletions src/_nebari/stages/infrastructure/template/aws/main.tf
Original file line number Diff line number Diff line change
Expand Up @@ -99,6 +99,7 @@ module "kubernetes" {

endpoint_public_access = var.eks_endpoint_access == "private" ? false : true
endpoint_private_access = var.eks_endpoint_access == "public" ? false : true
eks_kms_arn = var.eks_kms_arn
public_access_cidrs = var.eks_public_access_cidrs
permissions_boundary = var.permissions_boundary
}
Original file line number Diff line number Diff line change
Expand Up @@ -14,8 +14,20 @@ resource "aws_eks_cluster" "main" {
public_access_cidrs = var.public_access_cidrs
}

# Only set encryption_config if eks_kms_arn is not null
dynamic "encryption_config" {
for_each = var.eks_kms_arn != null ? [1] : []
content {
provider {
key_arn = var.eks_kms_arn
}
resources = ["secrets"]
}
}

depends_on = [
aws_iam_role_policy_attachment.cluster-policy,
aws_iam_role_policy_attachment.cluster_encryption,
]

tags = merge({ Name = var.name }, var.tags)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,33 @@ resource "aws_iam_role_policy_attachment" "cluster-policy" {
role = aws_iam_role.cluster.name
}

data "aws_iam_policy_document" "cluster_encryption" {
count = var.eks_kms_arn != null ? 1 : 0
statement {
actions = [
"kms:Encrypt",
"kms:Decrypt",
"kms:ListGrants",
"kms:DescribeKey"
]
resources = [var.eks_kms_arn]
}
}

resource "aws_iam_policy" "cluster_encryption" {
count = var.eks_kms_arn != null ? 1 : 0
name = "${var.name}-eks-encryption-policy"
description = "IAM policy for EKS cluster encryption"
policy = data.aws_iam_policy_document.cluster_encryption[count.index].json
}

# Grant the EKS Cluster role KMS permissions if a key-arn is specified
resource "aws_iam_role_policy_attachment" "cluster_encryption" {
count = var.eks_kms_arn != null ? 1 : 0
policy_arn = aws_iam_policy.cluster_encryption[count.index].arn
role = aws_iam_role.cluster.name
}

# =======================================================
# Kubernetes Node Group Policies
# =======================================================
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,12 @@ variable "endpoint_private_access" {
default = false
}

variable "eks_kms_arn" {
description = "kms key arn for EKS cluster encryption_config"
type = string
default = null
}

variable "public_access_cidrs" {
type = list(string)
default = ["0.0.0.0/0"]
Expand Down
6 changes: 6 additions & 0 deletions src/_nebari/stages/infrastructure/template/aws/variables.tf
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,12 @@ variable "eks_endpoint_private_access" {
default = false
}

variable "eks_kms_arn" {
description = "kms key arn for EKS cluster encryption_config"
type = string
default = null
}

variable "eks_public_access_cidrs" {
type = list(string)
default = ["0.0.0.0/0"]
Expand Down
12 changes: 12 additions & 0 deletions tests/tests_unit/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,18 @@ def _mock_return_value(return_value):
"m5.xlarge": "m5.xlarge",
"m5.2xlarge": "m5.2xlarge",
},
"_nebari.provider.cloud.amazon_web_services.kms_key_arns": {
"xxxxxxxx-east-zzzz": {
"Arn": "arn:aws:kms:us-east-1:100000:key/xxxxxxxx-east-zzzz",
"KeyUsage": "ENCRYPT_DECRYPT",
"KeySpec": "SYMMETRIC_DEFAULT",
},
"xxxxxxxx-west-zzzz": {
"Arn": "arn:aws:kms:us-west-2:100000:key/xxxxxxxx-west-zzzz",
"KeyUsage": "ENCRYPT_DECRYPT",
"KeySpec": "SYMMETRIC_DEFAULT",
},
},
# Azure
"_nebari.provider.cloud.azure_cloud.kubernetes_versions": [
"1.18",
Expand Down
Loading