-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add config option to enable the encryption of AWS EKS secrets #2788
Conversation
…y ARN to encrypt EKS cluster secrets
@viniciusdc Also in follow-up to your ask, it appears that re-deploying to set KMS encryption on an existing Nebari EKS Cluster, without previous encryption set, does succeed. However, attempting thereafter to re-deploy to remove the previously set EKS secrets encryption will fail as terraform attempts to delete and rebuild the EKS cluster but cannot due to existing node groups. |
session = aws_session(region=region) | ||
client = session.client("kms") | ||
# https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/kms/client/list_keys.html | ||
paginator = client.get_paginator("list_keys") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need a paginator here? It doesn't seem like we are using it. It seems like lines 137-141 could be replaced by
kms_keys = [i["KeyId"] for i in client.list_keys().get("Keys")]
Which seems easier to read to me since it avoids nested loops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or if you want it more explicit
key_id_list = client.list_keys().get("Keys")
kms_keys = [i.get("KeyId") for i in key_id_list]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
for i in paginator.paginate() | ||
for j in i["Keys"] | ||
] | ||
return {i["KeyId"]: {k: i[k] for k in fields} for i in kms_keys if i["Enabled"]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel like this line is super readable. I would suggest extracting it into a separate function.
class Kms_key:
Arn: str
KeyUsage: str
KeySpec: str
def check_kms_keys(key_ids: list[str], client: boto3.Client) -> dict[str, Kms_key]:
keys = []
for id in key_ids:
key = client.describe_key(KeyId=id).get("KeyMetadata")
if key.get("Enabled"):
keys.append(kms_key(
Arn=key.get("Arn"),
KeyUsage=key.get("KeyUsage"),
KeySpec=key.get("KeySpec"),
)
)
return keys
Would be way easier to follow and accomplish the same thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do prefer this as well!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made the function more readable but I did not separate into two functions.
Note: the filter on the dictionary serves no other purpose other than to avoid passing more data than necessary for validating keys for EKS encryption. It's possible this function could be utilized in the future for collecting metadata on kms keys employed for services other than EKS encryption.
This function could just as easily be written:
def kms_key_arns(region: str) -> Dict[str, dict]:
"""Return dict of available/enabled KMS key IDs and associated KeyMetadata for the AWS region."""
session = aws_session(region=region)
client = session.client("kms")
kms_keys = {}
for key in client.list_keys().get("Keys"):
key_id = key["KeyId"]
key_data = client.describe_key(KeyId=key_id).get("KeyMetadata")
if key_data.get("Enabled"):
kms_keys[key_id] = key_data
return kms_keys
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this looks much cleaner and more readable. Nice work there.
# check if kms key is valid | ||
available_kms_keys = amazon_web_services.kms_key_arns(data["region"]) | ||
if "eks_kms_arn" in data and data["eks_kms_arn"] is not None: | ||
key_id = [ | ||
id for id in available_kms_keys.keys() if id in data["eks_kms_arn"] | ||
] | ||
if ( | ||
len(key_id) == 1 | ||
and available_kms_keys[key_id[0]]["Arn"] == data["eks_kms_arn"] | ||
): | ||
key_id = key_id[0] | ||
# Symmetric KMS keys with Encrypt and decrypt key-usage have the SYMMETRIC_DEFAULT key-spec | ||
# EKS cluster encryption requires a Symmetric key that is set to encrypt and decrypt data | ||
if available_kms_keys[key_id]["KeySpec"] != "SYMMETRIC_DEFAULT": | ||
if available_kms_keys[key_id]["KeyUsage"] == "GENERATE_VERIFY_MAC": | ||
raise ValueError( | ||
f"Amazon Web Services KMS Key with ID {key_id} does not have KeyUsage set to 'Encrypt and decrypt' data" | ||
) | ||
elif available_kms_keys[key_id]["KeyUsage"] != "ENCRYPT_DECRYPT": | ||
raise ValueError( | ||
f"Amazon Web Services KMS Key with ID {key_id} is not of type Symmetric, and KeyUsage not set to 'Encrypt and decrypt' data" | ||
) | ||
else: | ||
raise ValueError( | ||
f"Amazon Web Services KMS Key with ID {key_id} is not of type Symmetric" | ||
) | ||
else: | ||
raise ValueError( | ||
f"Amazon Web Services KMS Key with ARN {data['eks_kms_arn']} not one of available/enabled keys={[v['Arn'] for v in available_kms_keys.values()]}" | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could avoid a lot of nesting here by flipping the logic on these if statements. Something like:
available_kms_keys = amazon_web_services.kms_key_arns(data["region"])
# don't check if eks_kms_arn is not set
if "eks_kms_arn" not in data or data["eks_kms_arn"] is None:
return data
key_id = [
id for id in available_kms_keys.keys() if id in data["eks_kms_arn"]
]
# Raise error if key_id is not found in available_kms_keys
if (
len(key_id) != 1
or available_kms_keys[key_id[0]]["Arn"] != data["eks_kms_arn"]
):
raise ValueError(
f"Amazon Web Services KMS Key with ARN {data['eks_kms_arn']} not one of available/enabled keys={[v['Arn'] for v in available_kms_keys.values()]}"
)
key_id = key_id[0]
# EKS cluster encryption requires a Symmetric key that is set to encrypt and decrypt data
if available_kms_keys[key_id]["KeySpec"] != "SYMMETRIC_DEFAULT":
raise ValueError(
f"Amazon Web Services KMS Key with ID {key_id} is not of type Symmetric"
)
if available_kms_keys[key_id]["KeyUsage"] != "ENCRYPT_DECRYPT":
raise ValueError(
f"Amazon Web Services KMS Key with ID {key_id} KeyUsage not set to 'Encrypt and decrypt' data"
) available_kms_keys = amazon_web_services.kms_key_arns(data["region"])
# don't check if eks_kms_arn is not set
if "eks_kms_arn" not in data or data["eks_kms_arn"] is None:
return data
key_id = [
id for id in available_kms_keys.keys() if id in data["eks_kms_arn"]
]
# Raise error if key_id is not found in available_kms_keys
if (
len(key_id) != 1
or available_kms_keys[key_id[0]]["Arn"] != data["eks_kms_arn"]
):
raise ValueError(
f"Amazon Web Services KMS Key with ARN {data['eks_kms_arn']} not one of available/enabled keys={[v['Arn'] for v in available_kms_keys.values()]}"
)
key_id = key_id[0]
# EKS cluster encryption requires a Symmetric key that is set to encrypt and decrypt data
if available_kms_keys[key_id]["KeySpec"] != "SYMMETRIC_DEFAULT":
raise ValueError(
f"Amazon Web Services KMS Key with ID {key_id} is not of type Symmetric"
)
if available_kms_keys[key_id]["KeyUsage"] != "ENCRYPT_DECRYPT":
raise ValueError(
f"Amazon Web Services KMS Key with ID {key_id} KeyUsage not set to 'Encrypt and decrypt' data"
)
Has less deeply nested logic and is therefore easier to read and test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dcmcand
my hesitation to invoke return data
in numerous places in this section is that could cause issues in the future if additional validators are appended after the KMS-specific validator lines under _check_input
and validators are potentially skipped prematurely for aws services.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joneszc I can see that. It might make sense to pull every validator into its own method then, and have _check_input
just call a list of validators. That way each could exit early if they can and nothing would get skipped. I think that would also be cleaner for readability and avoid too much responsibility for a single method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dcmcand
That would make sense. I could open a separate PR for breaking out multiple methods for _check_input, if you'd like. For now, I've updated the logic, based on your recommendation, for the first check. I've also added a check to ensure the KMS key is a customer-managed versus aws-managed key. I've left the final check in place with conditionals to precisely triage the error; for example, there is a case for a SYMMETRIC_DEFAULT
key that could seem adequate but will fail if it is "GENERATE_VERIFY_MAC" rather than "ENCRYPT_DECRYPT".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you open an issue for breaking out the _check_input methods so that doesn't get lost?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @joneszc, once you open the issue, can you reference this one as well, to have a cross reference for contextualization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@viniciusdc @dcmcand
yes, here is the issue
Hi @joneszc, thanks for checking that out! I was already expecting it to fail, but I had another thing in mind: they might be connected. Can you post a sanitized output of the terraform error and any error messages you might encounter in the CloudTrail history? I suspect you will find something related to the KMS key in there. the main reason for this request is to validate if it will be beneficial to have this as an immutable field or, depending on the error, we can add manual steps to the user in our docs to disable it. |
Nebari output after failed attempt to re-deploy to remove eks cluster's envelope encryption of secrets:
|
So @joneszc am I reading that correctly that enabling this option will destroy and replace your cluster? We should probably go ahead and make this field immutable then. We definitely don't want anyone accidentally destroying their deploy. The docs should reflect that this should only be used on fresh deploys too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @joneszc , This is looking really good. I think that the config option needs to be immutable though.
@@ -121,6 +121,27 @@ def instances(region: str) -> Dict[str, str]: | |||
return {t: t for t in instance_types} | |||
|
|||
|
|||
@functools.lru_cache() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is stylistic, so it is up to you whether to adopt this. I would probably add a Kms_Key_Info
class or something.
@dataclass
class Kms_Key_Info:
Arn: str
KeyUsage: str
KeySpec: str
KeyManager: str
The advantage here is more clarity in your type annotations. Rather than returning Dict[str, dict]
you can return Dict[str, Kms_Key_Info]
which is clearer what is actually returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dcmcand
Updated, per your suggestion
@@ -174,6 +174,7 @@ class AWSInputVars(schema.Base): | |||
eks_endpoint_access: Optional[ | |||
Literal["private", "public", "public_and_private"] | |||
] = "public" | |||
eks_kms_arn: Optional[str] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this field should be immutable to prevent clusters from being deleted.
See src/_nebari/stages/kubernetes_services/init.py:66 for an example of an immutable field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So @joneszc am I reading that correctly that enabling this option will destroy and replace your cluster? We should probably go ahead and make this field immutable then. We definitely don't want anyone accidentally destroying their deploy. The docs should reflect that this should only be used on fresh deploys too.
Hello @dcmcand
Enabling the eks_kms_arn
option, including doing so as a re-deploy on an existing cluster, is not a problem. The issue arises when one attempts to disable the cluster encryption on an existing cluster, which causes a failed attempt to destroy the cluster. I've added a warning for this behavior in the docs update. Making the field immutable will block users from being able to set cluster encryption on an existing cluster, which is functional. Is there a mechanism for allowing the field to be set, if not set already, but immutable if altered from a set state? The risk of destroying the cluster is ineluctably countered by the error:
Error: deleting EKS Cluster (nebari-test-dev): operation error EKS: DeleteCluster, https response error StatusCode: 409, RequestID: de8f18ba-0abe-42ae-961f-86d8865fbcf3, ResourceInUseException: Cluster has nodegroups attached
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, so:
Fresh install with this enabled = no problem
Fresh install with this disabled = no problem
Enabling on existing install where it wasn't previously enabled = no problem
Disabling on existing install where it was previously enabled = tries to destroy cluster, but fails
What if someone tries to disable encryption, it fails, and then the reenable it in the config and try to deploy again? Does that succeed without error? If so then I think we are good to go.
It would be nice to have one way gates as an option on immutable fields for situations like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joneszc can you test the above scenario and update with the latest main?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dcmcand
Yes, that is the correct logic flow. I'm testing now. I need to test how the components behave as well after enabling encryption and then actually encrypting the secrets, as the docs mention this important step, which might have additional repercussions on Nebari components besides the Cluster functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dcmcand
Following a failed attempt to disable encryption, by setting a previously configured amazon_web_services.eks_kms_arn
value to null
or removing the field altogether, a follow-up re-deploy to reenable the original encryption setting does succeed without any issues. Also, an attempt to switch KMS keys by replacing a previously set KMS key arn with a new arn will succeed in a re-deploy, but the KMS key is not actually changed on the EKS Cluster. Essentially, no change is made and the cluster functions as before. I'll make a note of this in the docs PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's good to hear, @joneszc. If I understood correctly if by any chance the user removes the encryption key from the config and redeploys, the user would first face an error, but if -- what I assume in a few minutes -- the user re-attempts the deployment, it will succeed?
If so, please add this as an admonition to the docs under a warning
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @viniciusdc
That is correct.
I've added the admonitions to the docs PR#537, per your recommendation.
…ebari into eks-cluster-encryption
Reference Issues or PRs
Fixes #2681
Fixes #2746
Modifies PR#2723 (Failing Tests / Pytest)
Modifies PR#2752 (Failing Tests / Pytest)
What does this implement/fix?
Put a
x
in the boxes that applyTesting
How to test this PR?
Any other comments?
Allows user to set EKS encryption of secrets by specifying a KMS key ARN in nebari-config.yaml
The KMS key must meet the following conditions: