Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gruntwork test ACM certificate getting removed in us-west-1 and eu-west-2 #579

Closed
arsci opened this issue Sep 5, 2023 · 25 comments · Fixed by #580, #592, #596, #607 or #609
Closed

Gruntwork test ACM certificate getting removed in us-west-1 and eu-west-2 #579

arsci opened this issue Sep 5, 2023 · 25 comments · Fixed by #580, #592, #596, #607 or #609

Comments

@arsci
Copy link
Contributor

arsci commented Sep 5, 2023

The wildcard ACM certificate used for testing in phxdevops is getting removed by the scheduled cloud-nuke job, but only in us-west-1 and eu-west-2 regions. We need this key to be present in these two regions for tests.

I haven't had a chance to dig in to see if we are using a config exclusion, and it isn't working, or if there is some other reason for the removal. It appears to be only these two regions, the key does not get deleted in other regions.

@gcagle3
Copy link

gcagle3 commented Sep 5, 2023

Temporarily disabled tests for these regions in terraform-aws-load-balancer (via this commit) to minimize test failures. Leaving a note here as a reminder to re-enable these regions once this issue is resolved.

@james03160927
Copy link
Contributor

Looked at the existing configuration and it doesn't seem to have any filtering for ACM resource type - https://github.com/gruntwork-io/cloud-nuke/blob/master/.circleci/nuke_config.yml.

@james03160927
Copy link
Contributor

Also, we don't pass in ACM as the exclude resource type - https://github.com/gruntwork-io/cloud-nuke/blob/master/.circleci/config.yml#L45.

@james03160927
Copy link
Contributor

It seems like the ACM for gruntwork testing seems to have domain name *.gruntwork.in so we can create a filter for that

image

@james03160927
Copy link
Contributor

It seems like we have an additional filter to exclude resources that are currently being used. However, I guess the certificates in those regions were not being used for some region?

@james03160927
Copy link
Contributor

Hey @arsci, I created certificates in both us-west-1 and eu-west-2 regions. I'm not sure if creating certificates solves the test failures. I'll leave this ticket open until we verify whether tests pass again.

@james03160927 james03160927 reopened this Sep 7, 2023
@arsci
Copy link
Contributor Author

arsci commented Sep 7, 2023

Thanks @hongil0316! Yes, just creating those certs should solve the tests errors.

@gcagle3 you can re-enable tests for your branch in those two regions

@gcagle3
Copy link

gcagle3 commented Sep 14, 2023

@gcagle3 you can re-enable tests for your branch in those two regions

These have been re-enabled. Thanks!

@james03160927
Copy link
Contributor

Nice. I'll close this issue then!

@gcagle3
Copy link

gcagle3 commented Sep 15, 2023

It seems like the ACM for gruntwork testing seems to have domain name *.gruntwork.in so we can create a filter for that

image

@hongil0316 would it be possible to add a filter for the domain name 'gruntwork.in' as well? It looks like we'll need ACM certs for both '*.gruntwork.in' and 'gruntwork.in' for all of the load balancer tests to complete.

Example:
image

We'll need this for the following regions (if the filter is region specific, if not no worries):

  • us-east-1
  • us-east-2
  • us-west-1
  • us-west-2
  • eu-west-1
  • eu-west-2

@gcagle3 gcagle3 reopened this Sep 15, 2023
@james03160927
Copy link
Contributor

I believe this change should cover both scenarios:

https://github.com/gruntwork-io/cloud-nuke/pull/580/files

@gcagle3
Copy link

gcagle3 commented Sep 15, 2023

Ah, that should work, splendid. Thank you, I'll re-close this one!

@gcagle3 gcagle3 closed this as completed Sep 15, 2023
@gcagle3
Copy link

gcagle3 commented Sep 22, 2023

@hongil0316 hm, some bad news. It looks like the certificates for just 'gruntwork.in' (not *.gruntwork.in) are still being deleted every night.

As a thought, should the 'include' in this code actually be exclude? If I'm reading this right, it looks like 'include' is used to identify resources to be nuked and 'exclude' is being used to protect resources from deletion. In this case, we want to protect both *.gruntwork.in and gruntwork.in certs.

ACM:
  include:
    names_regex:
      - "^gruntwork.in"

@gcagle3
Copy link

gcagle3 commented Oct 4, 2023

@hongil0316 would there be anything else scheduled to clear these certs out? Monitoring things from yesterday, it looks like all of the 'gruntwork.in' certs were wiped out again last night.

Noting the following changes in the Phoenix account:

  • In us-east-1, us-east-2, us-west-2, and eu-west-1, the 'gruntwork.in' cert was deleted
  • In us-west-1 and eu-west-2, both the 'gruntwork.in' and '*.gruntwork.in' certs were deleted

@gcagle3 gcagle3 reopened this Oct 4, 2023
@james03160927
Copy link
Contributor

Hey @gcagle3 @arsci, I just tested the cloud-nuke behaviour with the circleCi nuke_config.yml and it seems to work as expected. This is the command line I used:

aws-vault exec phxdevops -- go run main.go aws --resource-type acm --log-level debug --region us-east-1 --config .circleci/nuke_config.yml

And here is the debugging lines:
image

image

@james03160927
Copy link
Contributor

Looking at the CloudTrail activies, it seems to be deleted by the circle-ci-test user name

image

@james03160927
Copy link
Contributor

From this execution, I verified that it seems to be indeed deleted by circle ci - https://app.circleci.com/pipelines/github/gruntwork-io/cloud-nuke/12355/workflows/ac53f08b-cbbe-41b8-8f58-3eaadbb781d9/jobs/34125.

image

@james03160927
Copy link
Contributor

Hey @arsci @gcagle3, can you tell me how you are creating the ACM? Maybe the way I create the ACM is not the same as those certificates being created for unit tests?

This is how I created:

image

@gcagle3
Copy link

gcagle3 commented Oct 5, 2023

Hey @arsci @gcagle3, can you tell me how you are creating the ACM? Maybe the way I create the ACM is not the same as those certificates being created for unit tests?

This is how I created:

image

This is exactly the same process I've been using for both gruntwork.in and *.gruntwork.in.

@james03160927
Copy link
Contributor

Hmm the debug message doesn't seem to help too much... I wonder if the config file values are being reflected properly.
The odd thing is that we are having different behaviour when running in circleCi vs. locally...

@gcagle3
Copy link

gcagle3 commented Oct 11, 2023

The odd thing is that we are having different behaviour when running in circleCi vs. locally...

Would CircleCi be setting specific environment variables that would change the behavior?

@james03160927
Copy link
Contributor

Looking at the debug logs, it seems like the config is not properly parsed for whatever reason:

  DEBUG   shouldInclude result for ACM: arn:aws:acm:us-west-1:087285199408:certificate/1fc80fe9-5bcf-43d9-86c1-e5a6d3d6ffd9 w/ domain name: gruntwork.in, time: 2023-10-11 19:46:05.48 +0000 UTC, and config: {IncludeRule:{NamesRegExp:[] TimeAfter:<nil> TimeBefore:<nil> Tag:<nil>} ExcludeRule:{NamesRegExp:[] TimeAfter:2023-10-11 19:52:45.931855658 +0000 UTC m=-7198.437556645 TimeBefore:<nil> Tag:<nil>}}

You will notice that NamesRegExp for both IncludeRule and ExcludeRule are empty.

@gcagle3
Copy link

gcagle3 commented Oct 13, 2023

After internal discussion, we're wondering if the default tests (which run with no config) might be causing this. Looking at the test, it will check if the cert is in use and try to delete if it is not in use.

Comparing the above against the following we've observed:

  • In us-east-1, us-east-2, us-west-2, and eu-west-1, the 'gruntwork.in' cert was deleted
  • In us-west-1 and eu-west-2, both the 'gruntwork.in' and '*.gruntwork.in' certs were deleted

It is important to note that the '*.gruntwork.in' cert is 'in use' in the us-east-1, us-east-2, us-west-2 and eu-west-1 regions where it is not being deleted. It is not in use in us-west-1 / eu-west-2 where it is being deleted, which would make sense if the tests are actually deleting the certs. All certs being deleted are not in use.

@hongil0316 do you think it might be worth adjusting the test case to address this?

@james03160927
Copy link
Contributor

Had further discussion internally here: https://gruntwork-io.slack.com/archives/C6V3DJAHJ/p1697159092430149
We initially mitigated the issue by excluding ACM resource type - https://github.com/gruntwork-io/cloud-nuke/pulls?q=is%3Apr+is%3Aclosed.

After further troubleshooting, I realized that the bug existed when we had both config file + time filter present at the same time. Here is a fix for the bug - #607

@gcagle3
Copy link

gcagle3 commented Oct 23, 2023

As a quick follow-up, I can confirm that both the *.gruntwork.in and gruntwork.in certificate have not been deleted (and are still present) since this PR was merged. Great work on this @hongil0316!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment