Introduce CI for AWS - part 1 #2274

wainersm · 2025-02-03T22:07:31Z

This is just part 1 of a series of commits to run the e2e tests nightly for AWS too.

As it stands, the tests are executed but some fail. In particular, TestAwsCreatePeerPodWithLargeImage fails so badly that the job gets cancelled. The good news is that the simple pod tests at least pass. Here is a run on my fork: https://github.com/wainersm/cc-cloud-api-adaptor/actions/runs/13122445400

Below is a list of things I still have to work on to make it acceptable for running alongside our CI (at this point, the job will be skipped because I won't configure the AWS credentials on this repo yet). Nevertheless, I'd like to have this part merged because there are other AWS-unrelated changes I plan to submit to the workflows and I want to avoid repeatedly rebasing and resolving conflicts in my fork.

What's next:

- add backup code to delete the created resources on AWS, because on some occasions the deletion code of the e2e framework doesn't run. For example, the TestAwsCreatePeerPodWithLargeImage failure mentioned above causes that problem.
- deal with the failing tests: either disable or fix them.
- run with CRI-O
- make it more resilient. For example, sometimes the VPC is created in an Availability Zone where the default podvm instance type isn't available, so all tests fail.
- [updated] add a debug step
- [updated] make it work with mkosi podvm images
- [updated] move common code to scripts
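
The resilience item above could be approached by checking, before the VPC subnet is created, which Availability Zones actually offer the podvm instance type. What follows is a minimal Python sketch of that selection logic only, assuming the offerings have already been fetched (for example through EC2's DescribeInstanceTypeOfferings). The function name, the sample zones and the instance types are hypothetical and are not taken from the e2e framework, which is written in Go.

```python
from typing import Iterable, Mapping, Optional


def pick_zone(offerings: Mapping[str, Iterable[str]],
              instance_type: str,
              preferred: Optional[str] = None) -> Optional[str]:
    """Return an Availability Zone that offers instance_type, or None.

    offerings maps a zone name to the instance types available there.
    The preferred zone wins when it qualifies; otherwise the first
    qualifying zone in sorted order is used so the result is stable.
    """
    candidates = sorted(zone for zone, types in offerings.items()
                        if instance_type in set(types))
    if not candidates:
        return None
    if preferred in candidates:
        return preferred
    return candidates[0]


# Hypothetical sample data, for illustration only.
SAMPLE = {
    "us-east-1a": ["t3.small", "m6a.large"],
    "us-east-1b": ["t3.small"],
    "us-east-1e": ["t2.micro"],
}
```

Returning None instead of raising lets the caller decide whether to fail the job early with a clear message or to fall back to another instance type.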

stevenhorsman

Not necessarily for this PR, but I'm wondering if we can separate some of the steps into scripts given that lots of this is duplicated with the other providers. I know we've discussed it before, but I can't recall if we deliberately rejected it, or not

stevenhorsman · 2025-02-04T15:18:45Z

.github/workflows/e2e_run_all.yaml

+    name: aws
+    if: |
+      github.event_name == 'workflow_dispatch'
+    needs: [podvm, image, prep_install]


Does AWS only work with the packer build, or also mkosi?

wainersm · 2025-02-04

Oh, I forgot to mention it in the description: I couldn't make it work with mkosi, and debugging that is on my list. The workflow supports both packer and mkosi images, though.

bpradipt · 2025-02-11

AWS works perfectly fine with mkosi. The default images made available (and referenced in the instructions on confidentialcontainers.org) starting with 0.11.0 are mkosi-based.
Once you have created the mkosi raw image, use the raw-to-ami.sh script to upload it and create the AMI, or use uplosi.

wainersm · 2025-02-11

Hmm, I chained these changes with the existing workflow that builds the mkosi-based image, but all AWS e2e tests failed. Maybe the problem was (or is) somewhere else. I will revisit that topic soon.

stevenhorsman · 2025-02-04T15:22:44Z

.github/workflows/e2e_aws.yaml

+          export TEST_PODVM_IMAGE="${{ env.PODVM_QCOW2 }}"
+          export TEST_E2E_TIMEOUT="90m"
+
+          make test-e2e


Can we get any debug logs in case things go wrong?

wainersm · 2025-02-04

Good idea, I overlooked it completely. Hopefully in a follow-up PR.

wainersm · 2025-02-04T17:27:26Z

Hi @stevenhorsman !

> Not necessarily for this PR, but I'm wondering if we can separate some of the steps into scripts given that lots of this is duplicated with the other providers. I know we've discussed it before, but I can't recall if we deliberately rejected it, or not

IMO, we can and must separate these steps into scripts, to share common code and avoid the nasty pull_request_target limitation. I can give it a try in a follow-up PR.

Created a callable workflow for running the AWS e2e tests. This initial implementation supports testing mkosi- or packer-based images, the latter being the default. The cluster_type input supports only the "onprem" cluster, for which the workflow creates a kcli-based kubeadm cluster. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>

The newly created e2e_aws workflow is called by e2e_run_all, so the AWS e2e tests will run nightly. At this point they won't be triggered by pull requests. The workflow tests the packer-based podvm images. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>

Tag all the created AWS resources with "Name" to help track and remove them, mainly when running on CI. In order to tag images I had to bump github.com/aws/aws-sdk-go-v2/service/ec2, which cascaded into updating other modules. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
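
For readers unfamiliar with how EC2 resources are tagged at creation time, the request shape is a list of tag specifications, each pairing a resource type with its tags. This is a minimal Python sketch of building such a value with a single "Name" tag. The helper name and the example tag value are hypothetical; the actual change in this PR is implemented in Go with aws-sdk-go-v2 and may be structured differently.

```python
from typing import Dict, List


def name_tag_spec(resource_type: str, name: str) -> List[Dict]:
    """Build a TagSpecifications value carrying a single Name tag.

    resource_type is an EC2 resource type such as "vpc", "subnet",
    "instance" or "image".
    """
    if not name:
        raise ValueError("name must not be empty")
    return [{
        "ResourceType": resource_type,
        "Tags": [{"Key": "Name", "Value": name}],
    }]
```

With boto3, the returned list would be passed as the TagSpecifications argument of a create call, for example when creating a VPC.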

wainersm · 2025-02-04T19:21:27Z

Updated with fixes to workflow lint errors.

ldoktor

Hello Wainer, overall it looks quite good. I'd suggest tagging the resources with the workflow-id to allow an extra cleanup job that iterates over the resources and deletes the ones whose workflow has finished. To further simplify that, if possible, I'd like to add a ci=true tag to every resource that is generated. I guess that is currently not supported by peer-pods, right? I think it would be a useful feature even there (for peer-pods VMs, so people can add custom tags).

One thing I noticed is that the job seems to attempt to remove even VPCs it had not created. It would be nice to double-check and only clean up when we created them... (not 100% sure about that)

Looking forward to part 2 ;-)
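
The cleanup job suggested above boils down to a filter over tagged resources. Here is a minimal Python sketch of that filter, assuming resources carry EC2-style tag lists and the two tags proposed in the comment (ci=true and workflow-id). The function names and tag keys are hypothetical, and how the set of finished workflow runs is obtained is left out.

```python
from typing import Dict, Iterable, List, Set


def tags_of(resource: Dict) -> Dict[str, str]:
    """Flatten the EC2-style [{'Key': k, 'Value': v}] tag list into a dict."""
    return {t["Key"]: t["Value"] for t in resource.get("Tags", [])}


def stale_resources(resources: Iterable[Dict],
                    finished_runs: Set[str]) -> List[Dict]:
    """Select resources that CI created and whose workflow run has finished.

    Assumes two hypothetical tags: ci=true and workflow-id=<run id>.
    Resources without the ci tag are never selected, so nothing the CI
    did not create is touched.
    """
    selected = []
    for res in resources:
        tags = tags_of(res)
        if tags.get("ci") != "true":
            continue
        if tags.get("workflow-id") in finished_runs:
            selected.append(res)
    return selected
```

Requiring both tags also addresses the concern about VPCs the job did not create: an untagged or differently tagged VPC simply never matches.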

wainersm · 2025-02-11T18:12:56Z

Hi @ldoktor !

> Hello Wainer, overall it looks quite good. I'd suggest tagging the resources with the workflow-id to allow an extra cleanup job that iterates over the resources and deletes the ones whose workflow has finished. To further simplify that, if possible, I'd like to add a ci=true tag to every resource that is generated. I guess that is currently not supported by peer-pods, right? I think it would be a useful feature even there (for peer-pods VMs, so people can add custom tags).

In part 2, which I should send soon, I added an extra step that always runs to do such a cleanup in case the e2e run finishes abruptly. I just need to ensure it deletes only the resources created in that given workflow execution.

I have never tried it myself, but it should be possible to configure peer-pods to tag the podvms: https://github.com/confidential-containers/cloud-api-adaptor/blob/main/src/cloud-api-adaptor/install/overlays/aws/kustomization.yaml#L35

> One thing I noticed is that the job seems to attempt to remove even VPCs it had not created. It would be nice to double-check and only clean up when we created them... (not 100% sure about that)

It should delete only the VPCs (and related resources) created by the e2e test run.

> Looking forward to part 2 ;-)

\o/

wainersm added the CI (Issues related to CI workflows) and provider/aws (Issues related to AWS CAA provider) labels Feb 3, 2025

wainersm requested a review from a team as a code owner February 3, 2025 22:07

stevenhorsman approved these changes Feb 4, 2025

wainersm added 6 commits February 4, 2025 14:47

test/provision: add timestamp to created AWS AMIs

b564731

To generate unique names and avoid clashing with already published images. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
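
The idea of the commit above can be illustrated with a short Python sketch that appends a UTC timestamp to a base image name. The function name, the base name and the timestamp layout are assumptions made for illustration; the commit itself is Go code and its exact format is not shown in this conversation.

```python
from datetime import datetime, timezone
from typing import Optional


def timestamped_name(base: str, now: Optional[datetime] = None) -> str:
    """Return base with a UTC timestamp suffix, e.g. podvm-20250204144700.

    now must be timezone-aware when given; it defaults to the current time.
    Only digits and a dash are added, which AMI names allow.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{base}-{stamp}"
```

A second-granularity suffix keeps names readable and sortable, but two builds started within the same second would still collide; adding a run identifier would be one way to close that gap.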

test: add function to deregister AWS AMI Image

57f0d77

So that on VPC teardown (if enabled) the created AMI will be deleted along with its corresponding EBS snapshot. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
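
Deregistering an AMI does not remove the EBS snapshots backing it, and a snapshot cannot be deleted while a registered AMI still uses it, so the snapshot IDs have to be recorded before deregistering and deleted afterwards. This is a minimal Python sketch of the first step, assuming the input is one entry of the Images list from an EC2 DescribeImages response. The function name is hypothetical, and the Go implementation in this commit may differ.

```python
from typing import Dict, List


def snapshot_ids(image: Dict) -> List[str]:
    """Collect EBS snapshot IDs referenced by an image description.

    image is one entry of the Images list in an EC2 DescribeImages
    response. Mappings without an Ebs section (for example ephemeral
    devices) are skipped.
    """
    ids = []
    for mapping in image.get("BlockDeviceMappings", []):
        snap = mapping.get("Ebs", {}).get("SnapshotId")
        if snap:
            ids.append(snap)
    return ids
```

The caller would then deregister the image and delete each returned snapshot, in that order.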

test: add function to delete AWS bucket key

e88f53f

Delete the key from bucket that contains the raw disk image. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>

wainersm force-pushed the ci_aws-1 branch from 2d1413a to e88f53f Compare February 4, 2025 19:20

wainersm requested a review from ldoktor February 11, 2025 13:32

ldoktor reviewed Feb 11, 2025

stevenhorsman left a comment

stevenhorsman Feb 4, 2025

wainersm Feb 4, 2025

bpradipt Feb 11, 2025

wainersm Feb 11, 2025

stevenhorsman Feb 4, 2025

wainersm Feb 4, 2025
