Introduce CI for AWS - part 1 #2274

Open: wainersm wants to merge 6 commits into base: main

Member

@wainersm wainersm commented Feb 3, 2025

This is just part 1 of a series of commits to run the e2e tests nightly for AWS too.

As it stands, the tests run but some fail. In particular, TestAwsCreatePeerPodWithLargeImage fails so badly that the job gets cancelled. The good news is that at least the simple pod tests pass. Here is a run on my fork: https://github.com/wainersm/cc-cloud-api-adaptor/actions/runs/13122445400

Below is a list of things I still have to work on before it is acceptable to run alongside our CI (for now the job will be skipped, because I won't configure the AWS credentials on this repo yet). Nevertheless, I'd like to have this part merged because I plan to submit other, AWS-unrelated changes to the workflows and I want to avoid repeatedly rebasing and resolving conflicts in my fork.

What's next:

  • add backup code to delete the resources created on AWS, because on some occasions the deletion code of the e2e framework doesn't run; for example, the TestAwsCreatePeerPodWithLargeImage failure mentioned above causes that problem.
  • deal with the failing tests: either disable or fix them.
  • run with CRI-O
  • make it more resilient. For example, sometimes the VPC is created in an Availability Zone where the default podvm instance type isn't available, so all tests fail.
  • [updated] add a debug step
  • [updated] make it work with mkosi podvm images
  • [updated] move common code to scripts
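
The resilience item in the list above could be approached by choosing an Availability Zone only from those that actually offer the podvm instance type. Below is a minimal sketch in Go, assuming the per-zone offerings were already fetched (for example with the EC2 DescribeInstanceTypeOfferings API). The `pickZone` helper and the sample data are hypothetical and not part of the e2e framework.

```go
package main

import (
	"fmt"
	"sort"
)

// pickZone returns the first zone, in sorted order, that offers instanceType.
// offerings maps an Availability Zone name to the instance types it offers.
// Hypothetical helper: the real provisioner may select zones differently.
func pickZone(offerings map[string][]string, instanceType string) (string, bool) {
	zones := make([]string, 0, len(offerings))
	for zone := range offerings {
		zones = append(zones, zone)
	}
	// Sort so the choice is deterministic across runs.
	sort.Strings(zones)
	for _, zone := range zones {
		for _, offered := range offerings[zone] {
			if offered == instanceType {
				return zone, true
			}
		}
	}
	return "", false
}

func main() {
	// Sample data for illustration only.
	offerings := map[string][]string{
		"us-east-1a": {"t3.small"},
		"us-east-1b": {"t3.small", "m6a.large"},
	}
	zone, ok := pickZone(offerings, "m6a.large")
	fmt.Println(zone, ok)
}
```

Failing early when no zone qualifies would give a clearer error than letting every test fail at instance creation.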

@wainersm wainersm added CI Issues related to CI workflows provider/aws Issues related to AWS CAA provider labels Feb 3, 2025
@wainersm wainersm requested a review from a team as a code owner February 3, 2025 22:07
Member

@stevenhorsman stevenhorsman left a comment

Not necessarily for this PR, but I'm wondering if we can separate some of the steps into scripts given that lots of this is duplicated with the other providers. I know we've discussed it before, but I can't recall if we deliberately rejected it, or not

name: aws
if: |
  github.event_name == 'workflow_dispatch'
needs: [podvm, image, prep_install]
Member

Does AWS only work with the packer build, or also mkosi?

Member Author

Oh, I forgot to mention it in the description... I couldn't make it work with mkosi, and it's on my list to debug. The workflow supports both packer and mkosi images, though.

Member

AWS works perfectly fine with mkosi. The default images made available (and used in the instructions on confidentialcontainers.org) starting with 0.11.0 are mkosi based.
Once you have created the mkosi raw image, use the raw-to-ami.sh script to upload it and create the AMI, or you can use uplosi.

Member Author

Hmm, I chained these changes with the existing workflow that builds the mkosi-based image, but all the AWS e2e tests failed. Maybe the problem was (or is) somewhere else. I will revisit this soon.

export TEST_PODVM_IMAGE="${{ env.PODVM_QCOW2 }}"
export TEST_E2E_TIMEOUT="90m"

make test-e2e
Member

Can we get any debug logs in case things go wrong?

Member Author

Good idea, I overlooked it completely. Hopefully in a follow-up PR.

@wainersm
Member Author

wainersm commented Feb 4, 2025

Hi @stevenhorsman !

> Not necessarily for this PR, but I'm wondering if we can separate some of the steps into scripts given that lots of this is duplicated with the other providers. I know we've discussed it before, but I can't recall if we deliberately rejected it, or not

IMO, we can and must separate them into scripts, both to share common code and to avoid the nasty pull_request_target limitation. I can give it a try in a follow-up PR.

Created a callable workflow for running the AWS e2e tests. This initial
implementation supports testing mkosi or packer based images, the latter
being the default.

The cluster_type only supports "onprem", and the workflow
will create a kcli-based kubeadm cluster.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The newly created e2e_aws is called by e2e_run_all, so the AWS e2e tests
will run nightly. At this point it won't be triggered by pull requests.

It's testing the packer based podvm images.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Tag all created AWS resources with "Name" to help track and remove
them, mainly when running on CI.

In order to tag images I had to bump github.com/aws/aws-sdk-go-v2/service/ec2,
which cascaded into updating other modules.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
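
To illustrate the tagging described in the commit above, here is a minimal sketch in Go of building a tag set that always carries a "Name" tag. The local `Tag` struct is a stand-in that only mirrors the key/value shape of an EC2 tag, and `nameTags` is a hypothetical helper; neither is taken from the PR's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// Tag is a local stand-in for an EC2 key/value tag (assumption: the real
// code uses the SDK's own tag type).
type Tag struct {
	Key   string
	Value string
}

// nameTags returns a "Name" tag followed by the extra tags in sorted key
// order. An extra "Name" entry is ignored so it cannot override the name.
func nameTags(name string, extra map[string]string) []Tag {
	tags := []Tag{{Key: "Name", Value: name}}
	keys := make([]string, 0, len(extra))
	for key := range extra {
		if key == "Name" {
			continue
		}
		keys = append(keys, key)
	}
	sort.Strings(keys)
	for _, key := range keys {
		tags = append(tags, Tag{Key: key, Value: extra[key]})
	}
	return tags
}

func main() {
	// Resource name and extra tag are illustrative values only.
	for _, tag := range nameTags("caa-e2e-vpc", map[string]string{"ci": "true"}) {
		fmt.Printf("%s=%s\n", tag.Key, tag.Value)
	}
}
```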
Generate unique names to avoid clashes between published images.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
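
One way to obtain unique image names is to append a short random suffix to a fixed prefix. The sketch below assumes that approach; the `uniqueName` helper and the "podvm" prefix are hypothetical, and the PR may derive its names differently.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// uniqueName appends 8 random hex characters to prefix, separated by a dash.
// Hypothetical helper, shown only to illustrate the idea.
func uniqueName(prefix string) (string, error) {
	buf := make([]byte, 4)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return prefix + "-" + hex.EncodeToString(buf), nil
}

func main() {
	name, err := uniqueName("podvm")
	if err != nil {
		fmt.Println("could not generate a name:", err)
		return
	}
	fmt.Println(name)
}
```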
So that on VPC teardown (if enabled) the created AMI will be deleted along with
its corresponding EBS snapshot.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Delete the key that contains the raw disk image from the bucket.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
@wainersm
Member Author

wainersm commented Feb 4, 2025

Updated with fixes to workflow lint errors.

@wainersm wainersm requested a review from ldoktor February 11, 2025 13:32
Contributor

@ldoktor ldoktor left a comment

Hello Wainer, overall it looks quite good. I'd suggest tagging the resources with the workflow-id to allow an extra cleanup job that would iterate over the resources and delete the ones whose workflow has finished. To further simplify that, if possible, I'd like to add a ci=true tag to every resource that is generated. I guess that is currently not supported by peer-pods, right? I think it'd be a useful feature even there (for peer-pods VMs, so people can add custom tags).

One thing I noticed is that the job seems to attempt to remove even VPCs it had not created. It would be nice to double-check and only clean up the ones we created... (not 100% sure about that)

Looking forward to part 2 ;-)
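
The cleanup job suggested in this comment could filter resources by their tags before deleting anything. Below is a minimal sketch in Go, assuming each resource carries the ci=true and workflow-id tags proposed here and that the set of finished workflow runs is known. The `Resource` type and `staleResources` helper are hypothetical.

```go
package main

import "fmt"

// Resource is a hypothetical, simplified view of a tagged cloud resource.
type Resource struct {
	ID   string
	Tags map[string]string
}

// staleResources returns the IDs of resources that are marked ci=true and
// whose workflow-id belongs to a finished run. Resources without both tags
// are left alone, so anything not created by CI is never selected.
func staleResources(resources []Resource, finished map[string]bool) []string {
	var stale []string
	for _, res := range resources {
		if res.Tags["ci"] != "true" {
			continue
		}
		runID, ok := res.Tags["workflow-id"]
		if !ok {
			continue
		}
		if finished[runID] {
			stale = append(stale, res.ID)
		}
	}
	return stale
}

func main() {
	// Illustrative data: run "100" has finished, run "200" is still active.
	resources := []Resource{
		{ID: "vpc-1", Tags: map[string]string{"ci": "true", "workflow-id": "100"}},
		{ID: "vpc-2", Tags: map[string]string{"ci": "true", "workflow-id": "200"}},
		{ID: "vpc-3", Tags: map[string]string{"Name": "manually-created"}},
	}
	fmt.Println(staleResources(resources, map[string]bool{"100": true}))
}
```

Requiring both tags keeps the filter conservative, which also addresses the concern about removing VPCs the job did not create.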

@wainersm
Member Author

Hi @ldoktor !

> Hello Wainer, overall it looks quite good. I'd suggest tagging the resources with the workflow-id to allow an extra cleanup job that would iterate over the resources and delete the ones whose workflow has finished. To further simplify that, if possible, I'd like to add a ci=true tag to every resource that is generated. I guess that is currently not supported by peer-pods, right? I think it'd be a useful feature even there (for peer-pods VMs, so people can add custom tags).

In part 2, which I should send soon, I added an extra step that "always" runs to do such a cleanup in case the e2e run finishes abruptly. I just need to ensure it deletes only the resources created by that given workflow execution.

I've never tried it myself, but it should be possible to configure peer-pods to tag the podvms: https://github.com/confidential-containers/cloud-api-adaptor/blob/main/src/cloud-api-adaptor/install/overlays/aws/kustomization.yaml#L35

> One thing I noticed is that the job seems to attempt to remove even VPCs it had not created. It would be nice to double-check and only clean up the ones we created... (not 100% sure about that)

It should delete only the VPCs (and related resources) created by the e2e test run.

> Looking forward to part 2 ;-)

\o/
