[Tracking, Green Reviews WG] Design Green Reviews WG pipeline workflow #182
Comments
I have a few comments/questions. Here is the Google Doc tracking the implementation details and related discussion: https://docs.google.com/document/d/19fzZW-IMv2kDNatKFHeHh7wqcEN0e2N60wzxvCGZd48/edit?usp=sharing
Questions: In this link I have seen some steps to trigger GH Actions between different repos. Is it a valid approach? Could you please clarify what we would need k6 for? I am not an expert and it is not so clear to me :) Does Prometheus handle the remote write? Where can I read some docs on how?
Proposals: We may need to add a step that installs the project we would like to test.
Can't wait to start! |
This is the way to go, yes. The workflow to build and deploy the CNCF project should trigger the GitHub Actions workflow in the green-reviews-tooling repo. To achieve this, we need to add a workflow_dispatch in the CNCF project build pipeline (assuming they use GitHub Actions). Then, we need to add a repository_dispatch in the tooling repo. This will be the trigger for the load tests.
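As a rough illustration of the cross-repo trigger described above, the sketch below builds (but does not send) the GitHub REST API call that raises a repository_dispatch event in the tooling repo. The event type `project-deployed`, the payload fields, and the token placeholder are assumptions for illustration only; nothing here has been agreed in this thread.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, event_type, client_payload, token):
    """Build the POST request that raises a repository_dispatch event."""
    url = f"https://api.github.com/repos/{owner}/{repo}/dispatches"
    body = json.dumps(
        {"event_type": event_type, "client_payload": client_payload}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values: the event type and payload keys are placeholders.
req = build_dispatch_request(
    owner="cncf-tags",
    repo="green-reviews-tooling",
    event_type="project-deployed",
    client_payload={"project": "falco"},
    token="<TOKEN>",
)
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```

On the receiving side, the tooling repo's workflow would need an `on: repository_dispatch` trigger that lists the same event type.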
These should be benchmark tests that perform an action against the CNCF Project. To use an example more familiar to me, in the example of a GitOps benchmark test, we create and deploy an application which Flux/ArgoCD then reconcile in the cluster. This would be the equivalent to the SCI Functional Unit, in theory:
This will look different for each CNCF Project. Unsure what this looks like for Falco - it is what we are trying to figure out next :) In this scenario, the benchmark tests run in the same Node as Kepler is running, so that we can measure the energy consumption of the Functional Unit.
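To make the Functional Unit idea above a bit more concrete, here is a minimal sketch of the SCI calculation as defined by the Green Software Foundation, SCI = ((E × I) + M) per R. The function name and all input numbers are made up for illustration; in practice E would come from the Kepler measurements and R from whatever Functional Unit is chosen for each project.

```python
JOULES_PER_KWH = 3_600_000  # 1 kWh = 3.6 MJ


def sci_score(energy_joules, carbon_intensity_g_per_kwh, embodied_g, functional_units):
    """Return gCO2e per functional unit: ((E * I) + M) / R."""
    if functional_units <= 0:
        raise ValueError("functional_units must be positive")
    energy_kwh = energy_joules / JOULES_PER_KWH
    operational_g = energy_kwh * carbon_intensity_g_per_kwh
    return (operational_g + embodied_g) / functional_units


# Illustrative numbers only: 7200 J measured during the benchmark,
# 400 gCO2e/kWh grid intensity, 0.2 gCO2e embodied share,
# 10 functional units (e.g. 10 reconciled applications).
score = sci_score(7200, 400, 0.2, 10)
print(f"{score:.3f} gCO2e per functional unit")
```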
devstats is a Grafana dashboard so we should be able to create a Prometheus data source for it. The Grafana instance will read from the Prometheus instance running on our Node (which contains the Kepler metrics). TBD :) |
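For reference, a small sketch of how the Kepler energy data could be read from Prometheus through its HTTP query API (`/api/v1/query`). The metric and label names (`kepler_container_joules_total`, `container_namespace`) are what I understand Kepler to export and should be checked against the deployed version; the Prometheus address is a placeholder.

```python
from urllib.parse import urlencode


def kepler_energy_query(namespace, window="5m"):
    """PromQL for the joules consumed by a namespace over a time window."""
    return (
        "sum(increase(kepler_container_joules_total"
        f'{{container_namespace="{namespace}"}}[{window}]))'
    )


def instant_query_url(base_url, promql):
    """URL for a Prometheus instant query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


promql = kepler_energy_query("falco")
url = instant_query_url("http://prometheus.example:9090", promql)  # placeholder host
print(promql)
print(url)
```

The same PromQL expression could be used in a Grafana panel backed by the Prometheus data source.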
@nikimanoledaki, @AntonioDiTuri the Falco project tests their deployments on Equinix using this Ansible config https://github.com/falcosecurity/kernel-testing/tree/main. Perhaps this could be a good starting point for us? We could copy the resources, deploy the setup (for testing purposes, i.e. running the Falco tests), then reduce the deployment (remove all settings we do not need) and add new configuration (to enable assessing the SCI score, measuring energy, etc.). We likely need to build a small project (in Go or so) next to the Ansible config to make some calculations. cc @incertum |
@leonardpahlke, our Falco core maintainers meeting is on October 5th. Will check with the other maintainers (@FedeDP, @Andreagit97) to see how we can generalize the Ansible setup so that it can be used for more future projects beyond Falco. Voting in favor of creating a Go project that we can all contribute to. This may come in handy for future setup configurations and other aspects. |
I see that this ticket is referenced: https://www.bbb.org/us/il/aurora/profile/window-installation/green-t-windows-0654-88593900 Does this mean that we plan to run Kubernetes on a bare-metal Equinix cluster, without a hypervisor like EKS? Do we plan to deploy each project directly on the hosts, or in the Falco case as a DaemonSet?
Please note that the Firecracker VM setup you are referencing is used to test different kernel versions. For the Green Reviews WG efforts, one kernel version is sufficient, and we would like to deploy Falco realistically, as the goal here is currently slightly different: we want to test performance. |
@incertum that is correct, Equinix is our infrastructure so we will be running on a bare-metal cluster without a hypervisor. In the linked issue about the cluster config, Ross and I are evaluating the pros/cons of different tooling and cluster setups. We have not decided on any yet - we're slightly blocked on how the testing will work for each CNCF Project, which will determine what the cluster will look like and which tooling is best. We're looking into various CAPI setups but are open to Ansible as well. Our only requirement is that we use IaC/GitOps so that the configuration can live in the dedicated tooling repo: https://github.com/cncf-tags/green-reviews-tooling
This is a great question - we are still in the process of designing the E2E flow. We need to think about how we will scale when we test each project separately, with an emphasis on isolation when we run the tests for each project. This could be done with CAPI by running a new worker cluster for each project/test. However, this could be too much overhead. We could also achieve this degree of isolation within a single cluster by running each CNCF Project and its tests on a dedicated Node, using a combination of Namespace + Node taint/toleration for each project that we test. So we could run Falco as a DaemonSet but essentially it would be 1 Pod that runs on 1 Node. Please let me know what you think about all of this! Appreciate your feedback, and note that this is a WIP with a lot of back-and-forth to make sure we're building the right thing 😊
In order to make a decision about the above, it would also help us first to decide what tests we will run for each CNCF Project and how we will run them. With regard to what we test, I agree with @leonardpahlke's idea in the comment above that we could reuse the existing Falco test steps/structure with some changes to fit our use case. We can replicate the test steps even if the tooling is different.
With regard to how we run these tests, we should discuss whether we will use the Project's existing test tooling or a unified test tooling. For example, we could use the tooling that each CNCF Project uses, such as Ansible in the case of Falco. On the one hand, we could potentially ask Project maintainers to maintain this test suite. On the other hand, we introduce some discrepancy in how we test CNCF Projects if we use a different tool each time. @leonardpahlke curious as to what you have in mind for this part. Alternatively, we could use a unified testing tool such as k6, which has been brought up before, but we are not sure yet that k6 is the right tool for testing Falco and other CNCF Projects.
Before we move forward, it would be great to establish whether k6 is a good tool for this. @incertum, do you think it would be possible to collaborate with you and the other Falco maintainers to do a spike on porting the simplest Falco test to k6? @immavalls works with k6 and brought up the following, here:
Could you expand further on what you mean with regard to deploying Falco realistically? Is there something we are missing or should look out for in your experience of how Falco is deployed? |
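To illustrate the Namespace + Node taint/toleration isolation mentioned above, here is a minimal sketch that generates a Pod manifest pinned to a dedicated Node. The label/taint key `green-reviews/project`, the Pod name, and the image are hypothetical placeholders, not an agreed convention.

```python
import json


def isolated_pod_spec(project, image):
    """Pod manifest that only schedules onto the Node dedicated to `project`."""
    key = "green-reviews/project"  # hypothetical label/taint key
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"{project}-benchmark", "namespace": project},
        "spec": {
            # Select the dedicated Node by label ...
            "nodeSelector": {key: project},
            # ... and tolerate the taint that keeps other workloads off it.
            "tolerations": [
                {
                    "key": key,
                    "operator": "Equal",
                    "value": project,
                    "effect": "NoSchedule",
                }
            ],
            "containers": [{"name": project, "image": image}],
        },
    }


pod = isolated_pod_spec("falco", "example.invalid/falco:latest")  # placeholder image
print(json.dumps(pod, indent=2))
```

This assumes the Node has been labelled and tainted with the same key/value, for example with `kubectl taint nodes <node> green-reviews/project=falco:NoSchedule`. For a DaemonSet the same `nodeSelector` and `tolerations` would go under `spec.template.spec`.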
@incertum I've been looking at https://github.com/falcosecurity/kernel-testing with @nikimanoledaki to see if we can reuse the approach you're using with GitHub Actions and Ansible. I may well be missing something but high level AIUI the test is triggered by Falco's Prow instance which runs the Ansible playbook that connects to an Equinix machine using SSH. Is the Equinix machine running Kubernetes and how do you manage it? I ask because in our case we need to run the tests on a K8s node. This is so Kepler can attribute the energy consumption from the CPU socket to the Falco pods. |
GitHub Actions + Ansible Kubernetes module could be a valid solution. I'd like to clarify that our kernel testing involved spinning up VMs to test different kernels, and yes, SSH was used, along with some other setup requirements. However, for the Green Reviews WG + Falco, VMs and SSH are not necessary. I shared our setup to demonstrate how we successfully established a robust pipeline using GitHub Actions + Ansible against an Equinix machine. My intention was to illustrate that the steps of our pipeline could potentially inspire the Green Reviews workflow. I realize it may have complicated the discussion; apologies for any confusion caused. |
@incertum No worries and thank you this is very helpful!
We're looking into using Ansible, and the Kubernetes module could be useful for our pipeline, but it's good to know other tooling is an option. For the cluster we can use the Equinix Ansible module but we still need to provision K8s. CAPI / CAPEM is a challenge without a management cluster. Kubeadm is also an option but it would be good to see how other CNCF projects solve this. |
Closing this issue since we are tracking more specific work in the WG repository: https://github.com/cncf-tags/green-reviews-tooling |
Description
Following from the proposal "Proof of Environmental Sustainability activities and best practices for CNCF projects", the Green Reviews WG is setting up infrastructure to measure the sustainability footprint of CNCF projects. This issue tracks the design & implementation of this technical work.
Please add your suggestions / comments / questions to the Design Doc here! 🌱
Milestones
[Tracking] Epic
Co-Authored By: Kristina D @guidemetothemoon