Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Tasks to acquire and release boskos resources 🐑 #408

Merged
merged 1 commit into from
Jul 24, 2020

Conversation

bobcatfish
Copy link
Contributor

Changes

Boskos is a tool that allows one to create a pool of cloud projects
(definitely GCP, I think it supports other providers as well), and
manages acquiring, releasing, and cleaning them between leases.

We use it for Tekton test infrastructure for our end to end tests and
we'd like to use it for our catalog Tasks as well.

This commit adds boskos acquire and release Tasks.

The acquire Task also creates a pod in the running cluster to perform
heartbeating so that boskos knows that the resource is still in use.
(Lemme know if you have a better idea for making this work! I have this
slightly crazy idea that Pipeline level sidecars might make sense ....)

The intention of the release Task is that it would be run in a
Pipeline's finally clause, however today that would be difficult
because finally Tasks can't yet use the results of other Tasks, but this
functionality is on the way: tektoncd/pipeline#2557

This is part of the work in #373 to create a Pipeline for the catalog.

Thanks to @ixdy for helping me figure this out! This also might be relevant to your interests @afrittoli

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

See the contribution guide
for more details.

@bobcatfish bobcatfish changed the title Add Tasks to acquire and release boskos Tasks 🐑 Add Tasks to acquire and release boskos resources 🐑 Jul 10, 2020
@tekton-robot tekton-robot requested review from imjasonh and a user July 10, 2020 21:22
@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 10, 2020
@bobcatfish bobcatfish force-pushed the boskos branch 2 times, most recently from 3eff059 to 1817037 Compare July 10, 2020 21:27
@bobcatfish
Copy link
Contributor Author

Anyone who'd like to join the OWNERS files with me lemme know :D

It is implemented using [`boskosctl`](https://github.com/kubernetes-sigs/boskos/tree/master/cmd/boskosctl).

_The Task assumes already have Boskos up and running. To set it up yourself, your
best bet would be to find an example of it already configured, for example
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is now an example deployment in https://github.com/kubernetes-sigs/boskos/tree/master/deployments/overlays/example, but other docs are still a bit lacking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ill use that example instead!

echo $RESOURCE > /workspace/full-resource-output.json
echo $RESOURCE | jq -rj ".name" > /tekton/results/leased-resource
- name: create-boskosctl-heartbeat-pod
image: ubuntu
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ubuntu is a pretty large image, and we only need a very basic shell here.
I wonder if this could be combined with the next step instead?

cat <<EOF > | kubectl apply -f -
      apiVersion: v1
      kind: Pod
      metadata:
        name: boskos-heartbeat-$LEASED_RESOURCE
      spec:
        containers:
        - name: heatbeat
          image: gcr.io/k8s-staging-boskos/boskosctl@sha256:a7fc984732c5dd0b4e0fe0a92e2730fa4b6bddecd0f6f6c7c6b5501abe4ab105
          args:
          - heartbeat
          - --server-url=$(params.server-url)
          - --owner-name=$(params.owner-name)
          - --resource=$FULL_RESOURCE_OUTPUT
          - --period=10s
      EOF

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was gonna comment that same. We can combined the two step into one indeed 👼

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i got really excited about this and then i realized that we'd need a way to run these lines:

      FULL_RESOURCE_OUTPUT=$(cat /workspace/full-resource-output.json)
      LEASED_RESOURCE=$(cat /tekton/results/leased-resource)

I guess if we're running cat in the next step, we'd be able to run cat inline?

alternatively: what about using a smaller image here instead of ubuntu, e.g. busy box?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bobcatfish I don't get what does it change ? writing the file vs piping it to kubectl is doing the same. Instead of writing /workspace/heartbeat.yaml and then running kubectl with it, we pipe what was written in that file directly to kubectl, nothing else changes 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My train of thought was something like:

  1. cool we can just use the kubectl image!
  2. oh wait, what about those values from the files (not in andrea's example)
  3. oh we'll need to run a shell
  4. oh we're already running a shell
  5. i guess the advantage here is that we're running 1 image instead of 2?

I think it's around (3): now we need a shell in the image that runs kubectl and (5) that i get a lot less excited.

it seems clearer to me to have 2 steps, one that makes the file and another that uses the file vs squishing them together for efficiency reasons. so im wondering if using smaller images for both might be another solution?

the kubectl image here already only contains sh and does not contain bash:

(⎈ |euca:default)➜  catalog git:(boskos) ✗ docker run -it --entrypoint /bin/bash lachlanevenson/k8s-kubectl
docker: Error response from daemon: OCI runtime create failed: container_linux.go:346: starting container process caused "exec: \"/bin/bash\": stat /bin/bash: no such file or directory": unknown.
(⎈ |euca:default)➜  catalog git:(boskos) ✗ docker run -it --entrypoint /bin/sh lachlanevenson/k8s-kubectl
~ #

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

too bad the kubectl image is 50mb X'D (maybe they should remove sh XD)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i bet kubectl needs sh to run

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I would just collapse the two steps, not only because it is slightly more efficient, but also because it would seem more readable to me: 1 step gets the lease from boskos, 1 step starts the heartbeat.
If you prefer having the two separate steps I don't mind.

we're probably just getting into personal preference now but i'll say my final 2 cents - having 2 steps makes sense to me b/c:

1. The step that creates the config needs a shell, so use an image with a shell
2. The step that applies the config doesn't need a shell, it just needs kubectl, so don't use a shell, just use kubectl

I am not actually sure it's too much about personal preferences, but also about user experiences. More step means more containers (if you have things like LimitRange, …, this means more resource reserved), more different images means more images to pull, more image to store on nodes 🙃.

I think our point (andrea and I) is that you don't need to "create" the config (as a file), you can just pipe it to kubectl. And as you said, the kubectl image already have sh, so it's all you need.

Any way, not wanting to block the PR for that 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but also about user experiences. More step means more containers (if you have things like LimitRange, …, this means more resource reserved), more different images means more images to pull, more image to store on nodes

I guess, feels like a strange place to be optimizing to me: I also have a dream of having many tiny images which do one thing well and only that thing, which can be cleanly composed together.

anyway maybe we can revisit it as we go - seems like it's not too bad to have the 2 images instead of one for now, so seems like it's okay to leave it for now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried making my own image for this. However, the only way I was able to getting to work was by making a curl request to the k8s cluster directly. If the k8s api changes than it would have to be updated every time. I tried finding a k8s-cli slim but my google-foo failed me 😭. It would be cool to have a container that had a bunch of kubectl slim that just basic verb/object capabilities. I might even do it as a side-project, tbh.

boskos-acquire/0.1/boskos-acquire.yaml Outdated Show resolved Hide resolved
- --server-url=$(params.server-url)
- --owner-name=$(params.owner-name)
- --resource=$FULL_RESOURCE_OUTPUT
- --period=10s
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to self: i made this 10 seconds for testing but we probably dont need it to be that frequent, for example kubetest uses 5m (https://github.com/kubernetes/test-infra/blob/94ee2a70262d1c2f2c878cf4e986b4b41180f5f6/kubetest/main.go#L753-L759) and afaik the default timeout boskos uses is 30m

@bobcatfish
Copy link
Contributor Author

/test pull-tekton-catalog-integration-tests

@bobcatfish
Copy link
Contributor Author

PTAL @vdemeester @afrittoli - i'm still using 2 tasks for writing the file + applying it, but I've changed from ubuntu to bash :O

@@ -0,0 +1 @@
boskos-acquire/0.1/examples/pipelinerun.yaml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please change it to boskos-acquire/0.1/samples/pipelinerun.yaml

@@ -0,0 +1 @@
boskos-acquire/0.1/examples/service-account.yaml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please change it to boskos-acquire/0.1/samples/pipelinerun.yaml

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching these @PuneetPunamiya !

@bobcatfish
Copy link
Contributor Author

@popcor255
Copy link
Member

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 22, 2020
bobcatfish added a commit to bobcatfish/catalog that referenced this pull request Jul 22, 2020
As part of tektoncd#373 I'm continuing to create Tasks that we can use to run
tests for items in the catalog. This Task creates a cluster within a GKE
project and is based on the way we create clusters for the existing end
to end tests we have in Tekton, which are themselves based on Knative,
when are THEMselves based on k8s. These tests all use a tool called
kubetest, which is responsible for invoking boskos (see tektoncd#408) and
creating a GKE cluster
(https://github.com/kubernetes/test-infra/tree/master/kubetest).

This Task attempts to create a cluster in the same way, which based on the
output in the end to end logs seems to consist of creating the cluster
and setting up a firewall for it.

I'm not sure if this belongs in the catalog itself since it is specific
to our own end to end tests and isn't as generic as other catalog tests,
so if we want to move this into the test directory that seems fine, but
I also think it could be an okay addition to the catalog on its own (esp
if other k8s based projects that test against GKE want to use it.)
bobcatfish added a commit to bobcatfish/catalog that referenced this pull request Jul 22, 2020
As part of tektoncd#373 I'm continuing to create Tasks that we can use to run
tests for items in the catalog. This Task creates a cluster within a GKE
project and is based on the way we create clusters for the existing end
to end tests we have in Tekton, which are themselves based on Knative,
when are THEMselves based on k8s. These tests all use a tool called
kubetest, which is responsible for invoking boskos (see tektoncd#408) and
creating a GKE cluster
(https://github.com/kubernetes/test-infra/tree/master/kubetest).

This Task attempts to create a cluster in the same way, which based on the
output in the end to end logs seems to consist of creating the cluster
and setting up a firewall for it.

I'm not sure if this belongs in the catalog itself since it is specific
to our own end to end tests and isn't as generic as other catalog tests,
so if we want to move this into the test directory that seems fine, but
I also think it could be an okay addition to the catalog on its own (esp
if other k8s based projects that test against GKE want to use it.)
bobcatfish added a commit to bobcatfish/catalog that referenced this pull request Jul 22, 2020
As part of tektoncd#373 I'm continuing to create Tasks that we can use to run
tests for items in the catalog. This Task creates a cluster within a GKE
project and is based on the way we create clusters for the existing end
to end tests we have in Tekton, which are themselves based on Knative,
when are THEMselves based on k8s. These tests all use a tool called
kubetest, which is responsible for invoking boskos (see tektoncd#408) and
creating a GKE cluster
(https://github.com/kubernetes/test-infra/tree/master/kubetest).

This Task attempts to create a cluster in the same way, which based on the
output in the end to end logs seems to consist of creating the cluster
and setting up a firewall for it.

I'm not sure if this belongs in the catalog itself since it is specific
to our own end to end tests and isn't as generic as other catalog tests,
so if we want to move this into the test directory that seems fine, but
I also think it could be an okay addition to the catalog on its own (esp
if other k8s based projects that test against GKE want to use it.)
bobcatfish added a commit to bobcatfish/catalog that referenced this pull request Jul 22, 2020
As part of tektoncd#373 I'm continuing to create Tasks that we can use to run
tests for items in the catalog. This Task creates a cluster within a GKE
project and is based on the way we create clusters for the existing end
to end tests we have in Tekton, which are themselves based on Knative,
when are THEMselves based on k8s. These tests all use a tool called
kubetest, which is responsible for invoking boskos (see tektoncd#408) and
creating a GKE cluster
(https://github.com/kubernetes/test-infra/tree/master/kubetest).

This Task attempts to create a cluster in the same way, which based on the
output in the end to end logs seems to consist of creating the cluster
and setting up a firewall for it.

I'm not sure if this belongs in the catalog itself since it is specific
to our own end to end tests and isn't as generic as other catalog tests,
so if we want to move this into the test directory that seems fine, but
I also think it could be an okay addition to the catalog on its own (esp
if other k8s based projects that test against GKE want to use it.)
annotations:
tekton.dev/pipelines.minVersion: "0.12.1"
tekton.dev/tags: "boskos,test"
spec:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if you could please add the description in the form of summary and description

For e.g.

spec:
     description: >-
       One line summary of the task.         👈 # Summary

       Description of the task 

tekton.dev/pipelines.minVersion: "0.12.1"
tekton.dev/tags: "boskos,test"
spec:
description: |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if you could please add the description in the form of summary and description

For e.g.

spec:
     description: >-
       One line summary of the task.         👈 # Summary

       Description of the task 

@PuneetPunamiya
Copy link
Member

PuneetPunamiya commented Jul 23, 2020

@bobcatfish The tasks are not in the task directory, can you please move the task in the task directory

--target-state=busy)
echo $RESOURCE > /workspace/full-resource-output.json
echo $RESOURCE | jq -rj ".name" > /tekton/results/leased-resource
- name: create-heartbeat-pod-yaml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am gonna try one last time 😝 . I don't see why the step below is worse than the 2 steps proposed 🙃

  - name: create-heartbeat-pod-yaml
    image: lachlanevenson/k8s-kubectl
    script: |
      FULL_RESOURCE_OUTPUT=$(cat /workspace/full-resource-output.json)
      LEASED_RESOURCE=$(cat /tekton/results/leased-resource)
      cat <<EOF | kubectl apply -f -
      apiVersion: v1
      kind: Pod
      metadata:
        name: boskos-heartbeat-$LEASED_RESOURCE
      spec:
        containers:
        - name: heatbeat
          image: gcr.io/k8s-staging-boskos/boskosctl@sha256:a7fc984732c5dd0b4e0fe0a92e2730fa4b6bddecd0f6f6c7c6b5501abe4ab105
          args:
          - heartbeat
          - --server-url=$(params.server-url)
          - --owner-name=$(params.owner-name)
          - --resource=$FULL_RESOURCE_OUTPUT
          - --period=5m
      EOF

You don't need to write that yaml to /workspace/heartbeat.yaml ever, it's not gonna be used 🙃

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, because we are using :latest tag on the steps, k8s will pull the image each time, so 2 steps is gonna cost more than one 🙃 (if we do use digest or tags for images, that's another story)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, seems like you feel strongly enough that I'll combine the two steps into one. I've created #441 to continue the discussion

btw, because we are using :latest tag on the steps, k8s will pull the image each time, so 2 steps is gonna cost more than one 🙃 (if we do use digest or tags for images, that's another story)

Oh good point, ill add the digest (I think this is a best practice we should recommend for all catalog tasks?)

(also it depends on the image pull policy right?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(also it depends on the image pull policy right?)

So, I think, whatever the pull policy is, using :latest force kubernetes to pull the image each time (because it has no way to know which "latest" you wanted 😝 )

Copy link
Contributor Author

@bobcatfish bobcatfish Jul 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure?

https://kubernetes.io/docs/concepts/configuration/overview/#container-images
https://kubernetes.io/docs/concepts/containers/images/#updating-images

It sounds like that might only be in the case of having no imagePullPolicy?

imagePullPolicy is omitted and either the image tag is :latest or it is omitted: Always is applied.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, with IfNotPresent or Never it won't try to pull it (if not present for the first one). That said, in the task itself, there is no imagePullPolicy so we can "only" assume the default behavior. Overall, the more the catalog grow, the more we should try to follow our recommendation https://github.com/tektoncd/catalog/blob/master/recommendations.md#reference-images-by-digest 🙃 @sthaha validation tool could help there (to warn of tasks that uses latest)

Copy link
Member

@chmouel chmouel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, although if we can address Puneet and Vincent suggestions it would be IMO good to go!

side note, since i am not too familiar with boskos, there is no way to have the Pod/heartbeat as a task? does the pod need to be keep hearbeating for a while and due cannot be 'plugged' directly in a step?

@bobcatfish
Copy link
Contributor Author

here is no way to have the Pod/heartbeat as a task? does the pod need to be keep hearbeating for a while and due cannot be 'plugged' directly in a step?

Exactly - it needs to run the entire time the project is in use. Otherwise, after some period of time (I think it's 30 min by default) boskos will assume the project isn't being used and will clean it and put it back into its pool of available projects. I've also created tektoncd/pipeline#2973 as a possible alternative approach.

@tekton-robot tekton-robot added do-not-merge/invalid-owners-file Indicates that a PR should not merge because it has an invalid OWNERS file in it. and removed lgtm Indicates that a PR is ready to be merged. labels Jul 23, 2020
Boskos is a tool that allows one to create a pool of cloud projects
(definitely GCP, I think it supports other providers as well), and
manages acquiring, releasing, and cleaning them between leases.

We use it for Tekton test infrastructure for our end to end tests and
we'd like to use it for our catalog Tasks as well.

This commit adds boskos acquire and release Tasks.

The acquire Task also creates a pod in the running cluster to perform
heartbeating so that boskos knows that the resource is still in use.

The intention of the release Task is that it would be run in a
Pipeline's `finally` clause, however today that would be difficult
because finally Tasks can't yet use the results of other Tasks, but this
functionality is on the way: tektoncd/pipeline#2557

This is part of the work in tektoncd#373 to create a Pipeline for the catalog.
@tekton-robot tekton-robot removed the do-not-merge/invalid-owners-file Indicates that a PR should not merge because it has an invalid OWNERS file in it. label Jul 23, 2020
@bobcatfish
Copy link
Contributor Author

Should be ready for another look @vdemeester @afrittoli @PuneetPunamiya @popcor255 @sthaha @chmouel !

bobcatfish added a commit to bobcatfish/catalog that referenced this pull request Jul 23, 2020
As part of tektoncd#373 I'm continuing to create Tasks that we can use to run
tests for items in the catalog. This Task creates a cluster within a GKE
project and is based on the way we create clusters for the existing end
to end tests we have in Tekton, which are themselves based on Knative,
when are THEMselves based on k8s. These tests all use a tool called
kubetest, which is responsible for invoking boskos (see tektoncd#408) and
creating a GKE cluster
(https://github.com/kubernetes/test-infra/tree/master/kubetest).

This Task attempts to create a cluster in the same way, which based on the
output in the end to end logs seems to consist of creating the cluster
and setting up a firewall for it.

I'm not sure if this belongs in the catalog itself since it is specific
to our own end to end tests and isn't as generic as other catalog tests,
so if we want to move this into the test directory that seems fine, but
I also think it could be an okay addition to the catalog on its own (esp
if other k8s based projects that test against GKE want to use it.)
Copy link
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/meow

@tekton-robot
Copy link

@vdemeester: cat image

In response to this:

/meow

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 24, 2020
tekton-robot pushed a commit that referenced this pull request Jul 24, 2020
As part of #373 I'm continuing to create Tasks that we can use to run
tests for items in the catalog. This Task creates a cluster within a GKE
project and is based on the way we create clusters for the existing end
to end tests we have in Tekton, which are themselves based on Knative,
when are THEMselves based on k8s. These tests all use a tool called
kubetest, which is responsible for invoking boskos (see #408) and
creating a GKE cluster
(https://github.com/kubernetes/test-infra/tree/master/kubetest).

This Task attempts to create a cluster in the same way, which based on the
output in the end to end logs seems to consist of creating the cluster
and setting up a firewall for it.

I'm not sure if this belongs in the catalog itself since it is specific
to our own end to end tests and isn't as generic as other catalog tests,
so if we want to move this into the test directory that seems fine, but
I also think it could be an okay addition to the catalog on its own (esp
if other k8s based projects that test against GKE want to use it.)
@chmouel
Copy link
Member

chmouel commented Jul 24, 2020

/lgtm

Looking Good! 🤙🏽

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 24, 2020
@tekton-robot tekton-robot merged commit e990604 into tektoncd:master Jul 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants