
sharing artifacts between pipeline tasks using bucket #444

Merged: 3 commits merged into tektoncd:master from the bucket branch on Jan 31, 2019

Conversation

@nader-ziada (Member)

Fixes #384

Proposed changes

  • Uses GCP buckets only for now
  • Bucket information is configured using a config map
  • Refactors the PVC implementation of the same feature to use the same interface
  • Artifact bucket and artifact PVC return the container spec to execute the upload and download steps
  • The e2e test runs only if the environment variable KANIKO_SECRET_CONFIG_FILE is set to the path of a service account JSON file with enough access to create a bucket

Still to be implemented

The recommendation is to use a retention policy that deletes files after a fixed small number of hours, which is more efficient than deleting a folder containing a large number of files; see the sketch below.

@knative-prow-robot knative-prow-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 28, 2019
@googlebot googlebot added the cla: yes Trying to make the CLA bot happy with ppl from different companies work on one commit label Jan 28, 2019
@shashwathi (Contributor)

/test pull-knative-build-pipeline-build-tests

@nader-ziada (Member, Author)

/test pull-knative-build-pipeline-unit-tests

@nader-ziada (Member, Author)

/test pull-knative-build-pipeline-integration-tests

1 similar comment
@nader-ziada (Member, Author)

/test pull-knative-build-pipeline-integration-tests

@nader-ziada (Member, Author)

/retest

@nader-ziada nader-ziada force-pushed the bucket branch 4 times, most recently from 73b06a1 to 0d2385f on January 29, 2019 19:08
@shashwathi (Contributor) left a comment:

Awesome work @pivotal-nader-ziada. Great PR.
We did an initial review and left some minor comments. I will take a further look at this again tomorrow. 👍

}

// NewArtifactBucketConfigFromConfigMap creates a Bucket from the supplied ConfigMap
func NewArtifactBucketConfigFromConfigMap(configMap *corev1.ConfigMap) (*v1alpha1.ArtifactBucket, error) {
Contributor:

There is no path that returns an error in this function, so the declaration can be updated to:

func NewArtifactBucketConfigFromConfigMap(configMap *corev1.ConfigMap) *v1alpha1.ArtifactBucket {

Member (Author):

it has to implement the interface definition in pkg

@nader-ziada (Member, Author)

@shashwathi @dlorenc comments addressed, thank you for the review.

@shashwathi (Contributor) left a comment:

Just some minor comments in test. Rest looks great. 👍

}

func NewStore(logger configmap.Logger) *Store {
store := &Store{
Contributor:

I think we can return directly without assigning to the store variable here.

}
}

func TestAddStepsToBuild_WithBucketFromConfigMap(t *testing.T) {
Contributor:

In this test we can simplify by testing only the build spec instead of the whole build; I don't see build metadata being changed here. Also, wantErr is not used in this test, so we can remove that too.

Member (Author):

test updated

test/README.md Outdated
@@ -115,6 +115,8 @@ permissions inside the TaskRun to run the Kaniko e2e test and GCS taskrun test.
`KANIKO_SECRET_CONFIG_FILE` is used to generate Kubernetes secret to access
GCS bucket. This e2e test requires valid service account configuration json
but it does not require any role binding.
- In Storage artifact bucket, GCP service account JSON key file at path
`KANIKO_SECRET_CONFIG_FILE` is used to create/delete a bucket.
Contributor:

I think we need to refactor/reword this paragraph. It might not be in the scope of this PR; we can leave a TODO here. We need to break down "GCP service account JSON key file". One suggestion would be the following:

In the Storage artifact bucket test, the JSON key file for the GCP service account stored at path KANIKO_SECRET_CONFIG_FILE is used to create/delete a GCS bucket.

Collaborator:

now that we're using KANIKO_SECRET_CONFIG_FILE for the bucket test as well, would it make sense to rename KANIKO_SECRET_CONFIG_FILE to something more general?

Member (Author):

renamed to GCP_SERVICE_ACCOUNT_KEY_PATH and reworded the readme

v1alpha1.BucketServiceAccountSecretName: bucketSecretName,
v1alpha1.BucketServiceAccountSecretKey: bucketSecretKey,
}
c.KubeClient.UpdateConfigMap(systemNamespace, v1alpha1.BucketConfigName, configMapData)
Contributor:

Just a small doubt: are we restoring the config map to its original state after the e2e test runs? Sorry if I missed this code.

Member (Author):

adding logic to reset the config map to its original values; a sketch of that pattern follows

@knative-prow-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pivotal-nader-ziada, shashwathi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [pivotal-nader-ziada,shashwathi]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@bobcatfish (Collaborator) left a comment:

Looks great!!! Just some minor comments from me and a couple questions:

  • Just curious: did you notice in your experiments whether it is faster to use GCS instead of PVCs?
  • looks like there's a bit of coverage missing in gcs_resource.go that might be worth adding?

- bucket.service.account.secret.name: the name of the secret that will contain the credentials for the service account
with access to the bucket
- bucket.service.account.secret.key: the key in the secret with the required service account json
The bucket is configured with a retention policy of 24 hours after which files will be deleted
Collaborator:

could we put the docs on how to configure this ConfigMap together with the other docs on how to configure the entrypoint configmap?

it might even make sense to put them in a separate doc, e.g. install.md (could be a starting point for #385 :D)

@nader-ziada (Member, Author), Jan 31, 2019:

I can do the install.md in a separate pr :D

Collaborator:

kk sounds great @pivotal-nader-ziada :D

docs/using.md Outdated
- location: the address of the bucket (for example gs://mybucket)
- bucket.service.account.secret.name: the name of the secret that will contain the credentials for the service account
with access to the bucket
- bucket.service.account.secret.key: the key in the secret with the required service account json
Collaborator:

might be a bit easier to read with `` around the values, e.g.:

`bucket.service.account.secret.key`: the key in the secret with the required service account json

limitations under the License.
*/

package v1alpha1
Collaborator:

what do you think about putting the implementations of the artifacts interface into the artifacts package?

Member (Author):

I originally had the code in the artifacts package, but had to move it to be able to use existing variables (such as gsutilImage) that are defined in the api package; otherwise I would get a circular dependency. This will probably need a bigger refactor, which I didn't want to do here since this PR is big already.

Collaborator:

kk no worries, hopefully we'll refactor it later :D

GetSecretsVolumes() []corev1.Volume
GetType() string
StorageBasePath(pr *v1alpha1.PipelineRun) string
}
Collaborator:

nice!!!!!! i like the interface :D


// TestStorageBucketPipelineRun is an integration test that will verify a pipeline
// can use a bucket for temporary storage of artifacts shared between tasks
func TestStorageBucketPipelineRun(t *testing.T) {
configFilePath := os.Getenv("KANIKO_SECRET_CONFIG_FILE")
Collaborator:

quick question: do we know if KANIKO_SECRET_CONFIG_FILE is defined in our automated tests?

Member (Author):

the user that runs the e2e tests doesn't have high permissions, so it wouldn't be able to execute the gcs and artifact_bucket tests anyway

Collaborator:

kk!

- using gcp buckets only for now
- bucket information configured using a config map
- refactor of pvc implementation for the same feature to use same interface
- artifact bucket and artifact pvc return the container spec to execute the upload and download steps
- issue tektoncd#384
- use getType on artifactStorageInterface instead of isPVC
- refactor go files
- add unit tests
@shashwathi (Contributor) left a comment:

Thank you for addressing comments. It looks good from my end.

@shashwathi (Contributor)

/assign @bobcatfish

@nader-ziada (Member, Author)

@bobcatfish On GKE, the PVC is pretty fast, even with a huge number of files

@knative-metrics-robot

The following is the coverage report on pkg/.
Say /test pull-knative-build-pipeline-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
| --- | --- | --- | --- |
| pkg/apis/pipeline/v1alpha1/artifact_bucket.go | Do not exist | 83.3% | |
| pkg/apis/pipeline/v1alpha1/artifact_pvc.go | Do not exist | 71.4% | |
| pkg/apis/pipeline/v1alpha1/gcs_resource.go | 87.0% | 82.9% | -4.1 |
| pkg/apis/pipeline/v1alpha1/pipelinerun_types.go | 100.0% | 90.0% | -10.0 |
| pkg/apis/pipeline/v1alpha1/secret_volume_mount.go | Do not exist | 100.0% | |
| pkg/apis/pipeline/v1alpha1/taskrun_types.go | 53.8% | 64.7% | 10.9 |
| pkg/artifacts/artifacts_storage.go | Do not exist | 87.0% | |
| pkg/reconciler/v1alpha1/pipelinerun/config/store.go | Do not exist | 87.5% | |
| pkg/reconciler/v1alpha1/pipelinerun/pipelinerun.go | 82.6% | 84.8% | 2.2 |
| pkg/reconciler/v1alpha1/taskrun/resources/input_resources.go | 93.3% | 92.9% | -0.5 |
| pkg/reconciler/v1alpha1/taskrun/resources/output_resource.go | 94.8% | 93.8% | -1.1 |

@@ -136,7 +139,7 @@ gcloud projects add-iam-policy-binding $PROJECT_ID --member serviceAccount:$EMAIL
 # create the JSON key
 gcloud iam service-accounts keys create config.json --iam-account $EMAIL

-export KANIKO_SECRET_CONFIG_FILE="$PWD/config.json"
+export GCP_SERVICE_ACCOUNT_KEY_PATH="$PWD/config.json"
Collaborator:

nice change :D

@bobcatfish (Collaborator)

whew, awesome work @pivotal-nader-ziada !! Looks great :D

/lgtm
/meow space

@knative-prow-robot

@bobcatfish: cat image

In response to this:

whew, awesome work @pivotal-nader-ziada !! Looks great :D

/lgtm
/meow space

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 31, 2019
@knative-prow-robot knative-prow-robot merged commit 3cd318b into tektoncd:master Jan 31, 2019
@nader-ziada nader-ziada deleted the bucket branch February 13, 2019 14:02