[Sample] Demonstrate Continuous Integration #2784

Closed
wants to merge 44 commits

Conversation


@dldaisy commented Dec 30, 2019

Changes:

  • Several samples on CI with versioned pipelines, including:
    • helloworld-ci-sample, using cloudbuild for CI and curl for creating the pipeline version and run
    • jenkins-ci-sample, using jenkins for CI and curl for creating the pipeline version and run
    • mnist-ci-sample, using cloudbuild for CI and the sdk client for creating the pipeline version and run
    • kaggle-ci-sample, using cloudbuild for CI and the sdk client for creating the pipeline version and run; uses the kaggle python API to download and submit data
  • Test of sdk client creating a batch of pipelines and versions



## Overview

This sample use cloud build to implement the continuous integration process of a simple pipeline that outputs "hello world" to the console. Once all set up, you can push your code to github repo, then the build process in cloud build will be triggered automatically, then a run will be created in kubeflow pipeline. You can view your pipeline and the run in kubeflow pipelines.


Suggested change
This sample use cloud build to implement the continuous integration process of a simple pipeline that outputs "hello world" to the console. Once all set up, you can push your code to github repo, then the build process in cloud build will be triggered automatically, then a run will be created in kubeflow pipeline. You can view your pipeline and the run in kubeflow pipelines.
This sample uses cloud build to implement the continuous integration process of a simple pipeline that outputs "hello world" to the console. Once all set up, you can push your code to github repo, then the build process in cloud build will be triggered automatically, then a run will be created in kubeflow pipeline. You can view your pipeline and the run in kubeflow pipelines.

Contributor Author

thanks
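As context for the overview above, here is a minimal sketch of the final CI step, assuming Cloud Build has already compiled the pipeline to pipeline.zip. It uses the plain package-upload path of the kfp SDK rather than the versioned-pipeline API these samples exercise, and the experiment and run names are illustrative.

import kfp

# Inside the cluster, kfp.Client() discovers the ml-pipeline host automatically.
client = kfp.Client()

# Create an experiment and start a run from the compiled package.
experiment = client.create_experiment('helloworld-ci')
run = client.run_pipeline(experiment.id, 'helloworld-ci-run', 'pipeline.zip')
print('Created run {}'.format(run.id))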

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--commit_id', help='Commit Id', type=str)


Maybe elaborate more in the help field?

Contributor Author

ok
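For instance, one possible elaboration (wording hypothetical):

parser.add_argument(
    '--commit_id',
    help='Commit SHA of the triggering push; used to tag and locate the '
         'images built by Cloud Build for this pipeline.',
    type=str)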

train = dsl.ContainerOp(
    name='mnist train',
    image=os.path.join(gcr_address, 'mnist_train:' + args.commit_id)
).apply(use_gcp_secret('user-gcp-sa'))


Is the user-gcp-sa secret still valid under a workload-identity-based deployment? @Bobgy
I remember the usage of user-gcp-sa in samples has been cleaned up.

Contributor

No, it won't be used.
Recommend removing its usage and putting a link to https://www.kubeflow.org/docs/gke/authentication-pipelines/ nearby about how to authenticate to GCP.

Contributor Author

Will do.

Contributor Author

Does it also mean that we don't need to create a 'user-gcp-sa' secret volume mounted on the cluster anymore?

Contributor

No, we won't need to. Use workload identity if we care about security, or use a full-scope cluster for convenience.
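A minimal sketch of the op with the secret dropped, assuming the cluster is set up per https://www.kubeflow.org/docs/gke/authentication-pipelines/ (the GCR address and commit id below are illustrative placeholders):

import os
import kfp.dsl as dsl

# With workload identity (or a full-scope cluster), the pod authenticates as
# the Google service account bound to its Kubernetes service account, so no
# 'user-gcp-sa' secret volume is needed.
gcr_address = 'gcr.io/my-project'  # illustrative
commit_id = 'abc123'               # illustrative
train = dsl.ContainerOp(
    name='mnist train',
    image=os.path.join(gcr_address, 'mnist_train:' + commit_id)
)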

@jingzhang36

/assign @jingzhang36
/assign @gaoning777
/assign @numerology

parser = argparse.ArgumentParser()
parser.add_argument('--version_name', help='Required. Name of the new version. Must be unique.', type=str)
parser.add_argument('--package_url', help='Required. pipeline package url', type=str)
parser.add_argument('--pipeline_id', help = 'Required. pipeline id',type=str)


nit: consistent style.

Suggested change
parser.add_argument('--pipeline_id', help = 'Required. pipeline id',type=str)
parser.add_argument('--pipeline_id', help='Required. pipeline id', type=str)

Contributor Author

thanks

else:
    client = kfp.Client()

print('your client configuration is :{}'.format(client.pipelines.api_client.configuration.__dict__))


Suggested change
print('your client configuration is :{}'.format(client.pipelines.api_client.configuration.__dict__))
print('Your client configuration is: {}'.format(client.pipelines.api_client.configuration.__dict__))

Contributor Author

thanks


print('your client configuration is :{}'.format(client.pipelines.api_client.configuration.__dict__))
print('Now in create_pipeline_version_and_run.py...')
print('your api_client host is:')


nit: maybe also use .format() as above?

Contributor Author

Will modify this part.
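For instance, a hedged sketch of that print with .format() (assuming the generated configuration object exposes a host attribute, as swagger clients typically do):

print('Your api_client host is: {}'.format(
    client.pipelines.api_client.configuration.host))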

    gcr_address: str
):
    import os
    train = dsl.ContainerOp(
Contributor

Let's try using an actual component instead of ad-hoc ContainerOp:

component.yaml:

name: Train on MNIST
implementation:
  container:
    image: helloworld-ci

and then

train_op = kfp.components.load_component_from_file('component.yaml')
...
train = train_op()

In cloudbuild.yaml we can replace the image name in component.yaml with the newly-built image:

sed "s|image: helloworld-ci|image: ${_GCR_PATH}/helloworld-ci:$COMMIT_SHA|"

    bucket_name: str
):
    import os
    stepDownloadData = dsl.ContainerOp(
Contributor

Same here. Let's componentize all these ops.
Each component will go into its own component.yaml file and the build script will replace the image versions inside those files, not in the pipeline.py.

Contributor Author

Thanks. Will do.

## Usage

* Substitute the constants in cloudbuild.yaml
* Fill in your kaggle_username and kaggle_key in the Dockerfiles to authenticate to Kaggle. You can get them from an API token created on your Kaggle account page.
@jingzhang36 commented Jan 7, 2020

Probably specify which Dockerfile, e.g., the Dockerfile under download_dataset, since there are multiple Dockerfiles in kaggle-ci-sample.
Or, if this replacement is needed in all Dockerfiles, please point that out.

Contributor Author

OK.
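For reference on the credentials mentioned in the Usage section above: the kaggle package reads KAGGLE_USERNAME and KAGGLE_KEY from the environment, so the Dockerfiles can export those instead of baking in a kaggle.json (values below are placeholders):

import os

# Placeholder values; real ones come from an API token on your Kaggle account page.
os.environ['KAGGLE_USERNAME'] = 'your-username'
os.environ['KAGGLE_KEY'] = 'your-api-key'

# The kaggle package authenticates using the variables above when imported.
import kaggle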



substitutions:
_CODE_PATH: /workspace/kaggle
Contributor

I believe the _CODE_PATH here should be something like "/workspace/samples/contrib/versioned-pipeline-ci-samples/kaggle-ci-sample" to match the Dockerfile locations. Please fix this one and the other _CODE_PATH values accordingly.

Contributor Author

Right. Will modify them.

test_blob.upload_from_filename('test.csv')

with open('train.txt', 'w') as f:
    f.write('gs://' + bucket_name + '/train.csv')
@jingzhang36 commented Jan 8, 2020

We probably want those data sets in a publicly available place instead of asking users to prepare them, e.g., gs://ml-pipeline-playground/shakespeare1.txt in the pipeline "[Sample] Basic - Parallel execution". That way, we wouldn't need the download-data step?

Contributor Author

I think this sample may be a little different from other samples. If we put the data in a public GCS bucket, users may not get the full process of completing a Kaggle competition, from download, to train, to submit.

"${_CODE_PATH}/download_dataset/Dockerfile",
]
id: "BuildDownloadDataImage"

Contributor

Do we want to push the image after we build it?

Contributor Author

Yes, the final step does that.

df_train = pd.read_csv(train_file_path)
sns.set()
cols = ['SalePrice', 'OverallQual', 'GrLivArea', 'GarageCars', 'TotalBsmtSF', 'FullBath', 'YearBuilt']
sns.pairplot(df_train[cols], size = 2.5)
Contributor

Also, it seems that the latest interface uses 'height' instead of 'size'.

Contributor Author

Yes, it is. Will modify it.
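For reference, the corrected call under newer seaborn releases (0.9 renamed pairplot's size parameter to height):

sns.pairplot(df_train[cols], height=2.5)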

@dldaisy changed the title from "Ci samples" to "[Sample] Demonstrate Continuous Integration" Jan 8, 2020
@jingzhang36

Also, would it be better to break this PR into several smaller ones? E.g., each sample gets its own PR? Review and merge would be easier and faster that way.

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign gaoning777
You can assign the PR to them by writing /assign @gaoning777 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dldaisy commented Jan 12, 2020

OK. I think I can break them into smaller PRs.

parser.add_argument('--package_url', help='Required. pipeline package url', type=str)
parser.add_argument('--pipeline_id', help = 'Required. pipeline id',type=str)
parser.add_argument('--gcr_address', help='Required. Your cloud registry path. For example, gcr.io/my-project', type=str)
parser.add_argument('--host', help='Host address of kfp.Client. Will be get from cluster automatically, type=str, default='')

Please forgive a comment from a passerby: I noticed this arg is missing a closing '.

Contributor Author

Thanks for the reply! Will modify it in a new pull request.
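For reference, the fixed line simply closes the help string (with its grammar touched up) so that type and default become real keyword arguments again:

parser.add_argument('--host', help='Host address of kfp.Client. Will be read from the cluster automatically.', type=str, default='')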

@k8s-ci-robot

@dldaisy: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| kubeflow-pipeline-sample-test | ab6faad | link | /test kubeflow-pipeline-sample-test |
| kubeflow-pipeline-upgrade-test | ab6faad | link | /test kubeflow-pipeline-upgrade-test |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@stale bot commented Jun 24, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale label Jun 24, 2020
@Bobgy closed this Jun 24, 2020
Labels: lifecycle/stale, ok-to-test, size/XXL

9 participants