[Sample] Demonstrate Continuous Integration #2784
@@ -0,0 +1,9 @@
# Samples for KFP CI

## Overview
This is a collection of Kubeflow Pipelines samples that demonstrate CI. Two tools are used to implement the continuous integration process: Cloud Build and Jenkins. We also demonstrate two ways to interact with KFP: the KFP SDK and the REST API (via curl).

This repo also includes a test for the SDK client. It exercises the SDK API by creating several pipelines and versions as indicated. You can run it from the command line.

## Usage
For more concrete instructions, check the READMEs in the subdirectories.
@@ -0,0 +1,15 @@
# Hello World CI Sample

## Overview

This sample uses Cloud Build to implement the continuous integration process for a simple pipeline that outputs "hello world" to the console. Once everything is set up, pushing your code to the GitHub repo automatically triggers the build process in Cloud Build, which then creates a run in Kubeflow Pipelines. You can view the pipeline and the run in the Kubeflow Pipelines UI.

In addition, we use the REST API to call Kubeflow Pipelines to create a new pipeline version and a run. Another way to create a pipeline version, using the KFP SDK, is shown in the mnist sample in this repo.

## Usage

To use this pipeline, you need to:

* Set up a trigger in Cloud Build that connects to your GitHub repo.
* Replace the constants in cloudbuild.yaml with your own configuration.
* Replace the images in pipeline.py with your own images (the ones you built in cloudbuild.yaml).
@@ -0,0 +1,57 @@
steps:
  - name: "gcr.io/cloud-builders/docker"
    args:
      [
        "build",
        "-t",
        "${_GCR_PATH}/helloworld-ci:$COMMIT_SHA",
        "-t",
        "${_GCR_PATH}/helloworld-ci:latest",
        "--cache-from",
        "${_GCR_PATH}/helloworld-ci:latest",
        "${_CODE_PATH}/helloworld",
      ]
    id: "BuildImages"

  - name: "python:3.7-slim"
    entrypoint: "/bin/sh"
    args: [
        "-c",
        "cd ${_CODE_PATH};
        pip3 install cffi==1.12.3 --upgrade;
        pip3 install kfp;
        python pipeline.py --commit_id $COMMIT_SHA;
        cp pipeline.py.zip /workspace/pipeline.zip",
      ]
    id: "PackagePipeline"

  - name: "gcr.io/cloud-builders/gsutil"
    args:
      [
        "cp",
        "/workspace/pipeline.zip",
        "${_GS_BUCKET}/$COMMIT_SHA/pipeline.zip",
      ]
    id: "UploadPipeline"
    waitFor: ["PackagePipeline"]

  - name: "gcr.io/cloud-builders/curl"
    entrypoint: "/bin/sh"
    args:
      [
        "-c",
        "curl.bash $COMMIT_SHA ${_PIPELINE_ID} ${_GS_BUCKET} ${_PIPELINE_ENDPOINT} ${_GCR_PATH}"
      ]
    id: "CreatePipelineVersionAndRun"

images:
  - "${_GCR_PATH}/helloworld-ci:$COMMIT_SHA"
  - "${_GCR_PATH}/helloworld-ci:latest"

substitutions:
  _GCR_PATH: [Your cloud registry path. For example, gcr.io/myproject]
  _CODE_PATH: /workspace/hello-world
  _NAMESPACE: kubeflow
  _PIPELINE_ID: [Your kubeflow pipeline id to create a version on. Get it from the KFP UI.]
  _GS_BUCKET: [Name of your cloud storage bucket. For example, 'gs://my-bucket']
  _PIPELINE_ENDPOINT: [Your exposed pipeline endpoint.]
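The UploadPipeline step copies the packaged pipeline to `${_GS_BUCKET}/$COMMIT_SHA/pipeline.zip`, and the later REST call fetches it back over the public `storage.googleapis.com` endpoint. A minimal sketch of how the `gs://` destination maps to that public URL (the bucket name and commit SHA below are made-up examples):

```python
# Sketch: map the gsutil upload destination to the public HTTPS URL that a
# CreatePipelineVersion call can download from. Values are examples only.
def public_pipeline_url(gs_bucket: str, commit_sha: str) -> str:
    # Strip the gs:// scheme to get the bare bucket name.
    bucket = gs_bucket.replace("gs://", "", 1)
    return "https://storage.googleapis.com/{}/{}/pipeline.zip".format(bucket, commit_sha)

url = public_pipeline_url("gs://my-bucket", "45af753")
print(url)  # https://storage.googleapis.com/my-bucket/45af753/pipeline.zip
```

This only works if the bucket (or the object) is publicly readable, which is why the READMEs ask you to make the bucket public.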
@@ -0,0 +1,14 @@
#!/bin/bash
# Create a pipeline version and a run via the KFP REST API.
# Args: $1=commit SHA, $2=pipeline id, $3=GCS bucket (gs://...), $4=pipeline endpoint, $5=GCR path

bucket_name=$(echo "$3" | sed 's/gs:\/\///')
data='{"name": "ci-'"$1"'",
  "code_source_url": "https://github.com/kubeflow/pipelines/tree/'"$1"'",
  "package_url": {"pipeline_url": "https://storage.googleapis.com/'"$bucket_name"'/'"$1"'/pipeline.zip"},
  "resource_references": [{"key": {"id": "'"$2"'", "type": 3}, "relationship": 1}]}'

version=$(curl -H "Content-Type: application/json" -X POST -d "$data" "$4"/apis/v1beta1/pipeline_versions | jq -r ".id")

# create run
rundata='{"name": "'"$1"'-run",
  "resource_references": [{"key": {"id": "'"$version"'", "type": 4}, "relationship": 2}],
  "pipeline_spec": {"parameters": [{"name": "gcr_address", "value": "'"$5"'"}]}}'
echo "$rundata"
curl -H "Content-Type: application/json" -X POST -d "$rundata" "$4"/apis/v1beta1/runs
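Assembling JSON bodies by hand with shell quoting is error-prone. As a cross-check, here is a hedged Python sketch that builds the same two payloads with `json.dumps` instead (the resource-reference numbers follow the v1beta1 enums as used above: type 3 = pipeline, type 4 = pipeline version; all concrete values below are examples, not real ids):

```python
import json

def version_payload(commit, pipeline_id, bucket):
    # Body POSTed to /apis/v1beta1/pipeline_versions (mirrors curl.bash).
    return {
        "name": "ci-{}".format(commit),
        "code_source_url": "https://github.com/kubeflow/pipelines/tree/{}".format(commit),
        "package_url": {
            "pipeline_url": "https://storage.googleapis.com/{}/{}/pipeline.zip".format(bucket, commit)
        },
        # type 3 = pipeline; the new version is owned by that pipeline
        "resource_references": [{"key": {"id": pipeline_id, "type": 3}, "relationship": 1}],
    }

def run_payload(commit, version_id, gcr_address):
    # Body POSTed to /apis/v1beta1/runs (mirrors curl.bash).
    return {
        "name": "{}-run".format(commit),
        # type 4 = pipeline version; the run is created from that version
        "resource_references": [{"key": {"id": version_id, "type": 4}, "relationship": 2}],
        "pipeline_spec": {"parameters": [{"name": "gcr_address", "value": gcr_address}]},
    }

print(json.dumps(version_payload("abc123", "pid-1", "my-bucket"), indent=2))
```

Building the payload as a dict and serializing it guarantees balanced braces and correct escaping, which the hand-quoted shell version has to get right manually.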
@@ -0,0 +1,5 @@
FROM python:3

COPY helloworld.py .

CMD ["python", "./helloworld.py"]
@@ -0,0 +1,9 @@
#!/usr/bin/python


def main():
    print("goodbye world!!")


if __name__ == "__main__":
    main()
@@ -0,0 +1,42 @@
#!/usr/bin/env python3
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


import argparse

import kfp.dsl as dsl
from kfp.gcp import use_gcp_secret

parser = argparse.ArgumentParser()
parser.add_argument('--commit_id', help='Commit Id', type=str)
args = parser.parse_args()


@dsl.pipeline(
    name='Mnist Sample',
    description='Normal sample to demonstrate how to use CI with KFP'
)
def helloworld_ci_pipeline(
    gcr_address: str
):
    import os
    train = dsl.ContainerOp(
Review comment: Let's try using an actual component instead of an ad-hoc ContainerOp.

component.yaml:

    name: Train on MNIST
    implementation:
      container:
        image: helloworld-ci

and then:

    train_op = kfp.components.load_component_from_file('component.yaml')
    ...
    train = train_op()

In cloudbuild.yaml we can replace the image name in the component file.
        name='mnist train',
        # os.path.join would insert a '/' after the ':' separator, so build
        # the name:tag reference with string formatting instead.
        image='{}/mnist_train:{}'.format(gcr_address, args.commit_id)
    ).apply(use_gcp_secret('user-gcp-sa'))
Review discussion:

* Is `user-gcp-sa` still needed here?
* No, it won't be used.
* Will do.
* Does it also mean that we don't need to create a 'user-gcp-sa' secret volume mounted on the cluster anymore?
* No, we won't need to. Use workload identity if we care about security, and a full-scope cluster for convenience.
if __name__ == '__main__':
    import kfp.compiler as compiler
    compiler.Compiler().compile(helloworld_ci_pipeline, __file__ + '.zip')
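The image reference for the train step is assembled from the GCR address and the commit id. Note that `os.path.join` inserts a `/` between every component, so it is a poor fit for building a `name:tag` image reference; plain string formatting keeps the tag attached to the name. A quick illustration (registry path and SHA are made-up):

```python
import os

gcr_address, commit_id = "gcr.io/myproject", "abc123"

# os.path.join treats each argument as a path component and inserts '/',
# which splits the tag away from the image name (on POSIX):
joined = os.path.join(gcr_address, "mnist_train:", commit_id)
print(joined)  # gcr.io/myproject/mnist_train:/abc123

# String formatting produces the intended image reference:
image = "{}/mnist_train:{}".format(gcr_address, commit_id)
print(image)  # gcr.io/myproject/mnist_train:abc123
```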
@@ -0,0 +1,15 @@
# Jenkins CI Sample

## Overview
This sample uses Jenkins to implement continuous integration for a simple pipeline that prints "hello world" to the console.
It uses curl to interact with Kubeflow Pipelines. An alternative that uses the SDK can be found in the mnist sample.

## Usage
To use this sample, you need to:
* Deploy Kubeflow Pipelines on GCP.
* Expose ml-pipeline in your workloads after deploying Kubeflow Pipelines.
* Create your GCS bucket and make it public.
* Replace the constants in the jenkinsfile with your own configuration, following the instructions in the jenkinsfile.
* Deploy Jenkins on your machine or in the cloud.
* Set up a Jenkins pipeline with the jenkinsfile in this folder.
* Connect Jenkins to your GitHub repo.
@@ -0,0 +1,5 @@
FROM python:3

COPY helloworld.py .

CMD ["python", "./helloworld.py"]
@@ -0,0 +1,9 @@
#!/usr/bin/python


def main():
    print("hello world!")


if __name__ == "__main__":
    main()
@@ -0,0 +1,49 @@
node {
    def pipeline_id = "[Your pipeline id]"
    def pipeline_endpoint = "[Your pipeline endpoint]"
    def gs_bucket = "[Your gs bucket name, for example, gs://my-bucket]"
    // bucket name without the gs:// prefix, used to build the public storage URL
    def bucket_name = gs_bucket.replace("gs://", "")

    // build hello-world image
    stage("BuildImages") {
        // step 1: specify source
        checkout scm
        def imagename = "helloworld-ci:${env.BUILD_ID}"
        // build image
        def image = docker.build(imagename, "./helloworld")
    }

    // package pipeline
    stage("PackagePipeline") {
        withPythonEnv('python3') {
            sh '''
                pip3 install cffi==1.12.3 --upgrade;
                pip3 install kfp;
                python pipeline.py --commit_id $scm.GIT_COMMIT;
            '''
        }
    }

    // upload the packaged pipeline to storage so that it can be accessed by
    // the Kubeflow Pipelines CreatePipelineVersion API
    stage("UploadPipeline") {
        // copy pipeline.py.zip to a storage location without access control
        sh """
            gsutil cp ./pipeline.py.zip $gs_bucket/$scm.GIT_COMMIT/pipeline.zip
        """
    }

    // create pipeline version and a new run
    stage("CreatePipelineVersionAndRun") {
        def version_name = "jenkins-ci-$scm.GIT_COMMIT"
        def run_name = "$scm.GIT_COMMIT-run"

        data = sh(script: """echo '{"name": "$version_name", "package_url": {"pipeline_url": "https://storage.googleapis.com/$bucket_name/$scm.GIT_COMMIT/pipeline.zip"}, "resource_references": [{"key": {"id": "$pipeline_id", "type":3}, "relationship":1}]}';""", returnStdout: true).trim()
        //echo "data is: $data"
        version = sh(script: """curl -H "Content-Type: application/json" -X POST -d '$data' "$pipeline_endpoint"/apis/v1beta1/pipeline_versions | jq -r ".id";""", returnStdout: true).trim()
        rundata = sh(script: """echo '{"name": "$run_name", "resource_references": [{"key": {"id": "$version", "type":4}, "relationship":2}]}';""", returnStdout: true).trim()
        //echo "run data is: $rundata"
        sh(script: """curl -H "Content-Type: application/json" -X POST -d '$rundata' "$pipeline_endpoint"/apis/v1beta1/runs""")
    }
}
@@ -0,0 +1,19 @@
# Kaggle Competition Pipeline Sample

## Pipeline Overview

This is a pipeline for [house price prediction](https://www.kaggle.com/c/house-prices-advanced-regression-techniques), an entry-level competition on Kaggle. We demonstrate how to complete a Kaggle competition by creating a pipeline of steps that download the data, preprocess and visualize it, train a model, and submit the results to the Kaggle website.

* We refer to [this notebook](https://www.kaggle.com/rajgupta5/house-price-prediction) and [this notebook](https://www.kaggle.com/neviadomski/how-to-get-to-top-25-with-simple-model-sklearn) for the model implementation as well as the data visualization.

* We use the [kaggle python api](https://github.com/Kaggle/kaggle-api) to interact with the Kaggle site, for example to download data and submit results. More usage can be found in their documentation.

* We use [cloud build](https://cloud.google.com/cloud-build/) for the CI process: a build and a run are triggered automatically as soon as code is pushed to the GitHub repo. You need to set up a trigger on Cloud Build for your GitHub repo branch to enable this.

## Usage

* Replace the substitutions in cloudbuild.yaml.
* Fill in your kaggle_username and kaggle_key in the Dockerfiles under the download_dataset and submit_result folders to authenticate to Kaggle. You can get them from an API token created on your Kaggle account page: create an API token, then find the username and key in the downloaded JSON file.
* Set up Cloud Build triggers for continuous integration.
* Change the images in pipeline.py to the ones you built in cloudbuild.yaml.
* Make your bucket public.
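The API token downloaded from the Kaggle account page is a small JSON file with a username and key, which the Kaggle tooling reads from `kaggle.json`. A hedged sketch of writing that file programmatically (the credentials are placeholders, and a temporary directory stands in for the usual `~/.kaggle` location):

```python
import json
import os
import stat
import tempfile

# Placeholder credentials; real values come from the API token downloaded
# from your Kaggle account page.
creds = {"username": "your-kaggle-username", "key": "your-kaggle-key"}

# Normally ~/.kaggle; a temp dir is used here so the sketch is side-effect free.
config_dir = os.path.join(tempfile.mkdtemp(), ".kaggle")
os.makedirs(config_dir, exist_ok=True)
path = os.path.join(config_dir, "kaggle.json")
with open(path, "w") as f:
    json.dump(creds, f)
# Restrict the token file to the owner (0600), as the Kaggle CLI recommends.
os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)

print(path)
```

In the sample's Dockerfiles the same two values are instead baked in at image build time, which is why they must be filled in before building.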
Review comment: nit: tests