
[Sample] Demonstrate Continuous Integration #2784

Closed · wants to merge 44 commits
556ace1
Support select tensorflow image for tensorboard
dldaisy Nov 29, 2019
f9259bf
modify test for tensorflow version select
dldaisy Nov 29, 2019
8d541fb
delete not available image entry
dldaisy Nov 29, 2019
e0d2f4b
Support tensorflow image selection to run tensorboard
dldaisy Dec 3, 2019
c89a22b
format code with prettier
dldaisy Dec 4, 2019
1bf624a
use HasPrefix instead of regexp
dldaisy Dec 4, 2019
301ec41
delete
dldaisy Dec 6, 2019
07ba4fc
modified tensorboard test
dldaisy Dec 11, 2019
68ae3bf
delete tensorboard
dldaisy Dec 11, 2019
45af753
modify typo
dldaisy Dec 11, 2019
324bc83
test tensorboard
dldaisy Dec 11, 2019
81dab3b
Merge remote-tracking branch 'upstream/master'
dldaisy Dec 11, 2019
9578720
Merge remote-tracking branch 'upstream/master'
dldaisy Dec 11, 2019
48bffbc
tensorboard test
dldaisy Dec 11, 2019
65ff6bb
fuck
dldaisy Dec 11, 2019
df9c9fa
fuck2
dldaisy Dec 11, 2019
9a22416
modify test
dldaisy Dec 17, 2019
d896c4c
merge master
dldaisy Dec 17, 2019
13b5cf6
modify typo in tensorboard hint
dldaisy Dec 17, 2019
c05c1ee
npm run format
dldaisy Dec 18, 2019
6bfb09a
modify tensorboard snapshot
dldaisy Dec 18, 2019
65b1d4a
compatible with previous kfp version. Allow vacant tensorflowImage fi…
dldaisy Dec 19, 2019
e8f6644
add 2 tests for dialog
dldaisy Dec 19, 2019
5bed8d8
modify default tensorflow image to 1.13.2
dldaisy Dec 19, 2019
2900699
merge get version and get tensorboard; let --bind_all support tensorb…
dldaisy Dec 20, 2019
a8308c7
modify reconciler.go
dldaisy Dec 23, 2019
d5a2e15
reconciler rollback
dldaisy Dec 23, 2019
5c5687a
modify corresponding test for --bind_all
dldaisy Dec 23, 2019
0042e1f
modify requested chances 12/23
dldaisy Dec 23, 2019
a6374ae
formControl sorted alphabetically
dldaisy Dec 23, 2019
e3ee9f8
select sorted alphabetically
dldaisy Dec 23, 2019
8801bb2
modify details from PR request 12/24
dldaisy Dec 24, 2019
4bf953a
moidfy format
dldaisy Dec 24, 2019
34e26ed
modify details 12/23
dldaisy Dec 24, 2019
fec3711
modify snapshot
dldaisy Dec 24, 2019
04048ac
retest
dldaisy Dec 24, 2019
f0e1790
retest
dldaisy Dec 24, 2019
0e6adbf
Merge remote-tracking branch 'upstream/master'
dldaisy Dec 27, 2019
87de168
Merge remote-tracking branch 'upstream/master'
Dec 30, 2019
d52c934
add versioned pipeline ci samples
dldaisy Dec 30, 2019
ced2173
add more details in instruction
dldaisy Jan 9, 2020
c833b21
add gs:// prefix to bucket_name; solve bugs in variable name
dldaisy Jan 9, 2020
70bb0dd
modify size to height
dldaisy Jan 10, 2020
ab6faad
modification
dldaisy Jan 13, 2020
9 changes: 9 additions & 0 deletions samples/contrib/versioned-pipeline-ci-samples/README.md
@@ -0,0 +1,9 @@
# Samples for KFP CI

## Overview
This is a collection of Kubeflow Pipelines samples that demonstrate CI. Two tools are used to implement the continuous integration process: Cloud Build and Jenkins. We also demonstrate two ways to interact with KFP: the KFP SDK and the REST API (curl).

This repo also includes a test for the SDK client. It exercises the SDK API by creating several pipelines and versions as indicated, and can be run from the command line.


nit: tests


## Usage
For more concrete instructions, check the READMEs in the subdirectories.
@@ -0,0 +1,15 @@
# Hello World CI Sample

## Overview

This sample uses Cloud Build to implement the continuous integration process for a simple pipeline that prints "hello world" to the console. Once everything is set up, pushing your code to the GitHub repo automatically triggers the build in Cloud Build, which then creates a run in Kubeflow Pipelines, where you can view both the pipeline and the run.


Suggested change
This sample use cloud build to implement the continuous integration process of a simple pipeline that outputs "hello world" to the console. Once all set up, you can push your code to github repo, then the build process in cloud build will be triggered automatically, then a run will be created in kubeflow pipeline. You can view your pipeline and the run in kubeflow pipelines.
This sample uses cloud build to implement the continuous integration process of a simple pipeline that outputs "hello world" to the console. Once all set up, you can push your code to github repo, then the build process in cloud build will be triggered automatically, then a run will be created in kubeflow pipeline. You can view your pipeline and the run in kubeflow pipelines.

Contributor Author

thanks


In addition, we use the REST API to ask Kubeflow Pipelines to create a new version and a run. An alternative way to create a pipeline version (using the KFP SDK) can be found in the mnist sample in this repo.

## Usage

To use this pipeline, you need to:

* Set up a trigger in Cloud Build that connects to your GitHub repo.
* Replace the constants in cloudbuild.yaml with your own configuration.
* Replace the images in pipeline.py with your own images (the ones built in cloudbuild.yaml).
@@ -0,0 +1,57 @@
steps:
- name: "gcr.io/cloud-builders/docker"
args:
[
"build",
"-t",
"${_GCR_PATH}/helloworld-ci:$COMMIT_SHA",
"-t",
"${_GCR_PATH}/helloworld-ci:latest",
"--cache-from",
"${_GCR_PATH}/helloworld-ci:latest",
"${_CODE_PATH}/helloworld",
]
id: "BuildImages"
- name: "python:3.7-slim"
entrypoint: "/bin/sh"
args: [
"-c",
"cd ${_CODE_PATH};
pip3 install cffi==1.12.3 --upgrade;
pip3 install kfp;
python pipeline.py --commit_id $COMMIT_SHA;
cp pipeline.py.zip /workspace/pipeline.zip",
]
id: "PackagePipeline"

- name: "gcr.io/cloud-builders/gsutil"
args:
[
"cp",
"/workspace/pipeline.zip",
"${_GS_BUCKET}/$COMMIT_SHA/pipeline.zip",
]
id: "UploadPipeline"
waitFor: ["PackagePipeline"]


- name: "gcr.io/cloud-builders/curl"
entrypoint: "/bin/sh"
args:
[
"-c",
"curl.bash $COMMIT_SHA ${_PIPELINE_ID} ${_GS_BUCKET} ${_PIPELINE_ENDPOINT} ${_GCR_PATH}"
]
id: "CreatePipelineVersionAndRun"

images:
- "${_GCR_PATH}/helloworld-ci:$COMMIT_SHA"
- "${_GCR_PATH}/helloworld-ci:latest"

substitutions:
_GCR_PATH: [Your cloud registry path. For example, gcr.io/myproject]
_CODE_PATH: /workspace/hello-world
_NAMESPACE: kubeflow
_PIPELINE_ID: [Your kubeflow pipeline id to create a version on. Get it from kfp UI.]
_GS_BUCKET: [Name of your cloud storage bucket. For example, 'gs://my-bucket']
_PIPELINE_ENDPOINT: [Your exposed pipeline endpoint.]
@@ -0,0 +1,14 @@
#!/bin/bash

bucket_name=$(echo "$3" | sed 's/gs:\/\///')
data='{"name":'\""ci-$1"\"', "code_source_url": "https://github.com/kubeflow/pipelines/tree/'"$1"'", "package_url": {"pipeline_url": "https://storage.googleapis.com/'"$bucket_name"'/'"$1"'/pipeline.zip"},
"resource_references": [{"key": {"id": '\""$2"\"', "type":3}, "relationship":1}]}'

version=$(curl -H "Content-Type: application/json" -X POST -d "$data" "$4"/apis/v1beta1/pipeline_versions | jq -r ".id")

# create run
rundata='{"name":'\""$1-run"\"',
"resource_references": [{"key": {"id": '\""$version"\"', "type":4}, "relationship":2}],
"pipeline_spec":{"parameters": [{"name": "gcr_address", "value": '\""$5"\"'}]}}'
echo "$rundata"
curl -H "Content-Type: application/json" -X POST -d "$rundata" "$4"/apis/v1beta1/runs
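The JSON bodies that curl.bash assembles through layered shell quoting are easier to read when built programmatically. Below is a minimal Python sketch of the same two payloads (pipeline version, then run); the commit SHA, pipeline id, version id, bucket, and registry path are placeholder values, not values from this repo:

```python
import json


def version_payload(commit, pipeline_id, bucket_name):
    # Mirrors the first curl call: create a pipeline version from the uploaded zip.
    return {
        "name": "ci-" + commit,
        "package_url": {
            "pipeline_url": "https://storage.googleapis.com/%s/%s/pipeline.zip"
            % (bucket_name, commit)
        },
        "resource_references": [
            {"key": {"id": pipeline_id, "type": 3}, "relationship": 1}
        ],
    }


def run_payload(commit, version_id, gcr_address):
    # Mirrors the second curl call: create a run from the new version.
    return {
        "name": commit + "-run",
        "resource_references": [
            {"key": {"id": version_id, "type": 4}, "relationship": 2}
        ],
        "pipeline_spec": {
            "parameters": [{"name": "gcr_address", "value": gcr_address}]
        },
    }


if __name__ == "__main__":
    # POST these bodies to /apis/v1beta1/pipeline_versions and /apis/v1beta1/runs,
    # as curl.bash does.
    print(json.dumps(version_payload("abc123", "pid-1", "my-bucket"), indent=2))
    print(json.dumps(run_payload("abc123", "vid-1", "gcr.io/my-project"), indent=2))
```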
@@ -0,0 +1,5 @@
FROM python:3

COPY helloworld.py .

CMD ["python", "./helloworld.py"]
@@ -0,0 +1,9 @@
#!/usr/bin/python


def main():
print("goodbye world!!")


if __name__ == "__main__":
main()
@@ -0,0 +1,42 @@
#!/usr/bin/env python3
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


import kfp.dsl as dsl
from kfp.gcp import use_gcp_secret
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--commit_id', help='The commit SHA of this build, used to tag the images and name the pipeline version', type=str)


Maybe elaborate more in the help field?

Contributor Author

ok

args = parser.parse_args()


@dsl.pipeline(
name='Mnist Sample',
description='Normal sample to demonstrate how to use CI with KFP'
)
def helloworld_ci_pipeline(
gcr_address: str
):
import os
train = dsl.ContainerOp(
Contributor

Let's try using an actual component instead of ad-hoc ContainerOp:

component.yaml:

name: Train on MNIST
implementation:
  container:
    image: helloworld-ci

and then

train_op = kfp.components.load_component_from_file('component.yaml')
...
train = train_op()

in cloudbuild.yaml we can replace the image name in the component.yaml with the newly-build image:

sed "s|image: helloworld-ci|image: ${_GCR_PATH}/helloworld-ci:$COMMIT_SHA|"

        name='mnist train',
        image=os.path.join(gcr_address, 'mnist_train') + ':' + args.commit_id
    ).apply(use_gcp_secret('user-gcp-sa'))


Is user-gcp-sa secret still valid under workload identity based deployment? @Bobgy
I remember the usage of user-gcp-sa in samples has been cleaned up.

Contributor

No, it won't be used.
Recommend removing its usage and put a link to https://www.kubeflow.org/docs/gke/authentication-pipelines/ nearby about how to authenticate to GCP.

Contributor Author

Will do.

Contributor Author

Does it also mean that we don't need to create a 'user-gcp-sa' secret volumn mounted on the cluster anymore?

Contributor

No, we won't need to. Use workload identity if we care about security, and use full scope cluster for convenience.



if __name__ == '__main__':
import kfp.compiler as compiler
compiler.Compiler().compile(helloworld_ci_pipeline, __file__ + '.zip')
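The reviewer's suggestion above — keep a component.yaml with a placeholder image and pin the freshly built image during CI — can be sketched without sed. This is a hedged illustration, not code from the PR: the component text, registry path, and commit SHA below are hypothetical, and the substitution is string replacement equivalent to the reviewer's `sed "s|image: helloworld-ci|image: ${_GCR_PATH}/helloworld-ci:$COMMIT_SHA|"`:

```python
# Hypothetical component.yaml content with a placeholder image name.
COMPONENT_YAML = """\
name: Train on MNIST
implementation:
  container:
    image: helloworld-ci
"""


def pin_image(component_text, gcr_path, commit_sha):
    # Replace the placeholder image with the image just built by CI,
    # mirroring the sed one-liner the reviewer suggested for cloudbuild.yaml.
    pinned = "image: %s/helloworld-ci:%s" % (gcr_path, commit_sha)
    return component_text.replace("image: helloworld-ci", pinned)


if __name__ == "__main__":
    print(pin_image(COMPONENT_YAML, "gcr.io/myproject", "abc123"))
```

The pinned component text can then be written back to component.yaml before the pipeline is compiled.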
@@ -0,0 +1,15 @@
# Jenkins CI Sample

## Overview
This sample uses Jenkins to implement continuous integration for a simple pipeline that prints "hello world" to the console.
It uses curl to interact with Kubeflow Pipelines; an alternative using the SDK can be found in the mnist sample.

## Usage
To use this sample, you need to:
* Deploy Kubeflow Pipelines on GCP.
* Expose ml-pipeline in your workloads after deploying Kubeflow Pipelines.
* Create your GCS bucket and make it public.
* Replace the constants in the Jenkinsfile with your own configuration, following the instructions in the Jenkinsfile.
* Deploy Jenkins on your machine or in the cloud.
* Set up a Jenkins pipeline with the Jenkinsfile in this folder.
* Connect Jenkins to your GitHub repo.
@@ -0,0 +1,5 @@
FROM python:3

COPY helloworld.py .

CMD ["python", "./helloworld.py"]
@@ -0,0 +1,9 @@
#!/usr/bin/python


def main():
print("hello world!")


if __name__ == "__main__":
main()
@@ -0,0 +1,49 @@
node {
def pipeline_id="[Your pipeline id]"
def pipeline_endpoint = "[Your pipeline endpoint]"
def gs_bucket = "[Your gs bucket name, for example, gs://my-bucket]"

// build hello-world image
stage("BuildImages"){
// step1: specify source
checkout scm
def imagename = "helloworld-ci:${env.BUILD_ID}"
// build image
def image = docker.build(imagename, "./helloworld")
}

// package pipeline
stage("PackagePipeline"){
withPythonEnv('python3'){
sh """
pip3 install cffi==1.12.3 --upgrade;
pip3 install kfp;
python pipeline.py --commit_id ${env.GIT_COMMIT};
"""
}
}

// upload pipeline to some storage so that it can be accessed by kubeflow pipeline CreatePipelineVersion API
// in this example, we set up a local http server to expose jenkins workspace to kfp API
stage("UploadPipeline"){
//copy pipeline.py.zip to a storage without access control
sh """
gsutil cp ./pipeline.py.zip $gs_bucket/${env.GIT_COMMIT}/pipeline.zip
"""
}

// create pipeline version and a new run
stage("CreatePipelineVersionAndRun"){
def version_name = "jenkins-ci-${env.GIT_COMMIT}"
def run_name = "${env.GIT_COMMIT}-run"


data = sh(script: """echo '{"name": "$version_name", "package_url": {"pipeline_url": "https://storage.googleapis.com/${gs_bucket.replace('gs://', '')}/${env.GIT_COMMIT}/pipeline.zip"}, "resource_references": [{"key": {"id": "$pipeline_id", "type":3}, "relationship":1}]}';""", returnStdout: true).trim()
//echo "data is: $data"
version=sh(script: """curl -H "Content-Type: application/json" -X POST -d '$data' "$pipeline_endpoint"/apis/v1beta1/pipeline_versions | jq -r ".id";""", returnStdout: true).trim()
rundata=sh(script: """echo '{"name": "$run_name", "resource_references": [{"key": {"id": "$version", "type":4}, "relationship":2}]}';""", returnStdout: true).trim()
//echo "run data is: $rundata"
sh(script: """curl -H "Content-Type: application/json" -X POST -d '$rundata' "$pipeline_endpoint"/apis/v1beta1/runs""")

}
}
@@ -0,0 +1,19 @@
# Kaggle Competition Pipeline Sample

## Pipeline Overview

This is a pipeline for [house price prediction](https://www.kaggle.com/c/house-prices-advanced-regression-techniques), an entry-level competition on Kaggle. We demonstrate how to complete a Kaggle competition by creating a pipeline whose steps include downloading the data, preprocessing and visualizing it, training a model, and submitting results to the Kaggle website.

* We refer to [this notebook](https://www.kaggle.com/rajgupta5/house-price-prediction) and [this notebook](https://www.kaggle.com/neviadomski/how-to-get-to-top-25-with-simple-model-sklearn) for the model implementation as well as the data visualization.

* We use the [kaggle python api](https://github.com/Kaggle/kaggle-api) to interact with the Kaggle site, for example to download data and submit results. More usage can be found in its documentation.

* We use [cloud build](https://cloud.google.com/cloud-build/) for the CI process: a build and a run are triggered automatically as soon as code is pushed to the GitHub repo. You need to set up a trigger on Cloud Build for your GitHub repo branch to enable this.

## Usage

* Replace the substitutions in cloudbuild.yaml.
* Fill in your kaggle_username and kaggle_key in the Dockerfiles under the download_dataset and submit_result folders to authenticate to Kaggle. You can get them from an API token created on your Kaggle account page: create the token, then find the username and key in the downloaded JSON file.
* Set up Cloud Build triggers for continuous integration.
* Change the images in pipeline.py to the ones you built in cloudbuild.yaml.
* Make your bucket public.
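The API-token step above can be sketched in Python. This is an illustration, not code from this repo: the username, key, and file path are placeholders, and the environment-variable names are the ones the kaggle-api package, to our understanding, accepts as an alternative to `~/.kaggle/kaggle.json`:

```python
import json
import os


def load_kaggle_token(path):
    # The kaggle.json downloaded from "Create API Token" looks like:
    # {"username": "...", "key": "..."}
    with open(path) as f:
        token = json.load(f)
    return token["username"], token["key"]


if __name__ == "__main__":
    import tempfile

    # Write a placeholder token file; the real one lives at ~/.kaggle/kaggle.json.
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump({"username": "alice", "key": "secret"}, f)
        path = f.name

    user, key = load_kaggle_token(path)
    # kaggle-api also reads credentials from these environment variables.
    os.environ["KAGGLE_USERNAME"], os.environ["KAGGLE_KEY"] = user, key
    print(user)
```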