From e392f5e901d2028e80205c45d312c4dc71ceea8d Mon Sep 17 00:00:00 2001
From: Rostam Dinyari
Date: Tue, 12 Feb 2019 22:09:34 -0800
Subject: [PATCH 1/3] Kubeflow pipelines quickstart notebooks added.

---
 samples/notebooks/quickstart.ipynb    | 572 ++++++++++++++++++++++++++
 samples/notebooks/quickstart_iris.csv | 150 +++++++
 2 files changed, 722 insertions(+)
 create mode 100644 samples/notebooks/quickstart.ipynb
 create mode 100644 samples/notebooks/quickstart_iris.csv

diff --git a/samples/notebooks/quickstart.ipynb b/samples/notebooks/quickstart.ipynb
new file mode 100644
index 00000000000..0ae637efd21
--- /dev/null
+++ b/samples/notebooks/quickstart.ipynb
@@ -0,0 +1,572 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Copyright 2019 Google Inc. All Rights Reserved.\n",
+    "#\n",
+    "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+    "# you may not use this file except in compliance with the License.\n",
+    "# You may obtain a copy of the License at\n",
+    "#\n",
+    "# http://www.apache.org/licenses/LICENSE-2.0\n",
+    "#\n",
+    "# Unless required by applicable law or agreed to in writing, software\n",
+    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+    "# See the License for the specific language governing permissions and\n",
+    "# limitations under the License."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Part 1\n",
+    "# Multiple ways to author a component to list blobs in a GCS bucket\n",
+    "A pipeline is composed of one or more components. In this section, you will build a single component that that lists the blobs in a GCS bucket. Then you build a pipeline that consists of this component."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Create a 'lightweight python component' from a Python function."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 1.1 Define component function\n",
+    "The requirements for the component function:\n",
+    "* The function must be stand-alone.\n",
+    "* The function can only import packages that are available in the base image.\n",
+    "* If the function operates on numbers, the parameters must have type hints. Supported types are `int`, `float`, and `bool`; everything else is passed as a string (`str`).\n",
+    "* To build a component with multiple output values, use Python’s `typing.NamedTuple` type hint syntax, as in the sketch below."
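For example, a stand-alone function with two typed outputs might look like the following sketch. This block is illustrative rather than part of the original notebook: the function name and output fields are hypothetical, and `func_to_container_op` is the converter introduced in section 1.2 below.

```python
from typing import NamedTuple

import kfp.components as comp


def my_divmod(dividend: float, divisor: float) -> NamedTuple(
    'DivmodOutput', [('quotient', float), ('remainder', float)]):
  '''Returns the quotient and remainder of two numbers.'''
  # Imports must happen inside the function so that it stays stand-alone.
  from collections import namedtuple
  output = namedtuple('DivmodOutput', ['quotient', 'remainder'])
  return output(dividend // divisor, dividend % divisor)


# Each named field becomes a separate output of the resulting component.
divmod_op = comp.func_to_container_op(my_divmod)
```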
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def list_blobs(bucket_name: str) -> str:\n",
+    "  '''Lists all the blobs in the bucket.'''\n",
+    "  import subprocess\n",
+    "\n",
+    "  subprocess.call(['pip', 'install', '--upgrade', 'google-cloud-storage'])\n",
+    "  from google.cloud import storage\n",
+    "  storage_client = storage.Client()\n",
+    "  bucket = storage_client.get_bucket(bucket_name)\n",
+    "  list_blobs_response = bucket.list_blobs()\n",
+    "  blobs = ','.join([blob.name for blob in list_blobs_response])\n",
+    "  print(blobs)\n",
+    "  return blobs"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 1.2 Create a lightweight Python component"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import kfp.components as comp\n",
+    "\n",
+    "# Converts the function to a lightweight Python component.\n",
+    "list_blobs_op = comp.func_to_container_op(list_blobs)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 1.3 Define pipeline"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import kfp.dsl as dsl\n",
+    "\n",
+    "# Defines the pipeline.\n",
+    "@dsl.pipeline(name='List GCS blobs', description='Lists GCS blobs.')\n",
+    "def pipeline_func(bucket_name=dsl.PipelineParam('bucket')):\n",
+    "  list_blobs_task = list_blobs_op(bucket_name)\n",
+    "\n",
+    "# Compile the pipeline to a file.\n",
+    "import kfp.compiler as compiler\n",
+    "compiler.Compiler().compile(pipeline_func, 'list_blobs.pipeline.tar.gz')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Build a new Docker container image from a Python function.\n",
+    "Refer [here](https://runnable.com/docker/python/dockerize-your-python-application#alternatives) for more information."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. Wrap an existing Docker container image using `ContainerOp`"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 3.1 Create a Docker container\n",
+    "Create your own container image that includes your program. If your component creates some outputs to be fed as inputs to the downstream components, each separate output must be written as a string to a separate local text file by the container image. For example, if a trainer component needs to output the trained model path, it can write the path to a local file `/output.txt`. The string written to an output file cannot be too big. If it is too big (> 500kb), you can save the output to an external persistent storage and pass the storage path to the next component.\n",
+    "\n",
+    "Start by entering the value of your Google Cloud Platform Project ID."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# GCP Project ID\n",
+    "PROJECT_ID='PROJECT_ID'\n",
+    "\n",
+    "assert PROJECT_ID != 'PROJECT_ID'"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The following cell creates a file `app.py` that contains a Python script. The script takes a GCS bucket name as an input argument, gets the list of blobs in that bucket, prints it, and writes it to an output file, along the lines of the sketch below."
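The heredoc bodies in the next cell do not survive in this copy of the patch, so the full script is not visible here. A plausible sketch of such an `app.py`, pieced together from the fragments that do appear in the later hunks (the `argparse` lines) and from the `file_outputs={'blobs': '/blobs.txt'}` declaration in section 2.2, follows; treat the details as assumptions rather than the original file.

```python
# app.py -- hedged reconstruction of the script created by the next cell.
import argparse

from google.cloud import storage

# Parse arguments.
parser = argparse.ArgumentParser()
parser.add_argument(
    '--bucket', type=str, required=True, help='GCS bucket name.')
args = parser.parse_args()

# Create a client and fetch the blob names in the bucket.
storage_client = storage.Client()
bucket = storage_client.get_bucket(args.bucket)
blobs = ','.join(blob.name for blob in bucket.list_blobs())

# Print the list and also write it to the local file that the component
# declares under file_outputs, so downstream steps can consume it.
print(blobs)
with open('/blobs.txt', 'w') as f:
  f.write(blobs)
```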
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%bash\n",
+    "\n",
+    "# Create folders if they don't exist.\n",
+    "mkdir -p tmp/components/list-gcs-blobs\n",
+    "\n",
+    "# Create the Python file that lists GCS blobs.\n",
+    "cat > ./tmp/components/list-gcs-blobs/app.py < ./tmp/components/list-gcs-blobs/Dockerfile < ./tmp/components/list-gcs-blobs/build_image.sh < ./tmp/components/view-input/app.py < ./tmp/components/view-input/Dockerfile < ./tmp/components/view-input/build_image.sh <

From: Rostam Dinyari
Date: Sun, 24 Mar 2019 00:46:56 -0700
Subject: [PATCH 2/3] Incorporated comments.

---
 samples/notebooks/quickstart.ipynb | 118 ++++++++++++++---------------
 1 file changed, 56 insertions(+), 62 deletions(-)

diff --git a/samples/notebooks/quickstart.ipynb b/samples/notebooks/quickstart.ipynb
index 0ae637efd21..27f1359a88c 100644
--- a/samples/notebooks/quickstart.ipynb
+++ b/samples/notebooks/quickstart.ipynb
@@ -26,15 +26,15 @@
   "metadata": {},
   "source": [
    "# Part 1\n",
-   "# Multiple ways to author a component to list blobs in a GCS bucket\n",
-   "A pipeline is composed of one or more components. In this section, you will build a single component that that lists the blobs in a GCS bucket. Then you build a pipeline that consists of this component."
+   "# Two ways to author a component to list blobs in a GCS bucket\n",
+   "A pipeline is composed of one or more components. In this section, you will build a single component that that lists the blobs in a GCS bucket. Then you build a pipeline that consists of this component. There are two ways to author a component. In the following sections we will go through each of them."
    ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-   "## 1. Create a 'lightweight python component' from a Python function."
+   "## 1. Create a lightweight python component from a Python function."
    ]
  },
@@ -117,39 +117,19 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-   "## 2. Build a new Docker container image from a Python function.\n",
-   "Refer [here](https://runnable.com/docker/python/dockerize-your-python-application#alternatives) for more information."
+   "## 2. Wrap an existing Docker container image using `ContainerOp`"
    ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-   "## 3. Wrap an existing Docker container image using `ContainerOp`"
-   ]
- },
- {
-  "cell_type": "markdown",
-  "metadata": {},
-  "source": [
-   "### 3.1 Create a Docker container\n",
-   "Create your own container image that includes your program. If your component creates some outputs to be fed as inputs to the downstream components, each separate output must be written as a string to a separate local text file by the container image. For example, if a trainer component needs to output the trained model path, it can write the path to a local file `/output.txt`. The string written to an output file cannot be too big. If it is too big (> 500kb), you can save the output to an external persistent storage and pass the storage path to the next component.\n",
+   "### 2.1 Create a Docker container\n",
+   "Create your own container image that includes your program. If your component creates some outputs to be fed as inputs to the downstream components, each separate output must be written as a string to a separate local text file by the container image. For example, if a trainer component needs to output the trained model path, it can write the path to a local file `/output.txt`. The string written to an output file cannot be too big. If it is too big (>> 100 kB), save the output to an external persistent storage and pass the storage path to the next component.\n",
    "\n",
    "Start by entering the value of your Google Cloud Platform Project ID."
    ]
  },
- {
-  "cell_type": "code",
-  "execution_count": null,
-  "metadata": {},
-  "outputs": [],
-  "source": [
-   "# GCP Project ID\n",
-   "PROJECT_ID='PROJECT_ID'\n",
-   "\n",
-   "assert PROJECT_ID != 'PROJECT_ID'"
-   ]
- },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
@@ -174,7 +154,8 @@
    "from google.cloud import storage\n",
    "# Parse arguments.\n",
    "parser = argparse.ArgumentParser()\n",
-   "parser.add_argument('--bucket', type=str, required=True, help='GCS bucket name.')\n",
+   "parser.add_argument(\n",
+   "    '--bucket', type=str, required=True, help='GCS bucket name.')\n",
    "args = parser.parse_args()\n",
    "# Create a client.\n",
    "storage_client = storage.Client()\n",
@@ -216,7 +197,26 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-   "Now create a Shell script that builds a container image and stores it in the Google Container Registry (GCR)."
+   "Now that we have created our Dockerfile, we can create our Docker image. Then we need to push the image to a registry to host the image. Here, we will use Google Container Registry, but any other accessible registry works as well. In the following cell, set the project ID that will be used to push your image to Google Container Registry."
+   ]
+ },
+ {
+  "cell_type": "code",
+  "execution_count": null,
+  "metadata": {},
+  "outputs": [],
+  "source": [
+   "# GCP Project ID\n",
+   "PROJECT_ID='PROJECT_ID'\n",
+   "\n",
+   "assert PROJECT_ID != 'PROJECT_ID'"
+   ]
+ },
+ {
+  "cell_type": "markdown",
+  "metadata": {},
+  "source": [
+   "Now create a Shell script that builds a container image and stores it in the Google Container Registry (a sketch of the equivalent commands appears after the next heading)."
    ]
  },
@@ -268,8 +268,8 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-   "### 3.2 Create a Python class for each component\n",
-   "Define a Python class that describes the interactions with the Docker container image created in the previous step. The Python class specifies the component name, the image to use, the command to run after the container starts, the input arguments, and the file outputs. Each component needs to inherit from `kfp.dsl.ContainerOp`."
+   "### 2.2 Define each component\n",
+   "Define a component by creating an instance of `kfp.dsl.ContainerOp` that describes the interactions with the Docker container image created in the previous step. You need to specify the component name, the image to use, the command to run after the container starts, the input arguments, and the file outputs."
    ]
  },
@@ -280,22 +280,21 @@
    "source": [
     "import kfp.dsl\n",
     "\n",
-    "class ListGcsBlobsOp(kfp.dsl.ContainerOp):\n",
-    "  def __init__(self, name, bucket):\n",
-    "    super(ListGcsBlobsOp, self).__init__(\n",
+    "def list_gcs_blobs_op(name, bucket):\n",
+    "  return kfp.dsl.ContainerOp(\n",
     "      name=name,\n",
     "      image='gcr.io/{}/listgcsblobs:latest'.format(PROJECT_ID),\n",
     "      command=['python', '/app/app.py'],\n",
     "      file_outputs={'blobs': '/blobs.txt'},\n",
     "      arguments=['--bucket', bucket]\n",
-    "    )"
+    "  )"
    ]
   },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-   "### 3.3 Create your workflow as a Python function\n",
+   "### 2.3 Create your workflow as a Python function\n",
    "Start by creating a folder to store the pipeline file."
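Before compiling the pipeline below, the component image has to exist in the registry. The `build_image.sh` heredocs earlier in the notebook are not visible in this copy of the patch, so the following is only a rough sketch of the usual build-and-push sequence, written in Python for consistency with the rest of the notebook; it assumes the Docker CLI and the `gcloud` tool are installed and authenticated.

```python
# Hedged sketch of the build-and-push step that a build_image.sh-style
# script performs; the original script's contents are not shown here.
import subprocess

PROJECT_ID = 'PROJECT_ID'  # Replace with your GCP project ID.
IMAGE = 'gcr.io/{}/listgcsblobs:latest'.format(PROJECT_ID)

# Let Docker authenticate to Google Container Registry through gcloud.
subprocess.run(['gcloud', 'auth', 'configure-docker', '--quiet'], check=True)

# Build the image from the component folder and push it to the registry.
subprocess.run(
    ['docker', 'build', '-t', IMAGE, './tmp/components/list-gcs-blobs'],
    check=True)
subprocess.run(['docker', 'push', IMAGE], check=True)
```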
    ]
   },
@@ -330,11 +329,13 @@
     "  name='List GCS Blobs',\n",
     "  description='Takes a GCS bucket name as input and lists the blobs.'\n",
     ")\n",
-    "def pipeline_func(bucket=kfp.dsl.PipelineParam('bucket', value='Enter your bucket name here.')):\n",
-    "  list_blobs_task = ListGcsBlobsOp('List', bucket)\n",
+    "def pipeline_func(\n",
+    "    bucket=kfp.dsl.PipelineParam('bucket', value='Enter your bucket name here.')):\n",
+    "  list_blobs_task = list_gcs_blobs_op('List', bucket)\n",
     "\n",
     "# Compile the pipeline to a file.\n",
-    "filename = 'tmp/pipelines/list_blobs{dt:%Y%m%d_%H%M%S}.pipeline.tar.gz'.format(dt=datetime.datetime.now())\n",
+    "filename = 'tmp/pipelines/list_blobs{dt:%Y%m%d_%H%M%S}.pipeline.tar.gz'.format(\n",
+    "    dt=datetime.datetime.now())\n",
     "compiler.Compiler().compile(pipeline_func, filename)"
    ]
   },
@@ -342,7 +343,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "Go to your Cloud Shell, run `conn.sh`, click on the provided link. In the new tab that opens, click on \"Pipelines Dashboard\" and go to Kubeflow pipelines UI. Upload the created pipeline and run it.\n",
+    "Follow the [instructions](https://www.kubeflow.org/docs/other-guides/accessing-uis/) on kubeflow.org to access Kubeflow UIs. Upload the created pipeline and run it.\n",
     "\n",
     "**Warning:** When the pipeline is run, it pulls the image from the repository to the Kubernetes cluster to create a container. Kubernetes caches pulled images, so a later run may reuse a stale image. One solution is to use the image digest instead of the tag in your component dsl, for example, `s/v1/sha256:9509182e27dcba6d6903fccf444dc6188709cc094a018d5dd4211573597485c9/g`. Alternatively, if you don't want to update the digest every time, you can try the `:latest` tag, which will force Kubernetes to always pull the latest image."
    ]
   },
@@ -460,8 +461,8 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "## 2 Create a Python class for each component\n",
-    "The Python classes describe the interactions with the Docker container image created in step one. Each component needs to inherit from `kfp.dsl.ContainerOp`."
+    "## 2 Define each component\n",
+    "Define each of your components by using `kfp.dsl.ContainerOp`. Describe the interactions with the Docker container image created in the previous step by specifying the component name, the image to use, the command to run after the container starts, the input arguments, and the file outputs."
    ]
   },
@@ -472,24 +473,22 @@
    "source": [
     "import kfp.dsl\n",
     "\n",
-    "class ListGcsBlobsOp(kfp.dsl.ContainerOp):\n",
-    "  def __init__(self, name, bucket):\n",
-    "    super(ListGcsBlobsOp, self).__init__(\n",
+    "def list_gcs_blobs_op(name, bucket):\n",
+    "  return kfp.dsl.ContainerOp(\n",
     "      name=name,\n",
     "      image='gcr.io/{}/listgcsblobs:latest'.format(PROJECT_ID),\n",
     "      command=['python', '/app/app.py'],\n",
     "      file_outputs={'blobs': '/blobs.txt'},\n",
     "      arguments=['--bucket', bucket]\n",
-    "    )\n",
+    "  )\n",
     "\n",
-    "class ViewInputOp(kfp.dsl.ContainerOp):\n",
-    "  def __init__(self, name, blobs):\n",
-    "    super(ViewInputOp, self).__init__(\n",
+    "def view_input_op(name, blobs):\n",
+    "  return kfp.dsl.ContainerOp(\n",
     "      name=name,\n",
     "      image='gcr.io/{}/viewinput:latest'.format(PROJECT_ID),\n",
     "      command=['python', '/app/app.py'],\n",
     "      arguments=['--blobs', blobs]\n",
-    "    )"
+    "  )"
    ]
   },
@@ -497,7 +496,7 @@
   "metadata": {},
   "source": [
    "## 3 Create your workflow as a Python function\n",
-    "Define your pipeline as a Python function. `@kfp.dsl.pipeline` is a required decorator including `name` and `description` properties. `pipeline_func` defines the pipeline. `bucket=kfp.dsl.PipelineParam(...)` specifies that the pipeline takes an input parameter `bucket`. Later, when you load the pipeline, `kfp.dsl.PipelineParam('bucket', value='Enter your bucket name here.')` will create an input box in the UI with the initial value `Enter your bucket name here.`. You can replace the initial value with your bucket name at runtime. `ListGcsBlobsOp('List', bucket)` will create a component named `List` that lists the blobs. `ViewInputOp('View', list_blobs_task.outputs['blobs'])` will create a component named `View` that views a CSV. `list_blobs_task.outputs['blobs']` tells the pipeline to take the output of the first component stored as a string in `blobs.txt` as an input for the second component."
+    "Define your pipeline as a Python function. `@kfp.dsl.pipeline` is a required decorator including `name` and `description` properties. `pipeline_func` defines the pipeline. `bucket=kfp.dsl.PipelineParam(...)` specifies that the pipeline takes an input parameter `bucket`. Later, when you load the pipeline, `kfp.dsl.PipelineParam('bucket', value='Enter your bucket name here.')` will create an input box in the UI with the initial value `Enter your bucket name here.`. You can replace the initial value with your bucket name at runtime. `list_gcs_blobs_op('List', bucket)` will create a component named `List` that lists the blobs. `view_input_op('View', list_blobs_task.outputs['blobs'])` will create a component named `View` that views a CSV. `list_blobs_task.outputs['blobs']` tells the pipeline to take the output of the first component stored as a string in `blobs.txt` as an input for the second component."
    ]
   },
@@ -524,12 +523,14 @@
     "  name='Quickstart pipeline',\n",
     "  description='Takes a GCS bucket name and views a CSV input file in the bucket.'\n",
     ")\n",
-    "def pipeline_func(bucket=kfp.dsl.PipelineParam('bucket', value='Enter your bucket name here.')):\n",
-    "  list_blobs_task = ListGcsBlobsOp('List', bucket)\n",
-    "  view_input_task = ViewInputOp('View', list_blobs_task.outputs['blobs'])\n",
+    "def pipeline_func(bucket=kfp.dsl.PipelineParam(\n",
+    "    'bucket', value='Enter your bucket name here.')):\n",
+    "  list_blobs_task = list_gcs_blobs_op('List', bucket)\n",
+    "  view_input_task = view_input_op('View', list_blobs_task.outputs['blobs'])\n",
     "\n",
     "# Compile the pipeline to a file.\n",
-    "filename = 'tmp/pipelines/quickstart_pipeline{dt:%Y%m%d_%H%M%S}.pipeline.tar.gz'.format(dt=datetime.datetime.now())\n",
+    "filename = 'tmp/pipelines/quickstart_pipeline{dt:%Y%m%d_%H%M%S}.pipeline.tar.gz'.format(\n",
+    "    dt=datetime.datetime.now())\n",
     "compiler.Compiler().compile(pipeline_func, filename)"
    ]
   },
@@ -537,15 +538,8 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "Go to your Cloud Shell and run `conn.sh`, and click on the provided link. In the new tab that opens, click on \"Pipelines Dashboard\". Upload the created pipeline and run it."
+    "Follow the [instructions](https://www.kubeflow.org/docs/other-guides/accessing-uis/) on kubeflow.org to access Kubeflow UIs. Upload the created pipeline and run it."
    ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": []
   }
  ],
 "metadata": {

From 50e73f06b528cdeabcceeedbe996255b20564b79 Mon Sep 17 00:00:00 2001
From: Rostam Dinyari
Date: Tue, 2 Apr 2019 13:23:07 -0700
Subject: [PATCH 3/3] Incorporated comments.
---
 samples/notebooks/quickstart.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/samples/notebooks/quickstart.ipynb b/samples/notebooks/quickstart.ipynb
index 27f1359a88c..e880a118d11 100644
--- a/samples/notebooks/quickstart.ipynb
+++ b/samples/notebooks/quickstart.ipynb
@@ -27,7 +27,7 @@
   "source": [
    "# Part 1\n",
    "# Two ways to author a component to list blobs in a GCS bucket\n",
-   "A pipeline is composed of one or more components. In this section, you will build a single component that that lists the blobs in a GCS bucket. Then you build a pipeline that consists of this component. There are two ways to author a component. In the following sections we will go through each of them."
+   "A pipeline is composed of one or more components. In this section, you will build a single component that lists the blobs in a GCS bucket. Then you build a pipeline that consists of this component. There are two ways to author a component. In the following sections we will go through each of them."
    ]
  },
 {
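As an alternative to uploading the compiled tarball through the UI, the package can also be submitted programmatically with the SDK's `kfp.Client`. The following sketch is not part of the patch series; the host URL, experiment name, package path, and bucket value are placeholders that depend on your deployment.

```python
import kfp

# Connect to the Kubeflow Pipelines API server; the host URL is a
# placeholder and depends on how your cluster exposes the service.
client = kfp.Client(host='http://localhost:8080')

# Create (or look up) an experiment and launch a run of a compiled package.
experiment = client.create_experiment('quickstart')
run = client.run_pipeline(
    experiment_id=experiment.id,
    job_name='quickstart-run',
    pipeline_package_path='quickstart_pipeline.pipeline.tar.gz',
    params={'bucket': 'my-bucket'})  # Hypothetical bucket name.
```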