From fbaabb9aaf268f1042644d47c6275713b49c8563 Mon Sep 17 00:00:00 2001
From: Sarah Maddox
Date: Wed, 7 Nov 2018 16:45:05 +0000
Subject: [PATCH] Updated the tfx sample README

Fixed a link. Clarified YAML vs TAR format for workflow specification.
Made other textual improvements.
---
 samples/tfx/README.md | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/samples/tfx/README.md b/samples/tfx/README.md
index 94155edc185..e018e0bd994 100644
--- a/samples/tfx/README.md
+++ b/samples/tfx/README.md
@@ -1,4 +1,4 @@
-This sample runs a pipeline with tensorflow transform and model-analysis components.
+The `taxi-cab-classification-pipeline.py` sample runs a pipeline with TensorFlow's transform and model-analysis components.
 
 ## The dataset
 
@@ -25,28 +25,26 @@ Preprocessing and model analysis use [Apache Beam](https://beam.apache.org/).
 
 When run with the `cloud` mode (instead of the `local` mode), those steps use [Google Cloud DataFlow](https://beam.apache.org/) for running the Beam pipelines.
 
-As such, the DataFlow API needs to be enabled for the given project if you want to use `cloud` as the mode for either preprocessing or analysis.
-
-Instructions for enabling that can be found [here](https://cloud.google.com/endpoints/docs/openapi/enable-api).
+Therefore, you must enable the DataFlow API for the given GCP project if you want to use `cloud` as the mode for either preprocessing or analysis. See the [guide to enabling the DataFlow API](https://cloud.google.com/endpoints/docs/openapi/enable-api).
 
 ## Compiling the pipeline template
 
-Follow [README.md](https://github.com/kubeflow/pipelines/blob/master/samples/README.md) to install the compiler and then run the following to compile the pipeline:
+Follow the guide to [building a pipeline](https://github.com/kubeflow/pipelines/wiki/Build-a-Pipeline) to install the Kubeflow Pipelines SDK, then run the following command to compile the sample Python into a workflow specification. The specification takes the form of a YAML file compressed into a `.tar.gz` file.
 
 ```bash
 dsl-compile --py taxi-cab-classification-pipeline.py --output taxi-cab-classification-pipeline.tar.gz
 ```
 
-## Deploying a pipeline
+## Deploying the pipeline
 
-Open the ML pipeline UI. Create a new pipeline, and then upload the compiled YAML file as a new pipeline template.
+Open the Kubeflow Pipelines UI. Create a new pipeline, and then upload the compiled specification (`.tar.gz` file) as a new pipeline template.
 
-The pipeline will require two arguments:
+The pipeline requires two arguments:
 
 1. The name of a GCP project.
-2. An output directory in a GCS bucket, of the form `gs://<bucket>/<path>`.
+2. An output directory in a Google Cloud Storage bucket, of the form `gs://<bucket>/<path>`.
 
-## Components Source
+## Components source
 
 Preprocessing: [source code](https://github.com/kubeflow/pipelines/tree/master/components/dataflow/tft)