Updated the kubeflow-tf sample README #130

Merged
merged 1 commit on Nov 8, 2018
26 changes: 14 additions & 12 deletions samples/kubeflow-tf/README.md
@@ -1,32 +1,34 @@
## Overview
The pipeline creates a TensorFlow model on structured data and image URLs (Google Storage). It works for both classification and regression.
Everything runs inside the pipeline cluster (KubeFlow). The only possible dependency is DataFlow if you enable the "*cloud*" mode for

The `kubeflow-training-classification.py` pipeline creates a TensorFlow model on structured data and image URLs (Google Cloud Storage). It works for both classification and regression.
Everything runs inside the pipeline cluster (Kubeflow). The only possible dependency is Google Cloud DataFlow if you enable the "*cloud*" mode for
the preprocessing or prediction step.

## The requirements
By default, the preprocessing and prediction steps use the "*local*" mode and run inside the cluster. If users specify the value of "*preprocess_mode*" as "*cloud*",
[DataFlow API](https://cloud.google.com/endpoints/docs/openapi/enable-api) needs to be enabled for the given project such that the preprocessing step
can use Google Cloud DataFlow.

Note: The trainer depends on KubeFlow API Version v1alpha2.
By default, the preprocessing and prediction steps use the "*local*" mode and run inside the cluster. If you specify the value of "*preprocess_mode*" as "*cloud*", you must enable the
[DataFlow API](https://cloud.google.com/endpoints/docs/openapi/enable-api) for the given GCP project so that the preprocessing step
can use Cloud DataFlow.

Note: The trainer depends on Kubeflow API version v1alpha2.

## Compiling the pipeline template

Follow [README.md](https://github.com/kubeflow/pipelines/blob/master/samples/README.md) to install the compiler and then run the following command to compile the pipeline:
Follow the guide to [building a pipeline](https://github.com/kubeflow/pipelines/wiki/Build-a-Pipeline) to install the Kubeflow Pipelines SDK, then run the following command to compile the sample Python into a workflow specification. The specification takes the form of a YAML file compressed into a `.tar.gz` file.

```bash
dsl-compile --py kubeflow-training-classification.py --output kubeflow-training-classification.tar.gz
```
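
The same compilation can also be done from Python with the SDK's compiler. This is only a sketch: loading the sample as a module and the function name `kubeflow_training` are assumptions, since the actual `@dsl.pipeline`-decorated function name is defined inside the sample file itself.

```python
# Sketch: compiling with the Kubeflow Pipelines SDK instead of the dsl-compile CLI.
# The sample's filename contains dashes, so it is loaded as a module explicitly;
# `kubeflow_training` is a placeholder for whatever @dsl.pipeline-decorated
# function the sample actually defines.
import importlib.util

import kfp.compiler as compiler

spec = importlib.util.spec_from_file_location(
    'kubeflow_training_classification', 'kubeflow-training-classification.py')
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

compiler.Compiler().compile(module.kubeflow_training,
                            'kubeflow-training-classification.tar.gz')
```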

## Deploying a pipeline
## Deploying the pipeline

Open the ML pipeline UI. Create a new pipeline, and then upload the compiled YAML file as a new pipeline template.
Open the Kubeflow Pipelines UI. Create a new pipeline, and then upload the compiled specification (`.tar.gz` file) as a new pipeline template.
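
If you prefer to upload the compiled package programmatically rather than through the UI, the SDK client can do it. A minimal sketch, assuming the pipelines endpoint is reachable at the host you pass in (the host value below is a placeholder):

```python
# Sketch: uploading the compiled package with the Kubeflow Pipelines SDK client
# instead of the UI. The host is a placeholder for your deployment's endpoint.
import kfp

client = kfp.Client(host='http://<PIPELINES_HOST>')
client.upload_pipeline('kubeflow-training-classification.tar.gz',
                       pipeline_name='kubeflow-training-classification')
```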

The pipeline will require one argument:
The pipeline requires one argument:

1. An output directory in a GCS bucket, of the form `gs://<BUCKET>/<PATH>`.
1. An output directory in a Google Cloud Storage bucket, of the form `gs://<BUCKET>/<PATH>`.
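
A run can also be created directly from the compiled package with the SDK client instead of the UI. This is a sketch only: the argument key `output` is an assumption (check the sample's `@dsl.pipeline` function for the real parameter name), and the bucket path is a placeholder.

```python
# Sketch: creating a run from the compiled package. The argument key 'output'
# and the GCS path are placeholders; use the parameter name actually declared
# by the pipeline function in the sample.
import kfp

client = kfp.Client(host='http://<PIPELINES_HOST>')
client.create_run_from_pipeline_package(
    'kubeflow-training-classification.tar.gz',
    arguments={'output': 'gs://<BUCKET>/<PATH>'},
    run_name='kubeflow-training-classification-run')
```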

## Components Source
## Components source

Preprocessing:
[source code](https://github.com/kubeflow/pipelines/tree/master/components/dataflow/tft),