diff --git a/samples/tfx-oss/README.md b/samples/tfx-oss/README.md
index 813c7d07378..acc3f74360e 100644
--- a/samples/tfx-oss/README.md
+++ b/samples/tfx-oss/README.md
@@ -49,7 +49,7 @@ tfx/examples/chicago_taxi_pipeline/taxi_pipeline_kubeflow_large.py
 Configure
 - Set `_input_bucket` to the GCS directory where you've copied taxi_utils.py. I.e. gs:////
 - Set `_output_bucket` to the GCS directory where you've want the results to be written
-- Set GCP project ID (replace my-gcp-project). Note that it should be project ID (usually has numbers in the end), not project name.
+- Set GCP project ID (replace my-gcp-project). Note that it should be project ID, not project name.
 - The original BigQuery dataset has 100M rows, which can take time to process. Modify the selection criteria (% of records) to run a sample test.
 
 ## Compile and run the pipeline
diff --git a/samples/tfx-oss/TFX Example.ipynb b/samples/tfx-oss/TFX Example.ipynb
index 943deac8821..fd7b488469b 100644
--- a/samples/tfx-oss/TFX Example.ipynb	
+++ b/samples/tfx-oss/TFX Example.ipynb	
@@ -6,6 +6,8 @@
    "source": [
     "# TFX on KubeFlow Pipelines Example\n",
     "\n",
+    "This notebook should be run inside a KF Pipelines cluster.\n",
+    "\n",
     "### Install TFX and KFP packages"
    ]
   },
@@ -68,7 +70,7 @@
    "Configure:\n",
    "- Set `_input_bucket` to the GCS directory where you've copied taxi_utils.py. I.e. gs:////\n",
    "- Set `_output_bucket` to the GCS directory where you've want the results to be written\n",
-   "- Set GCP project ID (replace my-gcp-project). Note that it should be project ID (usually has numbers in the end), not project name.\n",
+   "- Set GCP project ID (replace my-gcp-project). Note that it should be project ID, not project name.\n",
    "\n",
    "The dataset in BigQuery has 100M rows, you can change the query parameters in WHERE clause to limit the number of rows used.\n"
   ]
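
For context, the configuration step described in both hunks amounts to editing a few module-level constants in taxi_pipeline_kubeflow_large.py. The sketch below is illustrative only: `_input_bucket` and `_output_bucket` are the names referenced in the instructions above, while the project-ID variable name and all placeholder values are assumptions, not taken from the actual file.

```python
# Illustrative sketch only. _input_bucket and _output_bucket are the names the
# README refers to; _project_id and the placeholder values are assumptions.

# GCS directory where taxi_utils.py was copied.
_input_bucket = 'gs://my-bucket/tfx-taxi'

# GCS directory where the pipeline results should be written.
_output_bucket = 'gs://my-bucket/tfx-taxi/output'

# GCP project ID (not the project display name).
_project_id = 'my-gcp-project'
```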