This repository has been archived by the owner on Dec 31, 2023. It is now read-only.

docs: add samples from tables/automl #54

Merged
merged 45 commits on Aug 13, 2020
Commits (45)
057a325
Tables Notebooks [(#2090)](https://github.com/GoogleCloudPlatform/pyt…
sirtorry Apr 8, 2019
6c8a34f
remove the reference to a bug [(#2100)](https://github.com/GoogleClou…
merla18 Apr 8, 2019
ac5f06a
delete this file. [(#2102)](https://github.com/GoogleCloudPlatform/py…
merla18 Apr 8, 2019
dad4ebf
rename file name [(#2103)](https://github.com/GoogleCloudPlatform/pyt…
merla18 Apr 8, 2019
be272ea
trying to fix images [(#2101)](https://github.com/GoogleCloudPlatform…
merla18 Apr 8, 2019
3e1eae6
remove typo in installation [(#2110)](https://github.com/GoogleCloudP…
merla18 Apr 13, 2019
eed69e3
Rename census_income_prediction.ipynb to getting_started_notebook.ipy…
merla18 May 1, 2019
bef66e7
added back missing file package import [(#2150)](https://github.com/G…
merla18 May 20, 2019
d7498ca
added back missing file import [(#2145)](https://github.com/GoogleClo…
merla18 May 20, 2019
4e29670
remove incorrect reference to Iris dataset [(#2203)](https://github.c…
emmby Jun 10, 2019
81f2a34
conversion to jupyter/colab [(#2340)](https://github.com/GoogleCloudP…
merla18 Sep 5, 2019
6d7ec03
updated for the Jupyter support [(#2337)](https://github.com/GoogleCl…
merla18 Sep 5, 2019
482211a
updated readme for support Jupyter [(#2336)](https://github.com/Googl…
merla18 Sep 5, 2019
cbb9685
conversion to jupyer/colab [(#2339)](https://github.com/GoogleCloudPl…
merla18 Sep 5, 2019
aaea837
conversion of notebook for jupyter/Colab [(#2338)](https://github.com…
merla18 Sep 5, 2019
7c23e1d
[BLOCKED] AutoML Tables: Docs samples updated to use new (pending) cl…
lwander Sep 6, 2019
48bc7d2
add product recommendation for automl tables notebook [(#2257)](https…
TheMichaelHu Sep 18, 2019
142261e
AutoML Tables: Notebook samples updated to use new tables client [(#2…
lwander Oct 5, 2019
af31274
fix users bug and emphasize kernal restart [(#2407)](https://github.c…
TheMichaelHu Oct 7, 2019
fe2e911
fix problems with automl docs [(#2501)](https://github.com/GoogleClou…
alefhsousa Nov 19, 2019
47e5801
Fix typo in GCS URI parameter [(#2459)](https://github.com/GoogleClou…
lwander Nov 20, 2019
d0a2d74
fix: fix tables notebook links and bugs [(#2601)](https://github.com/…
sirtorry Dec 12, 2019
aa86fbc
feat(tables): update samples to show explainability [(#2523)](https:/…
sirtorry Dec 18, 2019
59bd0cb
Auto-update dependencies. [(#2005)](https://github.com/GoogleCloudPla…
dpebot Dec 21, 2019
12e24d4
Update dependency google-cloud-automl to v0.10.0 [(#3033)](https://gi…
renovate-bot Mar 6, 2020
b119f72
Simplify noxfile setup. [(#2806)](https://github.com/GoogleCloudPlatf…
kurtisvg Apr 2, 2020
184930a
chore: some lint fixes [(#3750)](https://github.com/GoogleCloudPlatfo…
May 13, 2020
f87fc01
automl: tables code sample clean-up [(#3571)](https://github.com/Goog…
Strykrol May 13, 2020
1224e5e
add example of creating AutoML Tables client with non-default endpoin…
amygdala Jun 5, 2020
a13cdb2
Replace GCLOUD_PROJECT with GOOGLE_CLOUD_PROJECT. [(#4022)](https://g…
kurtisvg Jun 9, 2020
9b4d162
chore(deps): update dependency google-cloud-automl to v1 [(#4127)](ht…
renovate-bot Jun 19, 2020
7bea599
[tables/automl] fix: update the csv file and the dataset name [(#4188…
Jun 26, 2020
a690cba
samples: Automl table batch test [(#4267)](https://github.com/GoogleC…
munkhuushmgl Jul 9, 2020
aa48046
samples: fixed wrong format on GCS input Uri [(#4270)](https://github…
munkhuushmgl Jul 10, 2020
4f6f978
chore(deps): update dependency pytest to v5.4.3 [(#4279)](https://git…
renovate-bot Jul 12, 2020
784d0cc
Update automl_tables_predict.py with batch_predict_bq sample [(#4142)…
evil-shrike Jul 17, 2020
cab6955
Update dependency pytest to v6 [(#4390)](https://github.com/GoogleClo…
renovate-bot Aug 1, 2020
b6a236d
chore: exclude notebooks
busunkim96 Aug 7, 2020
c5720e8
chore: update templates
busunkim96 Aug 7, 2020
c398641
chore: add codeowners and fix tests
busunkim96 Aug 13, 2020
f0362bb
chore: ignore warnings from sphinx
busunkim96 Aug 13, 2020
b83ea49
chore: fix tables client
busunkim96 Aug 13, 2020
d0e251d
Merge branch 'master' into add-tables-samples
busunkim96 Aug 13, 2020
ebb30b0
test: fix unit tests
busunkim96 Aug 13, 2020
d0efc69
Merge branch 'add-tables-samples' of github.com:busunkim96/python-aut…
busunkim96 Aug 13, 2020
8 changes: 8 additions & 0 deletions .github/CODEOWNERS
@@ -0,0 +1,8 @@
# Code owners file.
# This file controls who is tagged for review for any given pull request.
#
# For syntax help see:
# https://help.github.com/en/github/creating-cloning-and-archiving-repositories/about-code-owners#codeowners-syntax


/samples/**/*.py @telpirion @sirtorry @googleapis/python-samples-owners
6 changes: 5 additions & 1 deletion google/cloud/automl_v1beta1/tables/tables_client.py
@@ -2762,6 +2762,7 @@ def batch_predict(
region=None,
credentials=None,
inputs=None,
params={},
**kwargs
):
"""Makes a batch prediction on a model. This does _not_ require the
@@ -2828,6 +2829,9 @@
The `model` instance you want to predict with. This must be
supplied if `model_display_name` or `model_name` are not
supplied.
params (Optional[dict]):
Additional domain-specific parameters for the predictions;
any string value must be at most 25000 characters long.

Returns:
google.api_core.operation.Operation:
@@ -2886,7 +2890,7 @@
)

op = self.prediction_client.batch_predict(
model_name, input_request, output_request, **kwargs
model_name, input_request, output_request, params, **kwargs
)
self.__log_operation_info("Batch predict", op)
return op
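For context on the client change above, here is a minimal usage sketch of the new `params` argument. The project, model name, and GCS paths are placeholders (not part of this PR), and the feature-importance key is shown only as an illustrative example of a prediction parameter:

```python
from google.cloud import automl_v1beta1 as automl

client = automl.TablesClient(project="my-project", region="us-central1")

# Kick off a batch prediction, forwarding extra prediction parameters
# (here a hypothetical explainability flag) to the prediction service.
op = client.batch_predict(
    model_display_name="my_model",
    gcs_input_uris=["gs://my-bucket/input.csv"],
    gcs_output_uri_prefix="gs://my-bucket/output/",
    params={"feature_importance": "true"},
)
op.result()  # block until the long-running batch prediction completes
```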
306 changes: 306 additions & 0 deletions samples/tables/automl_tables_dataset.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,306 @@
#!/usr/bin/env python

# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""This application demonstrates how to perform basic operations on dataset
with the Google AutoML Tables API.

For more information, see the documentation at
https://cloud.google.com/automl-tables/docs.
"""

import argparse
import os


def create_dataset(project_id, compute_region, dataset_display_name):
"""Create a dataset."""
# [START automl_tables_create_dataset]
# TODO(developer): Uncomment and set the following variables
# project_id = 'PROJECT_ID_HERE'
# compute_region = 'COMPUTE_REGION_HERE'
# dataset_display_name = 'DATASET_DISPLAY_NAME_HERE'

from google.cloud import automl_v1beta1 as automl

client = automl.TablesClient(project=project_id, region=compute_region)

# Create a dataset with the given display name
dataset = client.create_dataset(dataset_display_name)

# Display the dataset information.
print("Dataset name: {}".format(dataset.name))
print("Dataset id: {}".format(dataset.name.split("/")[-1]))
print("Dataset display name: {}".format(dataset.display_name))
print("Dataset metadata:")
print("\t{}".format(dataset.tables_dataset_metadata))
print("Dataset example count: {}".format(dataset.example_count))
print("Dataset create time:")
print("\tseconds: {}".format(dataset.create_time.seconds))
print("\tnanos: {}".format(dataset.create_time.nanos))

# [END automl_tables_create_dataset]

return dataset


def list_datasets(project_id, compute_region, filter_=None):
"""List all datasets."""
result = []
# [START automl_tables_list_datasets]
# TODO(developer): Uncomment and set the following variables
# project_id = 'PROJECT_ID_HERE'
# compute_region = 'COMPUTE_REGION_HERE'
# filter_ = 'filter expression here'

from google.cloud import automl_v1beta1 as automl

client = automl.TablesClient(project=project_id, region=compute_region)

# List all the datasets available in the region by applying filter.
response = client.list_datasets(filter_=filter_)

print("List of datasets:")
for dataset in response:
# Display the dataset information.
print("Dataset name: {}".format(dataset.name))
print("Dataset id: {}".format(dataset.name.split("/")[-1]))
print("Dataset display name: {}".format(dataset.display_name))
metadata = dataset.tables_dataset_metadata
print(
"Dataset primary table spec id: {}".format(
metadata.primary_table_spec_id
)
)
print(
"Dataset target column spec id: {}".format(
metadata.target_column_spec_id
)
)
print(
"Dataset weight column spec id: {}".format(
metadata.weight_column_spec_id
)
)
print(
"Dataset ml use column spec id: {}".format(
metadata.ml_use_column_spec_id
)
)
print("Dataset example count: {}".format(dataset.example_count))
print("Dataset create time:")
print("\tseconds: {}".format(dataset.create_time.seconds))
print("\tnanos: {}".format(dataset.create_time.nanos))
print("\n")

# [END automl_tables_list_datasets]
result.append(dataset)

return result


def get_dataset(project_id, compute_region, dataset_display_name):
"""Get the dataset."""
# TODO(developer): Uncomment and set the following variables
# project_id = 'PROJECT_ID_HERE'
# compute_region = 'COMPUTE_REGION_HERE'
# dataset_display_name = 'DATASET_DISPLAY_NAME_HERE'

from google.cloud import automl_v1beta1 as automl

client = automl.TablesClient(project=project_id, region=compute_region)

# Get complete detail of the dataset.
dataset = client.get_dataset(dataset_display_name=dataset_display_name)

# Display the dataset information.
print("Dataset name: {}".format(dataset.name))
print("Dataset id: {}".format(dataset.name.split("/")[-1]))
print("Dataset display name: {}".format(dataset.display_name))
print("Dataset metadata:")
print("\t{}".format(dataset.tables_dataset_metadata))
print("Dataset example count: {}".format(dataset.example_count))
print("Dataset create time:")
print("\tseconds: {}".format(dataset.create_time.seconds))
print("\tnanos: {}".format(dataset.create_time.nanos))

return dataset


def import_data(project_id, compute_region, dataset_display_name, path):
"""Import structured data."""
# [START automl_tables_import_data]
# TODO(developer): Uncomment and set the following variables
# project_id = 'PROJECT_ID_HERE'
# compute_region = 'COMPUTE_REGION_HERE'
# dataset_display_name = 'DATASET_DISPLAY_NAME'
# path = 'gs://path/to/file.csv' or 'bq://project_id.dataset.table_id'

from google.cloud import automl_v1beta1 as automl

client = automl.TablesClient(project=project_id, region=compute_region)

response = None
if path.startswith("bq"):
response = client.import_data(
dataset_display_name=dataset_display_name, bigquery_input_uri=path
)
else:
# Get the multiple Google Cloud Storage URIs.
input_uris = path.split(",")
response = client.import_data(
dataset_display_name=dataset_display_name,
gcs_input_uris=input_uris,
)

print("Processing import...")
# synchronous check of operation status.
print("Data imported. {}".format(response.result()))

# [END automl_tables_import_data]


def update_dataset(
project_id,
compute_region,
dataset_display_name,
target_column_spec_name=None,
weight_column_spec_name=None,
test_train_column_spec_name=None,
):
"""Update dataset."""
# TODO(developer): Uncomment and set the following variables
# project_id = 'PROJECT_ID_HERE'
# compute_region = 'COMPUTE_REGION_HERE'
# dataset_display_name = 'DATASET_DISPLAY_NAME_HERE'
# target_column_spec_name = 'TARGET_COLUMN_SPEC_NAME_HERE' or None
# weight_column_spec_name = 'WEIGHT_COLUMN_SPEC_NAME_HERE' or None
# test_train_column_spec_name = 'TEST_TRAIN_COLUMN_SPEC_NAME_HERE' or None

from google.cloud import automl_v1beta1 as automl

client = automl.TablesClient(project=project_id, region=compute_region)

if target_column_spec_name is not None:
response = client.set_target_column(
dataset_display_name=dataset_display_name,
column_spec_display_name=target_column_spec_name,
)
print("Target column updated. {}".format(response))
if weight_column_spec_name is not None:
response = client.set_weight_column(
dataset_display_name=dataset_display_name,
column_spec_display_name=weight_column_spec_name,
)
print("Weight column updated. {}".format(response))
if test_train_column_spec_name is not None:
response = client.set_test_train_column(
dataset_display_name=dataset_display_name,
column_spec_display_name=test_train_column_spec_name,
)
print("Test/train column updated. {}".format(response))


def delete_dataset(project_id, compute_region, dataset_display_name):
"""Delete a dataset"""
# [START automl_tables_delete_dataset]
# TODO(developer): Uncomment and set the following variables
# project_id = 'PROJECT_ID_HERE'
# compute_region = 'COMPUTE_REGION_HERE'
# dataset_display_name = 'DATASET_DISPLAY_NAME_HERE'

from google.cloud import automl_v1beta1 as automl

client = automl.TablesClient(project=project_id, region=compute_region)

# Delete a dataset.
response = client.delete_dataset(dataset_display_name=dataset_display_name)

# synchronous check of operation status.
print("Dataset deleted. {}".format(response.result()))
# [END automl_tables_delete_dataset]


if __name__ == "__main__":
parser = argparse.ArgumentParser(
description=__doc__,
formatter_class=argparse.RawDescriptionHelpFormatter,
)
subparsers = parser.add_subparsers(dest="command")

create_dataset_parser = subparsers.add_parser(
"create_dataset", help=create_dataset.__doc__
)
create_dataset_parser.add_argument("--dataset_name")

list_datasets_parser = subparsers.add_parser(
"list_datasets", help=list_datasets.__doc__
)
list_datasets_parser.add_argument("--filter_")

get_dataset_parser = subparsers.add_parser(
"get_dataset", help=get_dataset.__doc__
)
get_dataset_parser.add_argument("--dataset_display_name")

import_data_parser = subparsers.add_parser(
"import_data", help=import_data.__doc__
)
import_data_parser.add_argument("--dataset_display_name")
import_data_parser.add_argument("--path")

update_dataset_parser = subparsers.add_parser(
"update_dataset", help=update_dataset.__doc__
)
update_dataset_parser.add_argument("--dataset_display_name")
update_dataset_parser.add_argument("--target_column_spec_name")
update_dataset_parser.add_argument("--weight_column_spec_name")
update_dataset_parser.add_argument("--ml_use_column_spec_name")

delete_dataset_parser = subparsers.add_parser(
"delete_dataset", help=delete_dataset.__doc__
)
delete_dataset_parser.add_argument("--dataset_display_name")

project_id = os.environ["PROJECT_ID"]
compute_region = os.environ["REGION_NAME"]

args = parser.parse_args()
if args.command == "create_dataset":
create_dataset(project_id, compute_region, args.dataset_name)
if args.command == "list_datasets":
list_datasets(project_id, compute_region, args.filter_)
if args.command == "get_dataset":
get_dataset(project_id, compute_region, args.dataset_display_name)
if args.command == "import_data":
import_data(
project_id, compute_region, args.dataset_display_name, args.path
)
if args.command == "update_dataset":
update_dataset(
project_id,
compute_region,
args.dataset_display_name,
args.target_column_spec_name,
args.weight_column_spec_name,
args.ml_use_column_spec_name,
)
if args.command == "delete_dataset":
delete_dataset(project_id, compute_region, args.dataset_display_name)
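
Beyond the argparse entry point above, the sample's functions can also be exercised directly from Python. A minimal sketch, assuming the PROJECT_ID and REGION_NAME environment variables are set and using a placeholder GCS path for the training CSV:

```python
import os

import automl_tables_dataset as sample  # the module added in this PR

project_id = os.environ["PROJECT_ID"]
compute_region = os.environ["REGION_NAME"]  # e.g. "us-central1"

# Create an empty Tables dataset, then import a CSV from Cloud Storage into it.
dataset = sample.create_dataset(project_id, compute_region, "census_income")
sample.import_data(
    project_id,
    compute_region,
    dataset.display_name,
    "gs://my-bucket/census_income.csv",  # placeholder; any Tables-compatible CSV
)
```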