BigQuery Beta 2 Changes #4245

Merged
merged 162 commits into master from bigquery-b2
Oct 24, 2017

tswast
Contributor

@tswast tswast commented Oct 24, 2017

This pull request implements requirements for the beta 2 launch as well as a redesign of the API surface. Changes include:

  • Query and view operations default to the Standard SQL dialect.
  • Client methods that create jobs, such as running a query, start the job immediately.
    • Use QueryJobConfig and analogous classes to configure optional properties about the job before starting it with the client methods.
    • Use the concurrent.futures API (e.g. result()) on jobs to wait for jobs to complete. In the case of QueryJob, the result() method returns an iterator over the rows of the destination table.
  • Added TableReference and DatasetReference classes to make it clearer when an object does not contain additional properties of the resource.
  • Functions to create/get/update/delete datasets and tables moved from Table and Dataset to the Client class.
  • Use Client.create_rows() instead of Table.insert_data() to stream rows to a table.
  • Use Client.list_rows() instead of Table.fetch_data().
  • Row iterators allow access to column values by keyword or attribute, in addition to by integer index.
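
For illustration, the row access described in the last bullet can be sketched in plain Python. The `Row` class below is a hypothetical stand-in written for this example, not the library's implementation; its constructor arguments (`values`, `field_to_index`) are assumptions.

```python
class Row(object):
    """Hypothetical stand-in for a result row; not the library's class."""

    def __init__(self, values, field_to_index):
        self._values = tuple(values)
        # Maps a column name to its position in the row (assumed layout).
        self._field_to_index = dict(field_to_index)

    def __getitem__(self, key):
        # Keyword access: translate the column name to a position first.
        if isinstance(key, str):
            key = self._field_to_index[key]
        # Integer access falls straight through to the tuple.
        return self._values[key]

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails, so column
        # names never shadow real attributes of the object.
        try:
            index = self.__dict__["_field_to_index"][name]
        except KeyError:
            raise AttributeError(name)
        return self._values[index]


row = Row(("Ada", 36), {"name": 0, "age": 1})
print(row[0], row["age"], row.name)
```

All three access styles resolve to the same underlying tuple, so they cannot disagree about a column's value.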

/cc @alixhami @jba

tseaver and others added 30 commits October 12, 2017 13:27
* Rename class: 'jobs.LoadTableFromStorageJob' -> 'jobs.LoadJob'.

* Rename class: 'jobs.ExtractTableToStorageJob' -> 'jobs.ExtractJob'.
* Rename class: 'dataset.AccessGrant' -> 'dataset.AccessEntry'.

* PEP8 names for unit test helpers.

* Rename 'Dataset.access_grants' -> 'Dataset.access_entries'.
* Add 'QueryJob.total_bytes_processed' property.
* Add 'QueryJob.total_bytes_billed' property.
* Add 'QueryJob.billing_tier' property.
* Add 'QueryJob.cache_hit' property.
* Add 'QueryJob.num_dml_affected_rows' property.
* Add 'QueryJob.statement_type' property.
* Allow assigning 'None' to '_TypedProperty' properties.

* Ensure that configuration properties are copied when (re)loading jobs.
…of Dataset. (#3944)

* BigQuery: Add TableReference class. Add table function to DatasetReference

* BigQuery: Modify client.dataset() to return DatasetReference instead of Dataset.

* Bigquery: client.dataset() uses default project if not specified
* bigquery: rename TableReference.dataset_ref

Rename to dataset to be consistent with Client.dataset. Both
methods actually return a DatasetReference.

* fix broken tests
* bigquery: rename name field of Dataset to dataset_id

Rename the former dataset_id property to full_dataset_id.

Also rename Table.dataset_name to Table.dataset_id.

Perform other renamings (of various variables and constants).

These names match usage better. The API's Dataset.id field is
"project:dataset_id", which is confusing and basically useless,
so it's a mistake to call that dataset_id.

* fix long line

* fix long line
* bigquery: rename name field of Table to table_id

Also rename table_id to full_table_id.

* fix lint errors

* fix doc
* BQ: rename XJob.name to XJob.job_id.

* BQ: Remove references to table.name
* Parse timestamps in query parameters according to BigQuery canonical timestamp format.

The timestamp format in query parameters follows the canonical format
specified at
https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#timestamp-type

This fixes a system test error which was happening in the bigquery-b2
branch.

* Support more possible timestamp formats.

Any of these formats may be returned from the BigQuery API.

* Chop and string-replace timestamps into a canonical format.
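
A minimal sketch of the "chop and string-replace" idea from the commit above, assuming every timestamp is in UTC and carries at most six fractional digits. The function name `parse_utc_timestamp` is hypothetical and this is not the code merged in the PR.

```python
import datetime


def parse_utc_timestamp(value):
    """Parse a UTC timestamp string into an aware datetime (sketch)."""
    # The date/time separator may be a space or 'T'; normalise to 'T'.
    value = value.replace(" ", "T", 1)
    # The UTC zone may be written as 'Z' or '+00:00'; chop either suffix.
    if value.endswith("Z"):
        value = value[:-1]
    elif value.endswith("+00:00"):
        value = value[: -len("+00:00")]
    # Fractional seconds are optional, so pick the matching format.
    if "." in value:
        fmt = "%Y-%m-%dT%H:%M:%S.%f"
    else:
        fmt = "%Y-%m-%dT%H:%M:%S"
    parsed = datetime.datetime.strptime(value, fmt)
    # Assumption: a string with no zone suffix is also treated as UTC.
    return parsed.replace(tzinfo=datetime.timezone.utc)


print(parse_utc_timestamp("2016-12-20 15:58:27.339328+00:00"))
print(parse_utc_timestamp("2016-12-20T15:58:27Z"))
```

Reducing every input to one of two shapes before parsing keeps the number of `strptime` formats at two, whatever mix of separators and zone suffixes the API returns.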

* BQ: fix lint errors. Remove references to table.name
* BigQuery: Adds client.get_dataset() and removes dataset.reload()

* BigQuery: changes dataset.name to dataset.dataset_id in test

* fixes client.get_dataset() docstring and removes unnecessary test variable
* bigquery: add client.create_dataset; remove dataset.create

* fix lint

* increase coverage to 100%

* really fix coverage

* fix lint
* bigquery: remove dataset.exists

Dataset won't be able to support this method when we remove its client.

Don't add client.dataset_exists; the user can use client.get_dataset and
catch NotFound.

* fix lint

* fix lint again

* fix more lint
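
The pattern suggested in the commit above (call `client.get_dataset` and catch `NotFound`) might look like the sketch below. `NotFound` and `FakeClient` are local stand-ins defined here so the example runs without the library or credentials; with the real client you would catch the library's own not-found exception instead. The helper name `dataset_exists` is hypothetical.

```python
class NotFound(Exception):
    """Local stand-in for the library's 'resource not found' exception."""


class FakeClient(object):
    """Hypothetical client exposing only get_dataset, for demonstration."""

    def __init__(self, known_datasets):
        self._known = set(known_datasets)

    def get_dataset(self, dataset_ref):
        if dataset_ref not in self._known:
            raise NotFound(dataset_ref)
        return dataset_ref


def dataset_exists(client, dataset_ref, not_found=NotFound):
    """Return True if the dataset can be fetched, False if it is missing."""
    try:
        client.get_dataset(dataset_ref)
    except not_found:
        return False
    return True


client = FakeClient(["my_dataset"])
print(dataset_exists(client, "my_dataset"))
print(dataset_exists(client, "missing_dataset"))
```

Only the not-found case is swallowed; any other error (permissions, network) still propagates to the caller.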
#3997)

* wip update Table constructor

* BigQuery: Updates Table constructor to use TableReference as parameter

* fixes circular import error with Python 2.7
* BQ: client.extract_table starts extract job

Add system tests for extract_table.

* BigQuery: client.extract_table use `**kwargs` for Python 2.7.

* BQ: extract_table. Use dict.get for kwargs. job_id instead of job_name.
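
Python 2.7 has no syntax for keyword-only arguments after `*args`, which is why the commits above fall back to `**kwargs` and `dict.get`. A rough sketch of that pattern follows; the function `extract_table_sketch` and its return value are hypothetical and only illustrate the argument handling, not what the client method does.

```python
def extract_table_sketch(source, *destination_uris, **kwargs):
    """Accept optional job_id/job_config in a way Python 2.7 can parse."""
    # dict.get returns None when the caller did not pass the option,
    # which emulates 'job_id=None' keyword-only defaults.
    job_id = kwargs.get("job_id")
    job_config = kwargs.get("job_config")
    return {
        "source": source,
        "destination_uris": list(destination_uris),
        "job_id": job_id,
        "job_config": job_config,
    }


print(extract_table_sketch("my_table", "gs://bucket/a.csv", job_id="job-1"))
```

One trade-off of this approach is that misspelled keyword names are silently ignored unless the function checks for unexpected keys itself.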
* WIP adds client.get_table()

* BigQuery: Adds client.get_table() and removes table.reload()

* removes unnecessary variable

* adds system test for client.get_table()
* bigquery: add Client.update_dataset

Remove Dataset.patch and Dataset.update.

* improve cover

* more coverage

* update system tests

* more coverage

* add creds to client

* small changes

* .

* convert Python field name to API field name
…/dataset_id properties (#4011)

* adds dataset_id and project properties to TableReference

* Remove dataset property from Table and TableReference
* bigquery: add client.delete_dataset

* support Dataset as well as DatasetReference

* fix lint
tswast and others added 11 commits October 16, 2017 15:15
Support filtering datasets by label.
* BigQuery: populate timeout parameter for getQueryResults

This will allow QueryJob to respect the timeout value for futures.

* query_rows: Clarify that timeout is in seconds.

* Wait until the end of calculations to convert to milliseconds.
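
As a sketch of the last point above: keep all timeout arithmetic in seconds and convert to milliseconds once, at the end. The function name and the `elapsed` parameter are assumptions made for this example, not names from the PR.

```python
def timeout_ms_from_seconds(timeout, elapsed=0.0):
    """Convert a timeout in seconds to whole milliseconds (sketch)."""
    # No timeout requested: leave the API parameter unset.
    if timeout is None:
        return None
    # Do the arithmetic in seconds and never go below zero.
    remaining = max(timeout - elapsed, 0.0)
    # Convert only at the very end to avoid mixing units mid-calculation.
    return int(remaining * 1000)


print(timeout_ms_from_seconds(10, elapsed=0.25))
```

Converting once at the boundary means every intermediate value shares one unit, so a seconds value can never be passed where milliseconds are expected.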
)

* adds helper function for snake to camel case conversion

* adds unit test
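
A snake-to-camel helper of the kind this commit describes could be as small as the sketch below, which maps Python-style field names to the camelCase names the API uses. The function name is an assumption and the merged helper may differ.

```python
def snake_to_camel_case(value):
    """Convert 'snake_case' to 'camelCase' (sketch)."""
    words = value.split("_")
    # The first word stays as-is; each later word gets an initial capital.
    return words[0] + "".join(word.capitalize() for word in words[1:])


print(snake_to_camel_case("friendly_name"))
```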
* Updates snippets for BigQuery Beta 2 changes

* fixes flake8 issues

* removes module imports

* fixes snippets
…4236)

* BigQuery: make docstrings use bigquery module, like the samples do.

All the public classes we expect developers to use are included in the
`google.cloud.bigquery` module, and it is this module that we use in
code samples.

Also, I found one error in the Bigtable docs where `Row` was not being
used as a local reference and conflicted with the BigQuery Row.

* Adjust heading underline.
@tswast tswast added the api: bigquery Issues related to the BigQuery API. label Oct 24, 2017
@tswast
Contributor Author

tswast commented Oct 24, 2017

Question: When we merge this are we going to squash it?

@tseaver
Contributor

tseaver commented Oct 24, 2017

@tswast We'd have to, unless one of us does the merge at the command line (non-squash PR merge is disabled for the project).

@lukesneeringer
Contributor

Question: When we merge this are we going to squash it?

I can enable non-squash merge temporarily if you want.

@tseaver
Contributor

tseaver commented Oct 24, 2017

@lukesneeringer

I can enable non-squash merge temporarily if you want.

ISTM we could just do it at the command line:

$ git checkout master && git fetch --all --prune && git merge upstream/master
$ git merge bigquery-b2
$ git push upstream master

@tswast tswast merged commit 09cf23a into master Oct 24, 2017
@dhermes dhermes deleted the bigquery-b2 branch November 22, 2017 17:15
parthea pushed a commit that referenced this pull request Oct 21, 2023
…docs-samples#4245)

fixes #4235
(by retrying upon InternalServerError)

Co-authored-by: Leah E. Cole <6719667+leahecole@users.noreply.github.com>
5 participants