Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Future interface to BigQuery jobs #3626

Merged
merged 8 commits into from
Jul 21, 2017

Conversation

theacodes
Copy link
Contributor

@theacodes theacodes commented Jul 18, 2017

Resolves #3556
Towards #3617

@theacodes theacodes added api: bigquery Issues related to the BigQuery API. DevEx do not merge Indicates a pull request not ready for merge, due to either quality or timing. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. labels Jul 18, 2017
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Jul 18, 2017
@theacodes
Copy link
Contributor Author

@tswast
Copy link
Contributor

tswast commented Jul 19, 2017

I don't like

query_job.result()
rows = query_job.results().fetch_data(max_results=10)

I'd prefer if it was

rows = query_job.result()

Also, the query job needs to be calling getQueryResults and checking for jobComplete in the response rather than polling the job state. The reason for this is that getQueryResults will wait for the job to complete or timeout (whichever comes first), whereas get Job returns immediately.

retry_on = tenacity.retry_if_result(
functools.partial(operator.is_not, True))
# Use exponential backoff with jitter.
wait_on = (

This comment was marked as spam.

This comment was marked as spam.

@theacodes
Copy link
Contributor Author

I'd prefer if it was
rows = query_job.result()

That seems reasonable as long as we don't lose data, but wouldn't that require switching to core iterator (#2840)? Could we accept the non-breaking behavior of this PR as-is for now and do another PR to switch the return value of result after #2840 is implemented?

Also, the query job needs to be calling getQueryResults and checking for jobComplete in the response rather than polling the job state. The reason for this is that getQueryResults will wait for the job to complete or timeout (whichever comes first), whereas get Job returns immediately.

Why is that preferable? Doing a long-lived HTTP request is quite different from any of the other behavior in this library. Also, would you be okay with addressing this in a follow-up PR rather than in this one?

@tswast
Copy link
Contributor

tswast commented Jul 19, 2017

#2840 is already implemented.

getQueryResults is preferable because the job polling method adds up to polling_interval milliseconds of latency to get the query results compared to the long-lived request which will return results as soon as they are available.

@tswast
Copy link
Contributor

tswast commented Jul 19, 2017

I'm okay with doing the getQueryResults in a subsequent PR.

@theacodes
Copy link
Contributor Author

@tswast Cool, are you good with this PR as-is then?

@tswast
Copy link
Contributor

tswast commented Jul 19, 2017

Could we make the change where QueryJob.result() returns QueryResults in this PR?

@tswast
Copy link
Contributor

tswast commented Jul 19, 2017

Could we make the change where QueryJob.result() returns QueryResults in this PR?

If not, I'll propose we make that as a breaking change when changing the polling method.

@theacodes
Copy link
Contributor Author

Could we make the change where QueryJob.result() returns QueryResults in this PR?

Yep. Done. Can you try this out and let me know how broken it is?

This should work:

def async_query(query):
    client = bigquery.Client()
    query_job = client.run_async_query(str(uuid.uuid4()), query)
    query_job.use_legacy_sql = False

    rows = query_job.result().fetch_data(max_results=10)
    for row in rows:
        print(row)

@theacodes
Copy link
Contributor Author

@tswast verified that this seems to work:

>>> from google.cloud import bigquery
>>> import uuid
>>>
>>> client = bigquery.Client()
>>> query = 'SELECT 1'
>>> query_job = client.run_async_query(str(uuid.uuid4()), query)
>>> query_job.use_legacy_sql = False
>>>
>>> rows = query_job.result().fetch_data(max_results=10)
>>> for row in rows:
...     print(row)
...
(1,)

@theacodes theacodes changed the title [DO NOT MERGE] Add Future interface to BigQuery jobs Add Future interface to BigQuery jobs Jul 20, 2017
Copy link
Contributor

@dhermes dhermes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no real concerns here though I think someone else should have a look?

@@ -27,6 +31,7 @@
from google.cloud.bigquery._helpers import UDFResourcesProperty
from google.cloud.bigquery._helpers import _EnumProperty
from google.cloud.bigquery._helpers import _TypedProperty
import google.cloud.future.base

This comment was marked as spam.

"""
client = self._require_client(client)

api_response = client._connection.api_request(
method='POST', path='%s/cancel' % (self.path,))
self._set_properties(api_response['job'])
return True

This comment was marked as spam.

This comment was marked as spam.

# set, do not call set_result/set_exception again.
# Note: self._result_set is set to True in set_result and
# set_exception, in case those methods are invoked directly.
if self.state != 'DONE' or self._result_set:

This comment was marked as spam.

This comment was marked as spam.


if self.error_result is not None:
exception = exceptions.GoogleCloudError(
self.error_result, errors=self.errors)

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

"""
# Do not refresh is the state is already done, as the job will not
# change once complete.
if self.state != 'DONE':

This comment was marked as spam.

This comment was marked as spam.

"""Construct a QueryResults instance, bound to this job.

:rtype: :class:`~google.cloud.bigquery.query.QueryResults`
:returns: results instance
"""
from google.cloud.bigquery.query import QueryResults
return QueryResults.from_query_job(self)

def results(self):
"""DEPRECATED.

This comment was marked as spam.

This comment was marked as spam.

'tableUnavailable': http_client.BAD_REQUEST,
}

_FakeResponse = collections.namedtuple('_FakeResponse', ['status'])

This comment was marked as spam.

This comment was marked as spam.



def _error_result_to_exception(error_result):
""""""

This comment was marked as spam.

This comment was marked as spam.

# make_exception expects an httplib2 response object.
fake_response = _FakeResponse(status=status_code)
return exceptions.make_exception(
fake_response, b'', error_info=error_result, use_json=False)

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

@@ -447,7 +487,8 @@ def cancelled(self):
:rtype: bool
:returns: False
"""
return False
return (self.error_result is not None
and self.error_result.get('reason') == 'stopped')

This comment was marked as spam.

This comment was marked as spam.

import unittest


class TestErrorResultToException(unittest.TestCase):

This comment was marked as spam.

This comment was marked as spam.

@theacodes theacodes removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Jul 21, 2017
Copy link
Contributor

@dhermes dhermes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@theacodes theacodes merged commit 6336e6c into googleapis:master Jul 21, 2017
@theacodes theacodes deleted the bigquery-future branch July 21, 2017 22:42
@theacodes
Copy link
Contributor Author

Thanks @dhermes and @tswast. This is a huge jump in developer experience for bigquery. 🍰

landrito pushed a commit to landrito/google-cloud-python that referenced this pull request Aug 21, 2017
* Add future interface to bigquery Jobs.
* Make QueryJob return QueryResults from result()
* Deprecate QueryJob.results()
landrito pushed a commit to landrito/google-cloud-python that referenced this pull request Aug 22, 2017
* Add future interface to bigquery Jobs.
* Make QueryJob return QueryResults from result()
* Deprecate QueryJob.results()
landrito pushed a commit to landrito/google-cloud-python that referenced this pull request Aug 22, 2017
* Add future interface to bigquery Jobs.
* Make QueryJob return QueryResults from result()
* Deprecate QueryJob.results()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the BigQuery API. cla: yes This human has signed the Contributor License Agreement. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants