Bigquery Jobs should match the Futures interface #3556

Closed
theacodes opened this issue Jun 27, 2017 · 3 comments
Labels: api: bigquery, priority: p1, type: feature request

Comments

@theacodes
Contributor

Presently, bigquery uses a custom Job class (and subclasses) to deal with long-running operations such as queries and data import.

The Job class leads to code like this proliferating throughout our samples and user code:

import time


def wait_for_job(job):
    while True:
        job.reload()  # Refreshes the state via a GET request.
        if job.state == 'DONE':
            if job.error_result:
                raise RuntimeError(job.errors)
            return
        time.sleep(1)

Which in turn means our samples require a level of indirection to accomplish simple tasks:

client = bigquery.Client()
query_job = client.run_async_query(str(uuid.uuid4()), query)
query_job.use_legacy_sql = False
query_job.begin()

wait_for_job(query_job)

rows = query_job.results().fetch_data(max_results=10)
for row in rows:
    print(row)

However, our gax-based clients share a common "Long Running Operation" strategy that surfaces these kinds of ongoing operations as a class implementing the concurrent.futures.Future interface. This is implemented as gax._OperationFuture. Unfortunately, bigquery cannot use it: bigquery is an HTTP-only API, and gax can currently only be used for gRPC APIs.
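For reference, the standard-library interface mentioned above can be exercised directly. The sketch below uses only `concurrent.futures.Future` from the standard library, not `gax._OperationFuture` itself, and it resolves the future by hand purely for illustration; the payload string is made up.

```python
from concurrent.futures import Future

# A bare standard-library Future, resolved by hand for illustration only.
future = Future()
events = []

# Callbacks run once the future completes (immediately if already done).
future.add_done_callback(lambda f: events.append(f.result()))

state_before = future.done()      # no result has been set yet
future.set_result('job output')   # hypothetical payload
state_after = future.done()

value = future.result(timeout=1)  # returns at once because the future is done
error = future.exception()        # None when the operation did not fail
```

The methods a caller typically relies on are `done()`, `result()`, `exception()`, and `add_done_callback()`, which is the surface a job class would need to mirror.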

We should make the _BaseJob class and its subclasses conform closely to the Future interface, so that usage is similar to our other API clients and so that we can simplify usage to:

client = bigquery.Client()
query_job = client.run_async_query(str(uuid.uuid4()), query)
query_job.use_legacy_sql = False
query_job.begin()

# Wait for the job to finish
query_result = query_job.result()

rows = query_result.fetch_data(max_results=10)
for row in rows:
    print(row)
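One way the proposal could look is sketched below. This is not the library's implementation: `PollingFuture`, `FakeJob`, and the `poll_interval` parameter are hypothetical names invented for illustration, and `FakeJob.reload()` only simulates the GET request. The sketch simply moves the `wait_for_job` loop from earlier in the issue into Future-style `done()` and `result()` methods.

```python
import time


class PollingFuture:
    """Hypothetical mixin: Future-style methods built on reload()/state."""

    def done(self):
        # Refresh the state; reload() stands in for the GET request.
        self.reload()
        return self.state == 'DONE'

    def result(self, timeout=None, poll_interval=1.0):
        # Block until the job is done, the same loop as wait_for_job().
        deadline = None if timeout is None else time.monotonic() + timeout
        while not self.done():
            if deadline is not None and time.monotonic() >= deadline:
                raise TimeoutError('job did not finish in time')
            time.sleep(poll_interval)
        if self.error_result:
            raise RuntimeError(self.errors)
        return self


class FakeJob(PollingFuture):
    """Stand-in for a job; becomes DONE after a fixed number of reloads."""

    def __init__(self, reloads_needed, error_result=None):
        self.state = 'RUNNING'
        self.error_result = error_result
        self.errors = [error_result] if error_result else None
        self._remaining = reloads_needed

    def reload(self):
        self._remaining -= 1
        if self._remaining <= 0:
            self.state = 'DONE'


job = FakeJob(reloads_needed=3)
finished = job.result(timeout=5, poll_interval=0.01)
```

With something along these lines, callers would no longer need their own polling helper, and a failed job surfaces as an exception from `result()` rather than a state the caller has to inspect.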

theacodes added the api: bigquery, priority: p1, and type: feature request labels on Jun 27, 2017
@tswast
Contributor

tswast commented Jun 30, 2017

When you call .result() does it actually block? The Video Intelligence API sample, which uses that gax OperationFuture, I believe, shows some polling.

while not operation.done():
    sys.stdout.write('.')
    sys.stdout.flush()
    time.sleep(15)
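On the blocking question: for the standard-library `concurrent.futures.Future`, at least, `result()` blocks until the work finishes (or an optional timeout expires), while `done()` returns immediately, so a polling loop like the one above is only needed to show progress. Whether the gax operation future behaves identically is not confirmed in this thread. A minimal sketch using a thread pool, with `slow_task` as a made-up stand-in for the long-running operation:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_task():
    # Hypothetical stand-in for a long-running operation.
    time.sleep(0.2)
    return 'finished'


with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(slow_task)

    done_immediately = future.done()   # non-blocking check right after submit
    start = time.monotonic()
    value = future.result()            # blocks until slow_task returns
    waited = time.monotonic() - start  # time spent blocked in result()
```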

@theacodes
Contributor Author

theacodes commented Jun 30, 2017 via email

@theacodes
Contributor Author

Related #3617
