docs(samples): uses function (create_job) more appropriate to the described sample intent #1309

Merged
chalmerlowe merged 20 commits into main from clowe-update-create-jobs on Sep 2, 2022

Conversation

chalmerlowe
Collaborator

@chalmerlowe chalmerlowe commented Aug 11, 2022

This is a work in progress.

The previous version simply used the .query() method, but the description and intent were to demonstrate the create_job() method. This migrates the code to use the intended method.
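
For reference, the shape of the change can be sketched roughly as follows. This is a minimal sketch rather than the merged sample: the helper names `build_query_job_resource` and `start_query_job` are hypothetical, and `client` is assumed to be a `google.cloud.bigquery.Client` (any object with a compatible `create_job` method works).

```python
from typing import Any, Dict, Optional


def build_query_job_resource(
    sql: str, labels: Optional[Dict[str, str]] = None
) -> Dict[str, Any]:
    """Build a dict shaped like the REST API's job configuration for a query job."""
    resource: Dict[str, Any] = {"query": {"query": sql}}
    if labels:
        resource["labels"] = dict(labels)
    return resource


def start_query_job(
    client: Any, sql: str, labels: Optional[Dict[str, str]] = None
) -> Any:
    """Start a query through the generic create_job() entry point.

    The earlier version of the sample called client.query(sql) directly.
    """
    return client.create_job(job_config=build_query_job_resource(sql, labels))


# Usage (assumes google-cloud-bigquery is installed and credentials are set up):
#   from google.cloud import bigquery
#   job = start_query_job(bigquery.Client(), "SELECT 1", {"example-label": "example-value"})
#   print(f"Started job: {job.job_id}")
```

Keeping the configuration as a plain dict mirrors the REST job resource, which is what distinguishes create_job() from the typed, query-specific call.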

Fixes #1085 🦕

BEGIN_COMMIT_OVERRIDE
docs(samples): uses function (create_job) more appropriate to the described sample intent
END_COMMIT_OVERRIDE

@product-auto-label product-auto-label bot added the size: s and api: bigquery labels Aug 11, 2022
@chalmerlowe chalmerlowe changed the title fix: uses function more appropriate to the described title and intent fix: uses function (create_job) more appropriate to the described sample intent Aug 11, 2022
@chalmerlowe chalmerlowe self-assigned this Aug 11, 2022
@product-auto-label product-auto-label bot added the samples label Aug 12, 2022
@chalmerlowe chalmerlowe added the owlbot:run label Aug 12, 2022
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run label Aug 12, 2022
@chalmerlowe chalmerlowe added the kokoro:force-run label Aug 29, 2022
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run label Aug 29, 2022
@chalmerlowe chalmerlowe marked this pull request as ready for review August 29, 2022 17:32
@chalmerlowe chalmerlowe requested a review from a team August 29, 2022 17:32
@chalmerlowe chalmerlowe requested review from a team as code owners August 29, 2022 17:32
@chalmerlowe chalmerlowe requested a review from loferris August 29, 2022 17:32
@chalmerlowe
Collaborator Author

There is a known bug that prevents pre-release tests from completing with versions of grpcio higher than 1.49rc0.
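
One way to work around this in a pre-release test session is to exclude the affected release with a version specifier. The sketch below is an assumption about how that could look, not the PR's actual noxfile change: the helper name `prerelease_install_args` is hypothetical, and the excluded version is taken from the commit message "better method to avoid grpcio 1.49.0rc1".

```python
from typing import List

# Assumption: the release to skip is the one named in the commit list.
EXCLUDED_GRPCIO = "1.49.0rc1"


def prerelease_install_args(packages: List[str]) -> List[str]:
    """Return pip arguments that allow pre-releases but skip one grpcio version."""
    args = ["--pre"]
    for name in packages:
        if name == "grpcio":
            # "!=" is a standard version-exclusion specifier.
            args.append(f"grpcio!={EXCLUDED_GRPCIO}")
        else:
            args.append(name)
    return args


# In a nox session this could be passed along as:
#   session.install(*prerelease_install_args(["grpcio", "pandas"]))
print(prerelease_install_args(["grpcio", "pandas"]))
```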

noxfile.py (outdated)
samples/create_job.py
# and to set optional job resource properties, if needed.
# The job instance can be a LoadJob, CopyJob, ExtractJob, QueryJob
# Here, we demonstrate a "query" job.
# Reference: https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.client.Client.html#google.cloud.bigquery.client.Client.create_job
Contributor

This long line of text is not going to render well on the Sample Browser, or even on GitHub. I like the link to the documentation though. Could you perhaps add this link on the https://cloud.google.com/bigquery/docs/samples/bigquery-create-job page instead?

Collaborator Author

I think I am confused: your link takes us to the page where this sample is displayed.
Is that intentional? That page does not currently provide additional information regarding the four types of jobs available.

Relatedly: in our renderings on the Sample Browser, is there a way to create standard hyperlinks within the code blocks displayed in the screen?

Contributor

It currently does not, but once your PR updates it the page will also be updated.

I tried sifting through https://googlecloudplatform.github.io/samples-style-guide/ and we currently don't provide any guidance on how hyperlinks should be handled when they are too long. Let me get back to you on this after I ask around.

Collaborator Author

I looked at the style guide a bit. Thanks for linking to it; it's a good reminder for me that it exists.

This https://googlecloudplatform.github.io/samples-style-guide/#clients item has a Python snippet with a really long URL in the code sample, just like the one I included.
¯\_(ツ)_/¯

Contributor

We could remove the ending anchor (#google.cloud...) to make the URL a bit smaller.

I recall that we have a g.co/cloud URL shortener for cloud.google.com pages (e.g. cloud.google.com/bigquery becomes g.co/cloud/bigquery, which isn't all that much shorter, but we occasionally used it). I wonder if we could get g.co/bqpython or something similar pointing to the latest API reference?
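
Dropping the ending anchor can be illustrated with the standard library. This is only a sketch of the idea; the helper name `strip_anchor` is hypothetical, and the URL is the reference link from the sample comment under review.

```python
from urllib.parse import urldefrag


def strip_anchor(url: str) -> str:
    """Return the URL without its trailing '#fragment', if any."""
    return urldefrag(url).url


REFERENCE_URL = (
    "https://googleapis.dev/python/bigquery/latest/generated/"
    "google.cloud.bigquery.client.Client.html"
    "#google.cloud.bigquery.client.Client.create_job"
)

SHORTER_URL = strip_anchor(REFERENCE_URL)
print(SHORTER_URL)
```

The trade-off is that the shorter link opens the top of the Client reference page instead of jumping to the create_job entry.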

Collaborator Author

I submitted a request for a short link:

g.co/bqpython > https://googleapis.dev/python/bigquery/latest/

It has to go through an approval process.

I would suggest that we not hold up this PR, especially since even the style guide's code samples include extremely long URLs, as noted in the comment above.

Contributor

Thanks for looking into the shortlink process! I wasn't aware of such features. Hope it works out :)

I've asked the samples team for guidance; however, it will likely take a long time for us to come up with a feasible solution, and it will likely involve multiple teams. For now, the link adds value, so I'm happy to move forward as is.

# Here, we demonstrate a "query" job.
# Reference: https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.client.Client.html#google.cloud.bigquery.client.Client.create_job
#
# NOTE: the preferred approach is to use one of the dedicated API calls:
Contributor

Could you elaborate on what the "preferred approach" is preferred for?

If it's for executing the query, it seems counterproductive to have a sample for create_job if this is not the preferred method.

Collaborator Author

@tswast can you elaborate for us on this comment you made in the issue, as it relates to @dandhlee's question above?

"That section is about any kind of job, not just queries. As such, it should use the create_job method instead of the more specific query method. There should be comments that it is recommended to use the corresponding method for query/copy/load/extract."
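
To make the recommendation in that quote concrete, here is a minimal sketch that maps the top-level section of a job configuration to the dedicated client method one would normally reach for. The mapping and the helper name `suggest_dedicated_method` are illustrative assumptions, not part of the library.

```python
from typing import Any, Dict

# Top-level key of the job configuration -> dedicated client method(s).
DEDICATED_METHODS = {
    "query": "client.query()",
    "extract": "client.extract_table()",
    "copy": "client.copy_table()",
    "load": "client.load_table_from_uri(), client.load_table_from_file(), etc.",
}


def suggest_dedicated_method(job_config: Dict[str, Any]) -> str:
    """Name the dedicated method that matches the job configuration's type."""
    for key, method in DEDICATED_METHODS.items():
        if key in job_config:
            return method
    raise ValueError("No query/extract/copy/load section found in job configuration")


print(suggest_dedicated_method({"query": {"query": "SELECT 1"}}))
```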

Contributor

One use case where this sample would be preferred over the more specific examples is retrying failed jobs.

There are customers who use this method by iterating through a list of recent jobs and retrying any that have failed as a way to make their data pipelines a bit more robust.

One could also use this method as a way to create a job with an experimental API property that hasn't been added to the client library's manually written job configuration classes yet.
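
The retry pattern described above might look roughly like this. It is a hedged sketch, not an endorsed recipe: the function name `retry_failed_jobs` is hypothetical, `client` is assumed to be a `google.cloud.bigquery.Client`, and it assumes each listed job exposes its REST resource (including a "configuration" section) through `to_api_repr()`.

```python
from typing import Any, List


def retry_failed_jobs(client: Any, max_results: int = 50) -> List[Any]:
    """Resubmit recently finished jobs that ended with an error."""
    retried = []
    for job in client.list_jobs(max_results=max_results, state_filter="done"):
        if job.error_result is None:
            continue  # The job succeeded; nothing to retry.
        # Reuse the configuration section of the failed job's API resource.
        configuration = job.to_api_repr()["configuration"]
        retried.append(client.create_job(job_config=configuration))
    return retried


# Usage (assumes google-cloud-bigquery is installed and credentials are set up):
#   from google.cloud import bigquery
#   new_jobs = retry_failed_jobs(bigquery.Client())
```

Because the configuration is passed through as a dict, the same call works regardless of whether the failed job was a query, load, copy, or extract job. A job that failed for a deterministic reason (for example, invalid SQL) would simply fail again, so a real pipeline would likely filter on the error reason first.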

samples/create_job.py (outdated)
Collaborator Author

@chalmerlowe chalmerlowe left a comment

See specific comments and questions at various places in the code.

Co-authored-by: Dan Lee <71398022+dandhlee@users.noreply.github.com>
@product-auto-label product-auto-label bot added the size: m label and removed the size: s label Sep 1, 2022
@chalmerlowe
Collaborator Author

@tswast @dandhlee

I added some justification for when/why .create_job() might be preferred over a direct call to .query(), etc.
I have also made a request (that needs to be approved) for a shortened URL, but I don't think that should hold up the processing of this PR.

Can you take a look at the changes I made and see if we are closer to the mark? Thanks.

# client.extract_table()
# client.copy_table()
# client.load_table_from_file(), client.load_table_from_dataframe(), etc.
job_config={
Contributor

I think a link to https://cloud.google.com/bigquery/docs/reference/rest/v2/Job would be quite helpful here.

Collaborator Author

I added the link.

Contributor

@dandhlee dandhlee left a comment

LGTM. Please see Tim's comment!

@chalmerlowe chalmerlowe merged commit 5aeedaa into main Sep 2, 2022
@chalmerlowe chalmerlowe deleted the clowe-update-create-jobs branch September 2, 2022 00:35
@parthea parthea changed the title fix: uses function (create_job) more appropriate to the described sample intent docs(samples): uses function (create_job) more appropriate to the described sample intent Sep 2, 2022
@parthea parthea added the release-please:force-run label Sep 2, 2022
@release-please release-please bot removed the release-please:force-run label Sep 2, 2022
abdelmegahedgoogle pushed a commit to abdelmegahedgoogle/python-bigquery that referenced this pull request Apr 17, 2023
…ple intent (googleapis#1309)

* fix: uses function more appropriate to the described title

* adds additional explanation for the end users

* adds REST API URL for reference

* corrects flake 8 linter errors

* blackens file

* adds type hints

* avoids unreliable version of grpcio

* updates imports to fix linting error

* better method to avoid grpcio 1.49.0rc1

* Update samples/create_job.py

Co-authored-by: Dan Lee <71398022+dandhlee@users.noreply.github.com>

* adds further explanation on when/why to use create_jobs

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* updates references

Co-authored-by: Dan Lee <71398022+dandhlee@users.noreply.github.com>
Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Labels
api: bigquery, samples, size: m
Development

Successfully merging this pull request may close these issues.

Python bigquery_create_job sample is inconsistent with other code samples
5 participants