Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BigQuery: add --max_results option to magic #9169

Merged
merged 7 commits into from
Sep 6, 2019

Conversation

shubha-rajan
Copy link
Contributor

Second of 3 PRs towards resolving #9105 as described in review for #9147

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Sep 4, 2019
@shubha-rajan shubha-rajan marked this pull request as ready for review September 4, 2019 18:32
@shubha-rajan shubha-rajan requested a review from a team September 4, 2019 18:32
@tswast
Copy link
Contributor

tswast commented Sep 4, 2019

I just merged #9167. We might have to rebase.

@googlebot
Copy link

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@googlebot googlebot added cla: no This human has *not* signed the Contributor License Agreement. and removed cla: yes This human has signed the Contributor License Agreement. labels Sep 4, 2019
@shubha-rajan shubha-rajan force-pushed the bq-add-max-results-to-magic branch from 23fb140 to 37b4d39 Compare September 4, 2019 20:17
@googlebot
Copy link

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@googlebot googlebot added cla: yes This human has signed the Contributor License Agreement. and removed cla: no This human has *not* signed the Contributor License Agreement. labels Sep 4, 2019
Copy link
Contributor

@tswast tswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually love to see a system / integration test for this feature, but I know we don't have any examples for you to work off of for that. (I'll see what I can do about that, since I know this will come up again in the future.)

Since we don't have a system for notebook integration tests in this repo, I'd like to see the results of manually testing this feature in a notebook.

@@ -300,7 +300,7 @@ def _run_query(client, query, job_config=None):
while True:
print("\rQuery executing: {:0.2f}s".format(time.time() - start_time), end="")
try:
query_job.result(timeout=0.5)
query_job.result(timeout=0.5, max_results=max_results)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we aren't actually returning the results in this line (just waiting for the query to finish), we don't need to pass max_results here. I believe that means we can remove the max_results parameter from _run_query as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in f88ffe8

"Defaults to returning all rows."
),
)
@magic_arguments.argument(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like these arguments got doubled up. (Maybe the two commits thing, we noticed when rebasing?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's probably what happened (moral of the story, always run tests before pushing). Removed the duplicate in 147e43d

@@ -414,7 +414,7 @@ def test_bigquery_magic_with_legacy_sql():
with run_query_patch as run_query_mock:
ip.run_cell_magic("bigquery", "--use_legacy_sql", "SELECT 17 AS num")

job_config_used = run_query_mock.call_args_list[0][0][-1]
job_config_used = run_query_mock.call_args_list[0][1]["job_config"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this change intentional? Maybe a bad rebase?

Copy link
Contributor Author

@shubha-rajan shubha-rajan Sep 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was intentional. It was to fix some test failures I was getting because the argument that was being retrieved was the wrong type (a result of changing the method signature for _run_query). This way the arg being accessed will always be job_config, even if other named parameters are added. Since I reverted the changes to _run_query, I can probably change this back too. I think the tests will pass either way

@@ -433,7 +455,9 @@ def _cell_magic(line, query):

error = None
try:
query_job = _run_query(client, query, job_config)
query_job = _run_query(
client, query, job_config=job_config, max_results=max_results
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max_results is not actually needed here, it's not until the call to to_dataframe that we need max_results.

Note: to_dataframe probably doesn't have a max_results argument, and I'm actually not certain that we'd want to add one. Instead, we can call query_job.results with a max_results argument and then call to_dataframe on the resulting RowIterator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated to call to_dataframe on query_job.result, passing max_results as an argument if max_results is present in f88ffe8

@shubha-rajan
Copy link
Contributor Author

Screenshot of running a query in a notebook with max_results set:
image

@shubha-rajan shubha-rajan requested a review from tswast September 5, 2019 22:25
@tswast tswast merged commit f00b60b into googleapis:master Sep 6, 2019
emar-kar pushed a commit to MaxxleLLC/google-cloud-python that referenced this pull request Sep 11, 2019
* added max_results magic option and fixed broken tests

* added tests for --max_results magic option

* added max_results magic option and fixed broken tests

* added tests for --max_results magic option

* Removed duplicate `--max_results` magic argument

* removed max_results param from run_query, updated tests
emar-kar pushed a commit to MaxxleLLC/google-cloud-python that referenced this pull request Sep 18, 2019
* added max_results magic option and fixed broken tests

* added tests for --max_results magic option

* added max_results magic option and fixed broken tests

* added tests for --max_results magic option

* Removed duplicate `--max_results` magic argument

* removed max_results param from run_query, updated tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants