BigQuery: Add support to Dataset for project_ids with org prefix. #8877

Merged

@emar-kar emar-kar commented Aug 1, 2019

Closes: #8646

emar-kar added 4 commits July 30, 2019 17:43
added support to Dataset for project_ids with org prefix
updated tests to check dataset chgs
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Aug 1, 2019
@IlyaFaer IlyaFaer added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 1, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 1, 2019
@IlyaFaer IlyaFaer added the api: bigquery Issues related to the BigQuery API. label Aug 1, 2019
IlyaFaer commented Aug 1, 2019

@emar-kar, please run black on the bigquery/dataset.py file so that the lint session passes.
Also, some lines of code are not covered by tests:
google/cloud/bigquery/dataset.py 309, 306->309
See:
https://source.cloud.google.com/results/invocations/5834d3bb-e5dc-4f3f-aeba-34ddf36780be/targets/cloud-devrel%2Fclient-libraries%2Fgoogle-cloud-python%2Fpresubmit%2Fbigquery/log

@IlyaFaer IlyaFaer added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 1, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 1, 2019
@IlyaFaer IlyaFaer requested a review from tswast August 2, 2019 07:47
@IlyaFaer IlyaFaer marked this pull request as ready for review August 2, 2019 07:47
@IlyaFaer IlyaFaer requested a review from a team August 2, 2019 07:47
bigquery/google/cloud/bigquery/dataset.py
if with_prefix is None:
    parts = dataset_id.split(".")
else:
    parts = with_prefix.group("ref").split(".")
Contributor

I'm confused. What is this doing? I think the prefix needs to be part of the project ID, right?

Contributor Author

From the issue's stack trace:

ValueError: Too many parts in dataset_id. Expected a fully-qualified dataset ID in standard SQL format. e.g. "project.dataset_id", got google.com:[project].ryan_dataset

The error is caused by the google.com: prefix. Previously the passed string was split only on ".", so google.com:[project].ryan_dataset produced len(parts) > 2 and raised ValueError. As I understand it, the prefix is not part of the project ID itself. I was looking for a way to parse the string that handles both the previous and the new format, which is why I decided to use a regular expression. The pattern separates the prefix and handles the following cases:

  • string-project.string_dataset - parsed successfully, as it was before;
  • prefix:string-project.string_dataset - the part after the prefix is captured and then split;
  • string-project:string_dataset - raises ValueError if default_project is not defined;
  • google.com:project:dataset_id - same as above.
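For illustration, the failure described in this comment can be reproduced with a minimal sketch. The helper naive_split is hypothetical and only mimics the split-on-dot behavior described above; it is not the library code, and my-project stands in for the redacted [project] value.

```python
def naive_split(dataset_id):
    """Hypothetical helper: split on '.' only, as the earlier parsing is described."""
    parts = dataset_id.split(".")
    if len(parts) > 2:
        raise ValueError("Too many parts in dataset_id, got {}".format(dataset_id))
    return parts


# A plain "project.dataset" string yields exactly two parts.
print(naive_split("string-project.string_dataset"))

# The dot inside the "google.com:" prefix yields a third part, so this raises.
try:
    naive_split("google.com:my-project.ryan_dataset")
except ValueError as exc:
    print("ValueError:", exc)
```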

Contributor

Right, but it appears to me that we're discarding the prefix? Is that correct?

Contributor Author

Yes, that is what I was thinking before this conversation, so now I'm a bit confused. I thought the prefix was an extra part that should simply be removed. But if it is actually part of the project ID, I'll need to rework the pattern.

Applying requested chgs.
// Removed description for 'single prefix'.
@tswast tswast self-requested a review August 6, 2019 00:27
@@ -26,6 +27,14 @@
from google.cloud.bigquery.table import TableReference


_W_PREFIX = re.compile(
Contributor

Can we pick a better name for this? Maybe _PROJECT_PREFIX_PATTERN?

@@ -26,6 +27,14 @@
from google.cloud.bigquery.table import TableReference


_W_PREFIX = re.compile(
r"""
(\S*)\:(?P<ref>\S*)
Contributor

Since at least one character is required, this should probably be \S+, right?

Also, ref isn't all that meaningful to me. How about remaining, since it's everything after the : character?
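To illustrate the quantifier point, here is a minimal sketch using only Python's re module. The names loose and strict are hypothetical: loose mirrors the pattern under review, and strict applies the suggested \S+ together with the proposed remaining group name.

```python
import re

# Pattern as written in the diff: both sides of ':' may be empty.
loose = re.compile(r"(\S*)\:(?P<ref>\S*)")

# Variant with the suggested changes: at least one character on each side.
strict = re.compile(r"(\S+)\:(?P<remaining>\S+)")

# A lone colon matches the loose pattern with an empty captured group...
print(repr(loose.match(":").group("ref")))

# ...but is rejected once at least one character is required.
print(strict.match(":"))

# A well-formed string still matches; everything after ':' is captured.
print(strict.match("prefix:string-project.string_dataset").group("remaining"))
```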

    def test_from_string_w_prefix(self):
        cls = self._get_target_class()
        got = cls.from_string("prefix:string-project.string_dataset")
        self.assertEqual(got.project, "string-project")
Contributor

Shouldn't this be prefix:string-project, since the prefix is actually part of the project ID?

Complete template change.
@AVaksman AVaksman added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 8, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 8, 2019
@@ -26,6 +27,14 @@
from google.cloud.bigquery.table import TableReference


_PROJECT_PREFIX_PATTERN = re.compile(
r"""
(?P<prefix>\S+\:\S+)\.+(?P<remaining>\S*)
Contributor

Now that we're matching this way, prefix isn't the right term. Should be project_id. Likewise, remaining should be renamed to dataset_id.

Also, instead of \S, we should be matching for characters other than ., that is [^.]+.

We want to match the whole string, so we should probably end this pattern with $.

Contributor Author

I renamed the groups in the pattern, but the second suggestion, about [^.], does not seem right to me. Since the string can be google.com:project.dataset, a dot can be part of the prefix. I checked a couple of variants, and as far as I can see \S fits better.
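A small sketch of the concern raised in this reply, assuming the [^.]+ suggestion were applied to every part of the pattern. That is only one possible reading of the suggestion, and the name no_dots_anywhere is hypothetical.

```python
import re

# Hypothetical variant: dots are forbidden in every part, including the prefix.
no_dots_anywhere = re.compile(
    r"(?P<project_id>[^.]+\:[^.]+)\.(?P<dataset_id>[^.]+)$"
)

# The org prefix "google.com" itself contains a dot, so this does not match.
print(no_dots_anywhere.match("google.com:project.dataset"))

# A prefix without a dot still matches.
match = no_dots_anywhere.match("prefix:project.dataset")
print(match.group("project_id"), match.group("dataset_id"))
```

This suggests that dots need to be allowed before the colon, while they can still be excluded after it.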

@@ -26,6 +27,14 @@
from google.cloud.bigquery.table import TableReference


_PROJECT_PREFIX_PATTERN = re.compile(
r"""
Contributor

Why are we using a multi-line string here?

Contributor Author

Just for readability. I think I'll switch this to a single line after correcting the pattern implementation.

minor corrections
@tseaver tseaver added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 12, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 12, 2019
    def test_from_string_legacy_string(self):
        cls = self._get_target_class()
        with self.assertRaises(ValueError):
            cls.from_string("string-project:string_dataset")

    def test_from_string_w_incorrect_prefix(self):
Contributor

Let's add an additional test where the project ID / dataset ID contains an illegal . character. In other words, the string contains too many "parts", e.g. google.com:project-id.dataset_id.table_id. This should also fail with ValueError.
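A rough sketch of what such a test could look like. Everything here is hypothetical: parse_dataset_id is a simplified stand-in for the library's from_string (no default_project handling), and the pattern is the one proposed later in this review, not necessarily the merged code.

```python
import re
import unittest

# Assumed pattern, taken from the reviewer's later suggestion in this thread.
_PATTERN = re.compile(r"(?P<project_id>\S+\:[^.]+)\.(?P<dataset_id>[^.]+)$")


def parse_dataset_id(dataset_id):
    """Hypothetical stand-in parser: return (project_id, dataset_id)."""
    match = _PATTERN.match(dataset_id)
    if match:
        return match.group("project_id"), match.group("dataset_id")
    parts = dataset_id.split(".")
    if len(parts) != 2:
        raise ValueError("Expected 'project.dataset_id', got {}".format(dataset_id))
    return parts[0], parts[1]


class TestParseDatasetId(unittest.TestCase):
    def test_w_prefix(self):
        got = parse_dataset_id("google.com:project-id.dataset_id")
        self.assertEqual(got, ("google.com:project-id", "dataset_id"))

    def test_w_prefix_and_too_many_parts(self):
        # The extra ".table_id" part must be rejected.
        with self.assertRaises(ValueError):
            parse_dataset_id("google.com:project-id.dataset_id.table_id")


# Run the tests without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestParseDatasetId)
result = unittest.TextTestRunner().run(suite)
```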

@@ -26,6 +27,9 @@
from google.cloud.bigquery.table import TableReference


_PROJECT_PREFIX_PATTERN = re.compile(r"(?P<project_id>\S+\:\S+)\.+(?P<dataset_id>\S+)$")
Contributor

I think this will match patterns with too many . characters. Let's try something like:

(?P<project_id>\S+\:[^.]+)\.(?P<dataset_id>[^.]+)$
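As a quick check of this suggestion, the sketch below compares the pattern from the diff with the suggested one, using only Python's re module. The variable names earlier and suggested are hypothetical labels for the two patterns quoted in this thread.

```python
import re

# Pattern from the diff under review.
earlier = re.compile(r"(?P<project_id>\S+\:\S+)\.+(?P<dataset_id>\S+)$")

# Pattern suggested in this comment: no dots after the colon or in the dataset ID.
suggested = re.compile(r"(?P<project_id>\S+\:[^.]+)\.(?P<dataset_id>[^.]+)$")

too_many = "google.com:project-id.dataset_id.table_id"
well_formed = "google.com:project-id.dataset_id"

# The earlier pattern accepts the extra part and folds it into project_id.
m = earlier.match(too_many)
print(m.group("project_id"), "|", m.group("dataset_id"))

# The suggested pattern rejects the extra part...
print(suggested.match(too_many))

# ...and keeps the org prefix inside project_id for a well-formed string.
m = suggested.match(well_formed)
print(m.group("project_id"), "|", m.group("dataset_id"))
```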

Contributor Author

I see what you mean, sorry for the misunderstanding. I appreciate your help.

pattern rewrote with the '[^.]' and .VERBOSE (due to blacken session)
added test to check extra parts within the string with the prefix
reconf prefix in an existed test
@IlyaFaer IlyaFaer added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 15, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 15, 2019
@IlyaFaer IlyaFaer merged commit 2ab105b into googleapis:master Aug 22, 2019
@emar-kar emar-kar deleted the adding-support-2-project_ids-w-org-prefix branch August 26, 2019 11:36
HemangChothani pushed a commit to HemangChothani/google-cloud-python that referenced this pull request Aug 29, 2019
emar-kar added a commit to MaxxleLLC/google-cloud-python that referenced this pull request Sep 18, 2019
Labels
api: bigquery Issues related to the BigQuery API. cla: yes This human has signed the Contributor License Agreement.
Development

Successfully merging this pull request may close these issues.

BigQuery: Add support to Dataset for project_ids with org prefix.
7 participants