Phase 1 for storing schemas for later use. #7761

lbristol88 · 2019-04-20T00:16:58Z

Part of feature request: #3419

tswast · 2019-04-22T16:07:12Z

bigquery/google/cloud/bigquery/client.py

+            file_obj = file_or_path
+        else:
+            try:
+                file_obj = open(file_or_path)


An important difference with this path is that we must close file_obj if we open it based on a path, but we must not close file_obj if we are given a file-like object.

I recommend using a with statement when we are provided a path to make sure we always close it. You may want to refactor this to add a _schema_from_json_file_object() private helper method that does the loading part without the file closing part.

Added the helper method as recommended, along with a with statement to close the file.
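The ownership rule above (close what you open, leave alone what you are handed) can be sketched as follows. This is a minimal illustration using only the standard library, not the code from this PR; `load_json` and `_load_json_from_file_object` are hypothetical names.

```python
import io
import json


def _load_json_from_file_object(file_obj):
    """Parse JSON from an already-open file-like object; never closes it."""
    return json.load(file_obj)


def load_json(file_or_path):
    """Load JSON from a file-like object or from a filesystem path."""
    if isinstance(file_or_path, io.IOBase):
        # The caller owns this file object, so leave it open.
        return _load_json_from_file_object(file_or_path)

    # We opened the file ourselves, so the with statement guarantees we close it.
    with open(file_or_path) as file_obj:
        return _load_json_from_file_object(file_obj)
```

Splitting the parsing into a helper keeps the file-closing decision in exactly one place.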

tswast · 2019-04-22T16:07:53Z

bigquery/google/cloud/bigquery/client.py

+            try:
+                file_obj = open(file_or_path)
+            except OSError:
+                raise TypeError(_NEED_JSON_FILE_ARGUMENT)


In general, we use ValueError for unexpected input arguments.

Updated to raise the correct exception.

tswast · 2019-04-22T16:08:25Z

bigquery/google/cloud/bigquery/client.py

+        try:
+            json_data = json.load(file_obj)
+        except JSONDecodeError:
+            raise TypeError(_NEED_JSON_FILE_ARGUMENT)


I'd prefer if we let this raise (don't catch it). That way the user doesn't lose context about what is wrong with the input file.

Took this piece out.
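To illustrate what the user keeps when the parse error is allowed to propagate, here is a small sketch using only the standard library. `load_schema_dicts` is a hypothetical name, and the sketch assumes Python 3, where `json.JSONDecodeError` is a subclass of `ValueError`.

```python
import io
import json


def load_schema_dicts(file_obj):
    """Hypothetical loader that lets JSON parse errors propagate unchanged."""
    return json.load(file_obj)


# A stray trailing comma makes this input invalid JSON.
broken = io.StringIO('[{"name": "qtr",}]')

try:
    load_schema_dicts(broken)
except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
    error_message = str(exc)
    print(error_message)  # the message includes the line and column of the problem
```

Replacing this exception with a generic one would discard the position information in the message.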

tswast · 2019-04-22T16:10:23Z

bigquery/google/cloud/bigquery/client.py

+            file_obj = destination
+        else:
+            try:
+                file_obj = open(destination, mode="w")


Just as in schema_from_json, we must close file_obj if we open it based on a path, but we must not close file_obj if we are given a file-like object.

Updated to use a with statement so the file is closed when appropriate.

tswast · 2019-04-22T16:12:42Z

bigquery/tests/unit/test_client.py

+          }
+        ]"""
+        expected = list()
+        json_data = json.loads(file_content)


You're basically repeating the function definition here. That's not ideal. You've basically written a change-detector test. I'd prefer to see actual schema.SchemaField constructors in tests.

I do like that you've mocked out open and the file contents, so keep that. Just change how you construct expected.

Made changes to include schema.SchemaField as requested.

tswast · 2019-04-22T16:15:33Z

bigquery/tests/unit/test_client.py

@@ -5161,3 +5161,84 @@ def test__do_multipart_upload_wrong_size(self):

        with pytest.raises(ValueError):
            client._do_multipart_upload(file_obj, {}, file_obj_len + 1, None)
+
+    def test__schema_from_json(self):


Nit: since the function name is schema_from_json, the test name should be test_schema_from_json (just one underscore between test and schema.)

Ditto for schema_to_json test function name.

Updated test names appropriately.

tswast · 2019-04-23T20:09:18Z

bigquery/google/cloud/bigquery/client.py

+
+    def schema_from_json(self, file_or_path):
+        """Takes a file object or file path that contains json that describes
+            a table schema.


Nit: This and the other docstrings (except for the first line) look like they are indented 4 spaces too many.

Fixed indentations.

tswast · 2019-04-23T20:10:01Z

bigquery/google/cloud/bigquery/client.py

@@ -1929,6 +1935,61 @@ def list_rows(
        )
        return row_iterator

+    def _schema_from_json_file_object(self, file):


Nit: since file is a built-in, use file_ or file_obj as the name.

Changed name as directed.

tswast · 2019-04-23T20:11:14Z

bigquery/google/cloud/bigquery/client.py

+        schema_field_list = list()
+        json_data = json.load(file)
+
+        for field in json_data:


FYI: This loop could be replaced with a Python "list comprehension".

Changed to be a list comprehension.
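For readers unfamiliar with the suggestion, the two forms can be compared side by side. This is a sketch only: `Field` and `field_from_dict` are hypothetical stand-ins so the example runs without the BigQuery library installed.

```python
import collections
import io
import json

# Hypothetical stand-in for a schema field, used only to keep the sketch self-contained.
Field = collections.namedtuple("Field", ["name", "field_type", "mode"])


def field_from_dict(mapping):
    """Build a Field from one parsed JSON object (mode defaults to NULLABLE here)."""
    return Field(mapping["name"], mapping["type"], mapping.get("mode", "NULLABLE"))


def fields_with_loop(file_obj):
    """Original style: build the list with an explicit loop and append."""
    result = list()
    for item in json.load(file_obj):
        result.append(field_from_dict(item))
    return result


def fields_with_comprehension(file_obj):
    """Suggested style: the same result as a list comprehension."""
    return [field_from_dict(item) for item in json.load(file_obj)]
```

Both functions return the same list; the comprehension simply removes the temporary variable and the append call.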

tswast · 2019-04-23T20:11:45Z

bigquery/google/cloud/bigquery/client.py

+            return self._schema_from_json_file_object(file_or_path)
+        else:
+            try:
+                with open(file_or_path) as file:


Nit: file should be file_obj since file is a built-in.

Updated name accordingly.

tswast · 2019-04-23T20:12:05Z

bigquery/google/cloud/bigquery/client.py

+        """
+        if isinstance(file_or_path, io.IOBase):
+            return self._schema_from_json_file_object(file_or_path)
+        else:


Nit: Since the above line returns, no need for else.

Removed the else as recommended.

tswast · 2019-04-23T20:14:44Z

bigquery/google/cloud/bigquery/client.py

+            json_schema_list.append(schema_field)
+
+        if isinstance(destination, io.IOBase):
+            destination.write(json.dumps(json_schema_list, indent=2, sort_keys=True))


Use json.dump(schema_list, destination) instead. Then you can use BytesIO in the tests, too.

I'd prefer if json.dump wasn't repeated (here we do want to be DRY). Replace this with file_obj = destination and remove the else.

Refactored into a helper function because I couldn't get it to work with the context manager below just by removing the else.
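One way to keep the `json.dump` call in a single place while still handling both kinds of destination is sketched below. The names `save_json` and `_dump_to_file_object` are hypothetical and this is not the code merged in the PR. The sketch assumes Python 3, where `json.dump` writes text, so the destination must be a text-mode file or an `io.StringIO`.

```python
import io
import json


def _dump_to_file_object(records, file_obj):
    """The only place where serialization happens."""
    json.dump(records, file_obj, indent=2, sort_keys=True)


def save_json(records, destination):
    """Write records as JSON to a file-like object or to a filesystem path."""
    if isinstance(destination, io.IOBase):
        # The caller owns this file object, so write to it and leave it open.
        _dump_to_file_object(records, destination)
        return

    # We opened the file ourselves, so the with statement closes it for us.
    with open(destination, mode="w") as file_obj:
        _dump_to_file_object(records, file_obj)
```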

tswast · 2019-04-23T20:16:33Z

bigquery/tests/unit/test_client.py

+        ]"""
+
+        expected = list()
+        expected.append(SchemaField("qtr", "STRING", "REQUIRED", "quarter"))


No need for .append. Instead, construct the list inline.

expected = [ SchemaField(...), SchemaField(...), ... ]

Switched to suggested method.

tswast · 2019-04-23T20:17:58Z

bigquery/tests/unit/test_client.py

+            _mock_file().write.assert_called_once_with(file_content)
+            # This assert is to make sure __exit__ is called in the context
+            # manager that opens the file in the function
+            _mock_file().__exit__.assert_called_once_with(None, None, None)


We don't care about the actual arguments, so just assert_called_once will work.

Removed the arguments from the assert.
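A self-contained sketch of this kind of check is shown below. It assumes Python 3 (`unittest.mock` and the `builtins.open` patch target); `write_json` and `check_file_is_closed` are hypothetical names standing in for the function under test and the test body.

```python
import json
from unittest import mock


def write_json(records, path):
    """Hypothetical function under test: opens the path itself, so it must close it."""
    with open(path, mode="w") as file_obj:
        json.dump(records, file_obj)


def check_file_is_closed():
    open_patch = mock.patch("builtins.open", mock.mock_open())
    with open_patch as mock_file:
        write_json([{"name": "qtr"}], "/mocked/file.json")

        mock_file.assert_called_once_with("/mocked/file.json", mode="w")
        # The handle returned by open() must have left its context manager
        # exactly once; the arguments passed to __exit__ are not of interest.
        mock_file.return_value.__exit__.assert_called_once()
    return True
```

Using `mock_file.return_value` instead of calling `mock_file()` again avoids adding an extra recorded call to the patched `open`.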

tswast · 2019-04-23T20:18:41Z

bigquery/tests/unit/test_client.py

+        open_patch = mock.patch("builtins.open", mock.mock_open())
+        with open_patch as _mock_file:
+            actual = client.schema_to_json(schema_list, mock_file_path)
+            _mock_file.assert_called_once_with(mock_file_path, mode="w")


Might need wb when using json.dump. (b for "binary" mode)

tswast · 2019-04-24T15:53:43Z

bigquery/google/cloud/bigquery/client.py

+            List of schema field objects.
+        """
+        json_data = json.load(file_obj)
+        return [SchemaField.from_api_repr(f) for f in json_data]


Wonderful!

One nit-pick, though. Style-wise in our client libraries and samples, we avoid single-letter variable names, even in list comprehensions. Let's rename f to field.

Variable name has been updated!

tswast · 2019-04-24T16:26:53Z

bigquery/tests/unit/test_client.py

+
+        open_patch = mock.patch("builtins.open", mock.mock_open())
+        with open_patch as _mock_file:
+            with mock.patch("json.dump") as _mock_dump:


I like your thinking here. It's definitely nicer to compare lists than it is to compare JSON strings.

Nit: Let's write both with statements on one line. https://stackoverflow.com/a/1073814/101923

Nit: No need for leading underscore, that's usually to indicate a "private" variable, which isn't relevant inside a test.

with open_patch as mock_file, mock.patch("json.dump") as mock_dump:

Replaced the existing with statements with the suggested line.
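In context, the combined form could look like the following sketch. It assumes Python 3; `save` is a hypothetical stand-in for the function under test.

```python
import json
from unittest import mock


def save(records, path):
    """Hypothetical function under test."""
    with open(path, mode="w") as file_obj:
        json.dump(records, file_obj, indent=2, sort_keys=True)


records = [{"name": "qtr", "type": "STRING"}]
open_patch = mock.patch("builtins.open", mock.mock_open())

# Both context managers are entered on a single line, left to right.
with open_patch as mock_file, mock.patch("json.dump") as mock_dump:
    save(records, "/mocked/file.json")

    mock_file.assert_called_once_with("/mocked/file.json", mode="w")
    # Comparing the list passed to json.dump avoids comparing JSON strings.
    mock_dump.assert_called_once_with(
        records, mock_file.return_value, indent=2, sort_keys=True
    )
```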

tswast · 2019-04-24T16:27:55Z

bigquery/tests/unit/test_client.py

+        client = self._make_client()
+
+        client.schema_to_json(schema_list, fake_file)
+        assert file_content == fake_file.getvalue()


Let's call json.loads(fake_file.getvalue()) and compare to a list of dictionaries like you do in the test of the path version.

The test has been updated with the list of dictionaries and updated assert.
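The idea of parsing the written output before comparing can be sketched as follows. `write_records` is a hypothetical writer, and `io.StringIO` is used because `json.dump` writes text in Python 3.

```python
import io
import json


def write_records(records, file_obj):
    """Hypothetical writer; the indentation it chooses is an implementation detail."""
    json.dump(records, file_obj, indent=2, sort_keys=True)


expected = [
    {"description": "quarter", "mode": "REQUIRED", "name": "qtr", "type": "STRING"},
    {"description": "total sales", "mode": "NULLABLE", "name": "sales", "type": "FLOAT"},
]

fake_file = io.StringIO()
write_records(expected, fake_file)

# Parse what was written and compare data structures, not raw strings, so the
# test does not break if whitespace or key order in the output changes.
actual = json.loads(fake_file.getvalue())
assert actual == expected
```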

tswast

Thanks!

tswast · 2019-04-24T19:49:30Z

bigquery/google/cloud/bigquery/client.py

+        try:
+            with open(file_or_path) as file_obj:
+                return self._schema_from_json_file_object(file_obj)
+        except OSError:


Oops, coverage is failing on this line and the similar line in the other function. We'd need a test where open() fails.

Honestly, I'd be okay removing the try block and letting these errors just raise, too.

Removed the try block as suggested for both functions.

tswast · 2019-04-24T21:28:37Z

bigquery/tests/unit/test_client.py

+        client = self._make_client()
+        mock_file_path = "/mocked/file.json"
+
+        open_patch = mock.patch("builtins.open", mock.mock_open())


From the test logs, it looks like this is tripping up Python 2. https://stackoverflow.com/a/34677735/101923

=================================== FAILURES ===================================
____________ TestClientUpload.test_schema_from_json_with_file_path _____________

self = <tests.unit.test_client.TestClientUpload object at 0x7f47e2dce4d0>

    def test_schema_from_json_with_file_path(self):
        from google.cloud.bigquery.schema import SchemaField

        file_content = """[
          {
            "description": "quarter",
            "mode": "REQUIRED",
            "name": "qtr",
            "type": "STRING"
          },
          {
            "description": "sales representative",
            "mode": "NULLABLE",
            "name": "rep",
            "type": "STRING"
          },
          {
            "description": "total sales",
            "mode": "NULLABLE",
            "name": "sales",
            "type": "FLOAT"
          }
        ]"""

        expected = [
            SchemaField("qtr", "STRING", "REQUIRED", "quarter"),
            SchemaField("rep", "STRING", "NULLABLE", "sales representative"),
            SchemaField("sales", "FLOAT", "NULLABLE", "total sales"),
        ]

        client = self._make_client()
        mock_file_path = "/mocked/file.json"

        open_patch = mock.patch(
            "builtins.open", new=mock.mock_open(read_data=file_content)
        )
>       with open_patch as _mock_file:

tests/unit/test_client.py:5202:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.nox/unit-2-7/lib/python2.7/site-packages/mock/mock.py:1353: in __enter__
    self.target = self.getter()
.nox/unit-2-7/lib/python2.7/site-packages/mock/mock.py:1523: in <lambda>
    getter = lambda: _importer(target)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

target = 'builtins'

    def _importer(target):
        components = target.split('.')
        import_path = components.pop(0)
>       thing = __import__(import_path)
E       ImportError: No module named builtins

Remember you can run against Python 2 by using nox, locally.

We could check the Python version with sys.version_info.major and change what object you are patching.

Actually six.PY2 is probably a better way to detect. https://pythonhosted.org/six/

Fixed all the issues; the code is compatible with Python 2 now.
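A version-dependent patch target, as discussed above, might be chosen like this. The sketch uses `sys.version_info.major` so that it needs nothing beyond the standard library on Python 3; `six.PY2` would serve the same purpose where `six` is already a dependency. `OPEN_TARGET` and `read_text` are hypothetical names, and this is not necessarily how the merged tests do it.

```python
import sys

try:
    from unittest import mock  # Python 3
except ImportError:
    import mock  # Python 2: the third-party backport package

# The module that holds open() is named differently in Python 2 and Python 3.
if sys.version_info.major == 2:
    OPEN_TARGET = "__builtin__.open"
else:
    OPEN_TARGET = "builtins.open"


def read_text(path):
    """Hypothetical function under test."""
    with open(path) as file_obj:
        return file_obj.read()


with mock.patch(OPEN_TARGET, mock.mock_open(read_data="[]")) as mock_file:
    content = read_text("/mocked/file.json")
    mock_file.assert_called_once_with("/mocked/file.json")
```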

lbristol88 added 2 commits April 19, 2019 17:10

Added functions to client for loading and saving schemas to a file.

9d3198b

Tests for schema to/from json.

894bb26

lbristol88 requested a review from crwilcox as a code owner April 20, 2019 00:16

googlebot added the cla: yes label Apr 20, 2019

tswast self-requested a review April 22, 2019 16:03

tswast requested changes Apr 22, 2019

Updated functions to close file and made test changes per feedback.

136cfcb

tswast added the kokoro:force-run label Apr 23, 2019

yoshi-kokoro removed the kokoro:force-run label Apr 23, 2019

tswast requested changes Apr 23, 2019

lbristol88 added 3 commits April 23, 2019 22:07

Update with review feedback.

092e121

Removed unneeded variable

79e965b

Removed append in test

22bf1ab

tswast added the kokoro:force-run label Apr 24, 2019

yoshi-kokoro removed the kokoro:force-run label Apr 24, 2019

tswast reviewed Apr 24, 2019

lbristol88 added 2 commits April 24, 2019 10:28

Made changes based on feedback

bb2ca79

Added change to test per feedback.

5451852

lbristol88 requested a review from a team April 24, 2019 17:36

tswast approved these changes Apr 24, 2019

tswast added the kokoro:force-run label Apr 24, 2019

yoshi-kokoro removed the kokoro:force-run label Apr 24, 2019

tswast reviewed Apr 24, 2019

Removed try blocks per suggestion

f4ae0c1

tswast added the kokoro:force-run label Apr 24, 2019

yoshi-kokoro removed the kokoro:force-run label Apr 24, 2019

tswast reviewed Apr 24, 2019

Updated test to pass python 2 version

5791049

tswast approved these changes Apr 24, 2019

tswast added the kokoro:force-run label Apr 24, 2019

yoshi-kokoro removed the kokoro:force-run label Apr 24, 2019

fixed tests to work in python 2

fa331c4

tswast added the kokoro:force-run label Apr 24, 2019

yoshi-kokoro removed the kokoro:force-run label Apr 24, 2019

tswast merged commit 55a8097 into googleapis:master Apr 25, 2019

tswast mentioned this pull request Nov 12, 2019

How to store BigQuery schemas for later use. #3419

Closed
