
Consistently use the upload(path, IO) and download(path) -> IO across file-related operations #148

Merged: 3 commits into main from fix/104, Jun 8, 2023

Conversation

@nfx (Contributor) commented Jun 2, 2023

Changes

  • added w.workspace.upload & w.workspace.download
  • added w.dbfs.upload & w.dbfs.download
  • added w.files.upload & w.files.download
  • modified low-level client to work with raw streams and debug messages correctly
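The shared shape of these upload/download pairs can be sketched as a small protocol. This is a toy illustration, not the SDK's actual classes: `FileIO` and `InMemoryFiles` are hypothetical names, and an in-memory dict stands in for the real workspace/DBFS/files backends.

```python
import io
import typing


class FileIO(typing.Protocol):
    """Hypothetical protocol capturing the shared upload/download shape."""

    def upload(self, path: str, src: typing.BinaryIO) -> None: ...

    def download(self, path: str) -> typing.BinaryIO: ...


class InMemoryFiles:
    """Toy in-memory stand-in for w.workspace / w.dbfs / w.files."""

    def __init__(self):
        self._store = {}

    def upload(self, path: str, src: typing.BinaryIO) -> None:
        # read the whole stream and remember it under the given path
        self._store[path] = src.read()

    def download(self, path: str) -> typing.BinaryIO:
        # hand back a fresh binary stream over the stored bytes
        return io.BytesIO(self._store[path])


files = InMemoryFiles()
files.upload('/tmp/a.txt', io.BytesIO(b'hello'))
assert files.download('/tmp/a.txt').read() == b'hello'
```

The point of the unification is exactly this symmetry: every client accepts a binary stream on upload and returns one on download.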

Fix #104

Tests

New integration tests.

TODO:
- more type safety
- remove `direct_download` from ExportRequest

@codecov-commenter commented Jun 2, 2023

Codecov Report

Patch coverage: 56.00%; project coverage change: +0.04% 🎉

Comparison is base (136d7e1) 53.18% compared to head (2618e3c) 53.23%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #148      +/-   ##
==========================================
+ Coverage   53.18%   53.23%   +0.04%     
==========================================
  Files          29       30       +1     
  Lines       17900    17956      +56     
==========================================
+ Hits         9521     9558      +37     
- Misses       8379     8398      +19     
Impacted Files                        Coverage Δ
databricks/sdk/service/sql.py         55.41% <ø> (ø)
databricks/sdk/mixins/workspace.py    44.44% <44.44%> (ø)
databricks/sdk/core.py                67.33% <45.45%> (+0.61%) ⬆️
databricks/sdk/mixins/files.py        75.09% <62.50%> (ø)
databricks/sdk/__init__.py            73.14% <100.00%> (+0.50%) ⬆️
databricks/sdk/dbutils.py             79.67% <100.00%> (ø)

... and 1 file with indirect coverage changes


@nfx nfx changed the title Added w.workspace.direct_download method Consistently use the upload(path, IO) and download(path) -> IO across file-related operations Jun 7, 2023
    path: str,
    content: typing.BinaryIO,
    *,
    format: ExportFormat = ExportFormat.AUTO,


This technically has different semantics than the Workspace API, which defaults to format: AUTO when the format is not specified.

Not opposed to this, but just want to make sure that this is a conscious choice.

@nfx (Contributor, Author) replied:

@jerryjam-db I still have to add a docstring for this, but is it okay to mention: "if you specify a file extension, the format is determined automatically; otherwise use format=AUTO and language=PYTHON to create a notebook"?

@nfx (Contributor, Author) replied:

@jerryjam-db I've added path-extension checking in the SDK, so that by default this simple code can import a notebook without passing the language parameter:

    py = f'/Users/{w.current_user.me().user_name}/notebook-{random(12)}.py'

    w.workspace.upload(py, io.BytesIO(b'print(1)'))
    with w.workspace.download(py) as f:
        content = f.read()
        assert content == b'# Databricks notebook source\nprint(1)'

Plain files are handled via:

    py_file = f'/Users/{w.current_user.me().user_name}/file-{random(12)}.py'

    w.workspace.upload(py_file, io.BytesIO(b'print(1)'), format=ExportFormat.AUTO)
    with w.workspace.download(py_file) as f:
        content = f.read()
        assert content == b'print(1)'

    w.workspace.delete(py_file)
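The kind of path-extension inference described above can be sketched as follows. The helper name and the mapping are assumptions for illustration; the SDK's real extension table may cover more languages and formats.

```python
import pathlib
import typing

# Hypothetical mapping from file suffix to notebook language.
_EXT_TO_LANGUAGE = {'.py': 'PYTHON', '.scala': 'SCALA', '.sql': 'SQL', '.r': 'R'}


def infer_language(path: str) -> typing.Optional[str]:
    """Return a notebook language inferred from the path's extension, or None."""
    suffix = pathlib.PurePosixPath(path).suffix.lower()
    return _EXT_TO_LANGUAGE.get(suffix)


assert infer_language('/Users/me/notebook.py') == 'PYTHON'
assert infer_language('/Users/me/data.csv') is None
```

With inference like this, `upload` can pick the language from the path and only fall back to explicit `format=`/`language=` arguments when the extension is unknown.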


Do we include some information in the User Agent to indicate that these requests are coming from the Databricks SDK? This will help for tracking on our side.

@nfx (Contributor, Author) replied:

Yes, let me share the spec with you.
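For illustration only (the actual spec mentioned above is not reproduced here), a segmented product/version User-Agent string might be assembled like this; the function name and segment layout are assumptions, not the SDK's real format:

```python
def user_agent(product: str, product_version: str,
               sdk_version: str, py_version: str) -> str:
    """Hypothetical sketch: compose product/version segments so the
    server can attribute requests to the SDK and the calling product."""
    return f'{product}/{product_version} databricks-sdk-py/{sdk_version} python/{py_version}'


ua = user_agent('my-app', '1.0', '0.1.9', '3.10')
assert ua == 'my-app/1.0 databricks-sdk-py/0.1.9 python/3.10'
```

The value of a segmented format is that each component (application, SDK, runtime) can be parsed independently for server-side tracking.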

    sb.append(f'< {response.status_code} {response.reason}')
    if response.content:
        if raw and 'Content-Type' in response.headers:

Why check for the Content-Type header here?

@nfx (Contributor, Author) replied:

Raw streams with Transfer-Encoding: chunked do not have a Content-Type header.
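A sketch of the guard this implies (a hypothetical helper, not the SDK's actual logging code): buffering a raw stream just to log it would consume it, so the debug logger only considers the body when it knows what kind of content it is dealing with.

```python
def should_log_body(raw: bool, headers: dict) -> bool:
    """Decide whether a response body is safe to mention in a debug log.

    Raw streamed responses (Transfer-Encoding: chunked) often carry no
    Content-Type header; reading them just to log a message would
    consume the stream the caller still needs.
    """
    if raw:
        return 'Content-Type' in headers
    return True


assert should_log_body(raw=True, headers={'Transfer-Encoding': 'chunked'}) is False
assert should_log_body(raw=False, headers={}) is True
```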

Resolved review threads on databricks/sdk/mixins/files.py (outdated) and databricks/sdk/core.py.
    data = {'path': path, 'format': format.value}
    if language:
        data['language'] = language.value
    return self._api.do('POST', '/api/2.0/workspace/import', files={'content': content}, data=data)

The `data` kwarg in requests appears to be for form data, not a JSON body.

What is the resulting request content type here?
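For context: when `requests` is given both `data=` and `files=`, it encodes a multipart/form-data body in which the `data` dict becomes plain form fields. A minimal stdlib sketch of roughly what that encoding produces (simplified; `requests` adds per-part headers this sketch omits):

```python
import uuid


def multipart_form(fields: dict, files: dict):
    """Build a simplified multipart/form-data body and its content type,
    mimicking what `requests` does when data= and files= are combined."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        # plain form fields from the data= dict
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode())
    for name, payload in files.items():
        # binary file parts from the files= dict
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"; filename="{name}"\r\n\r\n'.encode() + payload + b'\r\n')
    body = b''.join(parts) + f'--{boundary}--\r\n'.encode()
    return body, f'multipart/form-data; boundary={boundary}'


body, ctype = multipart_form({'path': '/a', 'format': 'AUTO'}, {'content': b'print(1)'})
assert ctype.startswith('multipart/form-data')
assert b'name="path"' in body and b'print(1)' in body
```

So the resulting request content type here would be multipart/form-data rather than application/json.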


Separate question: does this same API work for uploading regular files? I notice we use the /workspace-files//import-file API elsewhere to import regular files.

@nfx (Contributor, Author) replied, Jun 8, 2023:

This is roughly the equivalent of [screenshot omitted],

as per the request of @kahing and @jerryjam-db to do less base64 encoding on both the client and control-plane side.

    self._api.do('PUT', f'/api/2.0/fs/files{path}', data=src)  # files for the workspace upload

    def download(self, path: str) -> BinaryIO:
        return self._api.do('GET', f'/api/2.0/fs/files{path}', raw=True)

Is path always absolute?

@nfx (Contributor, Author) replied:

Yes.
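Since the path is interpolated directly after `/api/2.0/fs/files` in the URL, a leading slash is effectively required. A hypothetical guard (not present in the SDK itself) would look like:

```python
def validate_path(path: str) -> str:
    """Hypothetical check: the Files API path is appended verbatim to
    '/api/2.0/fs/files', so it must be absolute."""
    if not path.startswith('/'):
        raise ValueError(f'path must be absolute: {path!r}')
    return path


assert validate_path('/tmp/x.bin') == '/tmp/x.bin'
try:
    validate_path('relative.bin')
    raise AssertionError('expected ValueError')
except ValueError:
    pass
```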

    self._api = api_client

    def upload(self, path: str, src: BinaryIO):
        self._api.do('PUT', f'/api/2.0/fs/files{path}', data=src)  # files for the workspace upload

These endpoints don't appear yet in the OpenAPI spec; is that intended?

@nfx (Contributor, Author) replied:

@bogdanghita-db hasn't created one yet.

    @@ -34,6 +35,7 @@ class WorkspaceClient:
             self.config = config
             self.dbutils = dbutils.RemoteDbUtils(self.config)
             self.api_client = client.ApiClient(self.config)
    +        self.files = FilesMixin(self.api_client)

Does this need to be added here explicitly because there is no Files tag in the OpenAPI spec?

@nfx (Contributor, Author) replied:

Yes, temporarily.

@mgyucht (Contributor) left a review:

A few questions. Nice to unify these things together.

@nfx nfx merged commit 087cf3f into main Jun 8, 2023
@nfx nfx deleted the fix/104 branch June 8, 2023 12:27
@nfx nfx mentioned this pull request Jun 9, 2023
nfx added a commit that referenced this pull request Jun 9, 2023
# Version changelog

## 0.1.9

* Added new services from OpenAPI spec
([#145](#145),
[#159](#159)).
* Added consistent usage of the `upload(path, IO)` and `download(path)
-> IO` across file-related operations
([#148](#148)).
* Added Databricks Metadata Service credential provider
([#139](#139),
[#130](#130)).
* Added exposing runtime credential provider without changing user
namespace
([#140](#140)).
* Added a check for `is not None` for primitive fields in `as_dict()`
([#147](#147)).
* Fixed bug related to boolean flags and convert `True` to `true` in
query strings
([#156](#156)).
* Fixed generation of external entities
([#146](#146)).
* Make u2m authentication work with new CLI
([#150](#150)).
Successfully merging this pull request may close these issues.

Notebook export with "direct_download=True" fails
5 participants