Consistently use the upload(path, IO) and download(path) -> IO across file-related operations #148
Conversation
TODO:
- more type safety
- remove `direct_download` from ExportRequest

Fix #104
Codecov Report

Additional details and impacted files:

@@            Coverage Diff             @@
##             main     #148      +/-   ##
==========================================
+ Coverage   53.18%   53.23%   +0.04%
==========================================
  Files          29       30       +1
  Lines       17900    17956      +56
==========================================
+ Hits         9521     9558      +37
- Misses       8379     8398      +19

☔ View full report in Codecov by Sentry.
Commits:
- …binaryIO` across DBFS and Workspace APIs
- w.workspace.direct_download method
- upload(path, IO) and download(path) -> IO across file-related operations
databricks/sdk/mixins/workspace.py (Outdated):

    path: str,
    content: typing.BinaryIO,
    *,
    format: ExportFormat = ExportFormat.AUTO,
This technically has different semantics than the Workspace API, which defaults to format: AUTO when the format is not specified.
Not opposed to this, but just want to make sure that this is a conscious choice.
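For concreteness, a minimal sketch of the two options under discussion. The first follows the diff above; the second is a hypothetical alternative, and the import path may vary by SDK version:

from typing import BinaryIO, Optional

from databricks.sdk.service.workspace import ExportFormat

# As in the diff above: the default is baked into the SDK signature, so
# format=AUTO is sent even when the caller never chose a format.
def upload(self, path: str, content: BinaryIO, *,
           format: ExportFormat = ExportFormat.AUTO):
    ...

# Hypothetical alternative that keeps the server-side semantics: omit the
# field entirely unless the caller explicitly sets it.
def upload_alt(self, path: str, content: BinaryIO, *,
               format: Optional[ExportFormat] = None):
    data = {'path': path}
    if format is not None:
        data['format'] = format.value
    ...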
@jerryjam-db I still have to add a docstring for this, but is it okay to mention: "if you specify a file extension, the format will be determined automatically; otherwise, use format=AUTO and language=PYTHON to create a notebook"?
@jerryjam-db I've added path extension checking in the SDK, so that by default we can have this simple code to import a notebook without having to pass in the language parameter:
py = f'/Users/{w.current_user.me().user_name}/notebook-{random(12)}.py'
w.workspace.upload(py, io.BytesIO(b'print(1)'))
with w.workspace.download(py) as f:
content = f.read()
assert content == b'# Databricks notebook source\nprint(1)'
Files are done via:
py_file = f'/Users/{w.current_user.me().user_name}/file-{random(12)}.py'
w.workspace.upload(py_file, io.BytesIO(b'print(1)'), format=ExportFormat.AUTO)
with w.workspace.download(py_file) as f:
content = f.read()
assert content == b'print(1)'
w.workspace.delete(py_file)
Do we include some information in the User Agent to indicate that these requests are coming from the Databricks SDK? This will help for tracking on our side.
Yes, let me share the spec with you.
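For illustration only, such a header typically carries the product name, SDK version, and Python version; the exact format here is a guess pending that spec:

import platform

import requests

# Hypothetical sketch: tag every outgoing request so the server can
# attribute traffic to the Python SDK.
session = requests.Session()
session.headers['User-Agent'] = (
    f'databricks-sdk-py/0.1.9 python/{platform.python_version()}')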
databricks/sdk/core.py (Outdated):

    sb.append(f'< {response.status_code} {response.reason}')
    if response.content:
        if raw and 'Content-Type' in response.headers:
Why check for the Content-Type header here?
Raw streams with Transfer-Encoding: chunked do not have a Content-Type header.
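A short sketch of the underlying requests behavior (URL is a placeholder, names assumed): with stream=True the undecoded body is exposed as a file-like object, and a chunked response may legitimately omit the header:

import requests

# stream=True defers downloading the body; response.raw is file-like.
response = requests.get('https://example.com/stream', stream=True)

# A Transfer-Encoding: chunked response may carry no Content-Type at all,
# so debug logging of the body must check the header before using it.
content_type = response.headers.get('Content-Type')  # may be None
body = response.raw  # read() incrementally instead of buffering .content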
databricks/sdk/mixins/workspace.py (Outdated):

    data = {'path': path, 'format': format.value}
    if language:
        data['language'] = language.value
    return self._api.do('POST', '/api/2.0/workspace/import', files={'content': content}, data=data)
The data kwarg in requests appears to be for form data, not a JSON body. What is the resulting request content type here?
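For reference, this is standard requests behavior (placeholder URL and values):

import io

import requests

url = 'https://example.com/api/2.0/workspace/import'  # placeholder

# data= alone produces an application/x-www-form-urlencoded body.
requests.post(url, data={'path': '/tmp/x', 'format': 'AUTO'})

# data= combined with files= produces multipart/form-data: the data dict
# entries become ordinary form parts next to the uploaded file part.
requests.post(url, data={'path': '/tmp/x'}, files={'content': io.BytesIO(b'print(1)')})

# json= is what would send an application/json body instead.
requests.post(url, json={'path': '/tmp/x'})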
Separate question: does this same API work for uploading regular files? I notice we use the /workspace-files//import-file API elsewhere to import regular files.
As per the request of @kahing and @jerryjam-db, to do less base64 encoding on both the client and control plane side.
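A rough sketch of the contrast: the DBFS endpoint and its base64 contents field are the real API, but the kwarg names on the client call are illustrative, and w and path are assumed from context:

import base64
import io

data = b'print(1)'

# DBFS-style upload: the payload is base64-encoded on the client and decoded
# again on the control plane, inflating it by roughly a third in transit.
w.api_client.do('POST', '/api/2.0/dbfs/put',
                body={'path': path,
                      'contents': base64.b64encode(data).decode(),
                      'overwrite': True})

# New files endpoint: bytes are streamed as-is, no re-encoding on either side.
w.api_client.do('PUT', f'/api/2.0/fs/files{path}', data=io.BytesIO(data))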
databricks/sdk/mixins/files.py:

        self._api.do('PUT', f'/api/2.0/fs/files{path}', data=src)  # files for the workspace upload

    def download(self, path: str) -> BinaryIO:
        return self._api.do('GET', f'/api/2.0/fs/files{path}', raw=True)
Is path always absolute?
Yes.
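A usage sketch of the new surface, assuming a WorkspaceClient w and a placeholder absolute path:

import io

path = '/tmp/sdk-demo/hello.txt'  # placeholder; must be absolute

w.files.upload(path, io.BytesIO(b'hello'))
with w.files.download(path) as f:
    assert f.read() == b'hello'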
databricks/sdk/mixins/files.py (Outdated):

        self._api = api_client

    def upload(self, path: str, src: BinaryIO):
        self._api.do('PUT', f'/api/2.0/fs/files{path}', data=src)  # files for the workspace upload
These endpoints don't appear in the OpenAPI spec yet; is that intended?
@bogdanghita-db hasn't created one yet.
@@ -34,6 +35,7 @@ class WorkspaceClient:
        self.config = config
        self.dbutils = dbutils.RemoteDbUtils(self.config)
        self.api_client = client.ApiClient(self.config)
        self.files = FilesMixin(self.api_client)
Does this need to be added here explicitly because there is no Files tag in the OpenAPI spec?
Yes, temporarily.
A few questions. Nice to unify these things together.
# Version changelog

## 0.1.9

* Added new services from OpenAPI spec ([#145](#145), [#159](#159)).
* Added consistent usage of the `upload(path, IO)` and `download(path) -> IO` across file-related operations ([#148](#148)).
* Added Databricks Metadata Service credential provider ([#139](#139), [#130](#130)).
* Added exposing runtime credential provider without changing user namespace ([#140](#140)).
* Added a check for `is not None` for primitive fields in `as_dict()` ([#147](#147)).
* Fixed bug related to boolean flags and convert `True` to `true` in query strings ([#156](#156)).
* Fixed generation of external entities ([#146](#146)).
* Make u2m authentication work with new CLI ([#150](#150)).
Changes

- w.workspace.upload & w.workspace.download
- w.dbfs.upload & w.dbfs.download
- w.files.upload & w.files.download

Fix #104

Tests

- new integration tests
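A hypothetical shape for one such round-trip test; the w and random fixtures follow the examples earlier in this thread:

import io


def test_files_round_trip(w, random):
    # Upload raw bytes and read them back through the new Files API.
    target = f'/tmp/sdk-test/file-{random(12)}.txt'
    w.files.upload(target, io.BytesIO(b'some text data'))
    with w.files.download(target) as f:
        assert f.read() == b'some text data'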