Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make workspace client also return runtime dbutils when in dbr #210

Merged
merged 7 commits into from
Jul 13, 2023
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 17 additions & 1 deletion .codegen/__init__.py.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,22 @@ from databricks.sdk.service.{{.Package.Name}} import {{.PascalName}}API{{end}}
{{- getOrDefault $mixins $genApi $genApi -}}
{{- end -}}

def _make_dbutils(config: client.Config):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure it's the intended behavior, as we want to migrate users away from dbutils to SDK. this PR makes people assume that dbutils will stay there forever

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. But there is a currently a huge functionality gap between the sdk implementation and dbutils on runtime.

  1. Having the 2 entrypoints (from databricks.sdk.runtime import * and workspace_client.dbutils) provide different implementations can be confusing for the user.
  2. Also, it doesn't make sense to use the command-exec api (for proxyutils) when we are already in the runtime. And there is functionality (such as widgets) which will necessarily work differently in runtime and sdk.

If we are ok with putting the responsibility on the user to understand what dbutils they are using (1.), we can address 2 by selectively proxying to the dbr dbutils when in dbr (instead of using command exec or different implementations for widgets).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiaochen-db @falaki - what do you guys think?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like we are reversing from the former direction where REPL should eventually adopt the SDK dbutils despite current function gaps to maintaining two versions of dbutils: the SDK one and the DBR one, and SDK one will fall back on DBR if it's used in a REPL.

I am okay with that given that the function gaps seem unresolvable in the near term. My another question is that why do you only update the dbutils from WorkspaceClient? Is there other way for a user to create dbutils.RemoteDbUtils(config)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We expose dbutils through 2 paths

  1. from databricks.sdk.runtime import *
  2. workspaceClient.dbutils

The first one already falls back to the dbr version of dbutils. WorkspaceClient does not.

Good point about dbutils.RemoteDbUtils(config). It is possible. It currently looks like this

from databricks.sdk import dbutils
dbutils.RemoteDbUtils(config).fs.ls(".")

We should rename this to _dbutils to make it clear that this is an internal implementation, and users are not expected to initialise it. Wdyt @nfx ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also please type hint the return type of this correctly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@judahrand this will already work with the changes you made. This will return a union of types, which will include the full typing information you added to sdk.runtime

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant type hinting in the code 🤷‍♂️ No reason not to properly type hint all Python code these days.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is still missing a return type annotation 😢

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@judahrand it is difficult to type this function, because we cannot know what type of dbutils to use (the dbr or the OSS one) until we actually try importing dbutils from dbr. So this function can't be typed until this function runs.

We do not want to mess with dbr typing.

# We try to directly check if we are in runtime, instead of
# trying to import from databricks.sdk.runtime. This is to prevent
# remote dbutils from being created without the config, which is both
# expensive (will need to check all credential providers) and can
# throw errors (when no env vars are set).
try:
from dbruntime import UserNamespaceInitializer
except ImportError:
return dbutils.RemoteDbUtils(config)

# We are in runtime, so we can use the runtime dbutils
from databricks.sdk.runtime import dbutils as runtime_dbutils
return runtime_dbutils


class WorkspaceClient:
def __init__(self, *{{range $args}}, {{.}}: str = None{{end}},
debug_truncate_bytes: int = None,
Expand All @@ -33,7 +49,7 @@ class WorkspaceClient:
product=product,
product_version=product_version)
self.config = config.copy()
self.dbutils = dbutils.RemoteDbUtils(self.config)
self.dbutils = _make_dbutils(self.config)
self.api_client = client.ApiClient(self.config)
self.files = FilesMixin(self.api_client)
{{- range .Services}}{{if not .IsAccounts}}
Expand Down
3 changes: 2 additions & 1 deletion .vscode/settings.json
Original file line number Diff line number Diff line change
Expand Up @@ -3,5 +3,6 @@
"tests"
],
"python.testing.unittestEnabled": false,
"python.testing.pytestEnabled": true
"python.testing.pytestEnabled": true,
"editor.formatOnSave": true
}
18 changes: 17 additions & 1 deletion databricks/sdk/__init__.py

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.