
[ISSUE] Pytests fail as a result of missing packages which are databricks-sdk dependencies #360

Closed
fshaikh-caa opened this issue Sep 21, 2023 · 2 comments

Comments

fshaikh-caa commented Sep 21, 2023

Description
When my GitHub Action runs my tests, it installs databricks-sdk successfully, but my unit tests then fail. The exceptions are included below.

  • All tests pass locally, and I have no issues using the repo locally either.

Reproduction

  • Run on a GitHub-hosted runner with OS ubuntu-22.04 and Python 3.9
  • The package installs successfully
  • Run pytest on code that imports the package

Expected behavior

  • Pass all tests
  • Do not raise a missing-module error

Debug Logs
The SDK logs helpful debugging information when debug logging is enabled. Set the log level to debug by adding logging.basicConfig(level=logging.DEBUG) to your program, and include the logs here.
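In code, the template's suggestion is the `basicConfig` call below. The second call is an addition, not part of the template: it assumes the SDK logs under the `databricks.sdk` logger name and sets that logger to DEBUG directly, so SDK records still propagate to existing handlers even when `basicConfig` does nothing because a test runner (such as pytest) has already configured the root logger.

```python
import logging

# Template suggestion: configure the root logger to emit DEBUG and above.
# Note: this is a no-op if the root logger already has handlers.
logging.basicConfig(level=logging.DEBUG)

# Assumption: SDK messages are logged under the "databricks.sdk" namespace.
# Setting the level on that logger lets its DEBUG records through even if
# the root logger's own level was left higher by someone else.
logging.getLogger("databricks.sdk").setLevel(logging.DEBUG)
```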

/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/databricks/sdk/runtime/__init__.py:79: in <module>
    from dbruntime import UserNamespaceInitializer
E   ModuleNotFoundError: No module named 'dbruntime'

During handling of the above exception, another exception occurred:
/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/databricks/sdk/runtime/__init__.py:101: in <module>
    from .stub import *
/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/databricks/sdk/runtime/stub.py:1: in <module>
    from pyspark.sql.context import SQLContext
E   ModuleNotFoundError: No module named 'pyspark'

/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/databricks/sdk/core.py:518: in __init__
    self._init_auth()
/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/databricks/sdk/core.py:838: in _init_auth
    raise ValueError(f'{self._credentials_provider.auth_type()} auth: {e}') from e
E   ValueError: default auth: cannot configure default credentials

Other Information

  • OS: ubuntu-22.04
  • Version: 0.8.0, 0.9.0
  • Python 3.9

Additional context

  • All my tests pass and I can run everything locally without pyspark or dbruntime installed, so I'm not sure what is going on here.
fshaikh-caa (Author) commented

I think I figured out what was happening.

I believe the package requires the two environment variables listed here to be set: https://docs.databricks.com/en/dev-tools/auth.html#perform-databricks-personal-access-token-authentication

I already had the Databricks CLI set up, so this didn't matter when I was testing locally, but my CI environment did not have those variables set.
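For a CI job that is actually meant to talk to Databricks, a small pre-flight check can name the missing variables instead of surfacing the generic `default auth: cannot configure default credentials` error. This is a minimal sketch, assuming personal-access-token auth via `DATABRICKS_HOST` and `DATABRICKS_TOKEN` (the variables described on the linked docs page); `missing_pat_env_vars` is a hypothetical helper, not part of the SDK.

```python
import os
from typing import List, Mapping, Optional

# Assumed variable names for PAT auth, per the Databricks docs linked above.
PAT_ENV_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def missing_pat_env_vars(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the PAT auth variables that are unset or empty.

    Hypothetical helper: defaults to the real process environment, but
    accepts any mapping so it can be tested without touching os.environ.
    """
    env = os.environ if environ is None else environ
    return [name for name in PAT_ENV_VARS if not env.get(name)]
```

A CI step could call it before running pytest and exit with a message listing the returned names. Jobs that never need to reach Databricks should not need these variables at all.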

My workaround was to wrap the import statement in a try/except. My Python package will also run in non-Databricks environments, so adding those variables as dependencies didn't make sense to me.
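The author's actual code isn't shown. The following is a minimal sketch of that kind of guarded import, assuming the code wants `dbutils` from `databricks.sdk.runtime` and can work without it. `load_dbutils` and its `import_module` parameter are hypothetical; the parameter exists only so the fallback can be exercised without Databricks.

```python
import importlib
from typing import Any, Callable, Optional


def load_dbutils(
    import_module: Callable[[str], Any] = importlib.import_module,
) -> Optional[Any]:
    """Return databricks.sdk.runtime.dbutils, or None if it can't be loaded.

    ImportError (which includes ModuleNotFoundError) covers databricks-sdk
    itself not being installed; ValueError covers the
    "default auth: cannot configure default credentials" error seen at the
    end of the traceback in this issue.
    """
    try:
        runtime = import_module("databricks.sdk.runtime")
    except (ImportError, ValueError):
        return None
    return getattr(runtime, "dbutils", None)
```

Callers would then check `dbutils = load_dbutils()` for `None` and skip Databricks-only behaviour outside a workspace.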

mgyucht (Contributor) commented Sep 22, 2023

Thanks for raising this @fshaikh-caa.

We definitely intend for people to be able to use the SDK without installing the Databricks CLI or setting unnecessary environment variables. However, the WorkspaceClient constructor attempts to authenticate to Databricks. If you don't pass any parameters, it follows an auth flow called "unified authentication": it loads configuration from the environment and then tries to autodetect which auth method to use. If you are writing unit tests, you'll need to supply your own custom credentials provider that does nothing (a no-op). As an example, see our pytest setup.
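The maintainer's recommendation is a no-op credentials provider, as in the SDK's own pytest setup. A different pattern, not from this thread and shown only as an illustration, avoids constructing `WorkspaceClient` in unit tests at all: application code takes the client as a parameter, and tests pass a fake. `cluster_names`, `FakeClusters`, and `FakeWorkspaceClient` are hypothetical names; the sketch assumes only that `WorkspaceClient().clusters.list()` yields objects with a `cluster_name` attribute.

```python
from types import SimpleNamespace
from typing import Any, Iterator, List


def cluster_names(client: Any) -> List[str]:
    # `client` is assumed to look like databricks.sdk.WorkspaceClient:
    # client.clusters.list() yields objects with a `cluster_name` attribute.
    return [cluster.cluster_name for cluster in client.clusters.list()]


class FakeClusters:
    """Hypothetical stand-in for the `clusters` service used above."""

    def __init__(self, names: List[str]) -> None:
        self._names = names

    def list(self) -> Iterator[SimpleNamespace]:
        # Yield minimal objects shaped like the SDK's cluster details.
        for name in self._names:
            yield SimpleNamespace(cluster_name=name)


class FakeWorkspaceClient:
    """Hypothetical test double: constructing it never attempts auth."""

    def __init__(self, names: List[str]) -> None:
        self.clusters = FakeClusters(names)


def test_cluster_names() -> None:
    assert cluster_names(FakeWorkspaceClient(["etl", "ml"])) == ["etl", "ml"]
```

In production the same function would be called as `cluster_names(WorkspaceClient())`, where unified authentication applies as usual.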

When you're running your application, you will need to set environment variables, configure your .databrickscfg file, or pass the credentials directly. In that case, this error won't be thrown (or if it still is, you can open another issue).

mgyucht closed this as not planned on Sep 22, 2023