
Load data into databricks without external staging and auth. #2166

Closed
rudolfix opened this issue Dec 19, 2024 · 0 comments · Fixed by #2219
Labels
support This issue is monitored by Solution Engineer

Comments

@rudolfix (Collaborator)

It seems that the Python SDK for Databricks allows uploading files.

  1. Research whether it is possible to load files into tables the way we do for BigQuery: a local file is copied into a table without any stage.
  2. If that does not work, research how to use Volumes on Databricks to copy files there and then use COPY INTO to move them into the table (see the sketch after this list).
  3. If authentication is not configured, enable default credentials (i.e. when they are present on serverless compute). Take a look at how CredentialsWithDefault is used: most implementations check whether default credentials are present in def on_partial(self) -> None:, but in this case you should do it in on_resolved() when all fields holding credentials are empty.
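
A minimal sketch of what items 1–3 could look like, assuming the databricks-sdk Files API and Statement Execution API; the catalog, schema, volume, table, and warehouse id are illustrative placeholders, not values from this issue:

```python
# Sketch only: upload a local file into a Unity Catalog Volume, then COPY INTO a table.
from databricks.sdk import WorkspaceClient

# WorkspaceClient() with no arguments falls back to the default auth chain
# (notebook context, env vars, .databrickscfg) - the "default credentials"
# behavior item 3 asks for when no explicit credentials are configured.
w = WorkspaceClient()

# Placeholder volume path and file name.
volume_path = "/Volumes/my_catalog/my_schema/my_volume/data.parquet"

# Item 2: copy the local file into the Volume ...
with open("data.parquet", "rb") as f:
    w.files.upload(volume_path, f, overwrite=True)

# ... then COPY INTO the target table from the volume path.
result = w.statement_execution.execute_statement(
    statement=f"""
        COPY INTO my_catalog.my_schema.my_table
        FROM '{volume_path}'
        FILEFORMAT = PARQUET
    """,
    warehouse_id="my-warehouse-id",  # placeholder
)
print(result.status)
```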

The ideal scenario, when running in a Notebook, is that we can load a source (i.e. rest_api) without any additional configuration, staging, or authorization, just like we can with duckdb.
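
A sketch of that "zero config" notebook flow, assuming the dlt rest_api source and a databricks destination that resolves credentials and staging automatically; the PokeAPI configuration is illustrative only:

```python
import dlt
from dlt.sources.rest_api import rest_api_source

# Declarative REST API source - no credentials needed for this public API.
source = rest_api_source({
    "client": {"base_url": "https://pokeapi.co/api/v2/"},
    "resources": ["pokemon", "berry"],
})

# No staging bucket, no explicit credentials: the destination should pick up
# the notebook's default Databricks authentication, mirroring the duckdb experience.
pipeline = dlt.pipeline(
    pipeline_name="rest_api_pokemon",
    destination="databricks",
    dataset_name="rest_api_data",
)
print(pipeline.run(source))
```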

@rudolfix moved this from Todo to In Progress in dlt core library Dec 19, 2024
@rudolfix added the support This issue is monitored by Solution Engineer label Dec 19, 2024
rudolfix added a commit that referenced this issue Feb 1, 2025
* databricks: enable local files

* fix: databricks test config

* work in progress

* added create and drop volume to interface

* refactor direct load authentication

* fix databricks volume file name

* refactor databricks direct loading

* format and lint

* revert config.toml changes

* force notebook auth

* enhanced config validations

* force exception

* fix config resolve

* remove imports

* test: config exceptions

* restore comments

* restored destination_config

* fix pokema api values

* enables databricks no stage tests

* fix databricks config on_resolved

* adjusted direct load file management

* direct load docs

* filters by bucket when subset of destinations is set when creating test cases

* simpler file upload

* fix comment

* passes authentication directly from workspace, adds proper fingerprinting

* use real client_id in tests

* fixes config resolver to not pass NotResolved hints to config providers

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
github-project-automation bot moved this from In Progress to Done in dlt core library Feb 1, 2025