Allow tracking to databricks URI #248

Closed
nblumoe opened this issue Oct 6, 2021 · 2 comments · Fixed by #250
Labels
bug Something isn't working

Comments

@nblumoe

nblumoe commented Oct 6, 2021

Description

We would like to log runs to a Databricks-managed MLflow tracking server. This should be possible by setting mlflow_tracking_uri to databricks (see here).

Instead, kedro-mlflow treats databricks as a local relative path and logs the metrics to a ./databricks directory.

Context

This would make it possible to integrate kedro-mlflow with the popular Databricks platform, and more specifically with the managed MLflow service it offers.

Possible Implementation

I don't know enough about the implementation to give recommendations. Maybe there needs to be a check so that databricks is not interpreted as a relative path, since it seems to be a reserved word for tracking URIs?
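A minimal sketch of such a check, assuming the plugin resolves relative paths against the project root: reserved values and anything that already has a URI scheme are passed through unchanged, and only bare paths are anchored at the project. The names resolve_tracking_uri and RESERVED_TRACKING_URIS are hypothetical and are not kedro-mlflow's actual code.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical set of tracking URIs that MLflow treats as special values
# rather than filesystem paths (assumption: only "databricks" is listed here).
RESERVED_TRACKING_URIS = {"databricks"}


def resolve_tracking_uri(uri: str, project_path: Path) -> str:
    """Return a tracking URI, resolving only genuinely local paths.

    Hypothetical helper for illustration, not kedro-mlflow's implementation.
    """
    if uri in RESERVED_TRACKING_URIS:
        # Leave the reserved word untouched so MLflow routes it to Databricks.
        return uri
    if urlparse(uri).scheme:
        # Already a URI with a scheme: http://, file://, databricks://<PROFILE>, ...
        return uri
    path = Path(uri)
    if not path.is_absolute():
        # A bare relative path such as "mlruns" is anchored at the project root.
        path = project_path / path
    # Turn the absolute local path into a file:// URI.
    return path.resolve().as_uri()


project = Path("/tmp/my_project")
print(resolve_tracking_uri("databricks", project))
print(resolve_tracking_uri("databricks://my-profile", project))
print(resolve_tracking_uri("mlruns", project))
```

Note that on Windows urlparse reads a drive letter such as C:/ as a scheme, so a real implementation would need an extra check for that case.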

Possible Alternatives

As an alternative, it seems possible to use databricks://<PROFILE> as the tracking URI, but this requires such a profile to exist in the first place.
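For context, a sketch of what "such a profile" involves, assuming the Databricks CLI convention of a ~/.databrickscfg INI file with one [section] per profile holding host and token. It extracts <PROFILE> from the URI and looks it up; parse_databricks_profile and read_profile are hypothetical names, the file contents are made up, and the sketch ignores any extra suffix MLflow may accept after the profile name.

```python
import configparser
from urllib.parse import urlparse


def parse_databricks_profile(tracking_uri: str) -> str:
    """Extract <PROFILE> from a 'databricks://<PROFILE>' tracking URI."""
    parsed = urlparse(tracking_uri)
    if parsed.scheme != "databricks" or not parsed.netloc:
        raise ValueError(f"Not a databricks://<PROFILE> URI: {tracking_uri!r}")
    return parsed.netloc


def read_profile(config_text: str, profile: str) -> dict:
    """Look up host and token for a profile in ~/.databrickscfg-style INI text."""
    # interpolation=None so '%' characters in tokens are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section(profile):
        raise KeyError(f"Profile {profile!r} not found")
    section = parser[profile]
    return {"host": section.get("host"), "token": section.get("token")}


# Made-up file contents; a real profile would live in ~/.databrickscfg.
example_cfg = """
[my-profile]
host = https://example.cloud.databricks.com
token = dapi-example-token
"""

profile = parse_databricks_profile("databricks://my-profile")
print(read_profile(example_cfg, profile))
```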

@Galileo-Galilei
Owner

Galileo-Galilei commented Oct 6, 2021

Hello @nblumoe, thank you very much for reporting this bug. Would you mind checking if this branch fixes it for you?

```shell
pip uninstall kedro-mlflow
pip install git+https://github.com/Galileo-Galilei/kedro-mlflow.git@bug/mtu-databricks
```

Once you confirm it works, I'll deploy the bugfix to PyPI.

P.S.: a databricks://<PROFILE> tracking URI should already work as is, but you are right that we should support exactly the same options as when setting the tracking URI manually.

@nblumoe
Author

nblumoe commented Oct 7, 2021

I can confirm that it works with a config like this:

```yaml
# mlflow.yml
mlflow_tracking_uri: databricks
credentials: mlflow_credentials
```

```yaml
# credentials.yml
mlflow_credentials:
  DATABRICKS_HOST: https://<MY_HOST>.cloud.databricks.com/
  DATABRICKS_TOKEN: <MY_TOKEN>
```
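The keys in the credentials entry above are the environment variable names the Databricks client looks for. A minimal sketch of how a plugin might hand them over, assuming it simply exports each key of the referenced credentials entry into the process environment. export_credentials is a hypothetical name, and a dict stands in for os.environ so the example has no side effects.

```python
import os
from collections.abc import Mapping, MutableMapping


def export_credentials(
    credentials: Mapping[str, str],
    environ: MutableMapping[str, str] = os.environ,
) -> None:
    """Copy each credentials entry (e.g. DATABRICKS_HOST) into the environment.

    Hypothetical helper; assumes the Databricks client reads these variables.
    """
    for key, value in credentials.items():
        # Environment values must be strings.
        environ[key] = str(value)


# Mirrors the mlflow_credentials entry from credentials.yml (placeholders kept).
mlflow_credentials = {
    "DATABRICKS_HOST": "https://<MY_HOST>.cloud.databricks.com/",
    "DATABRICKS_TOKEN": "<MY_TOKEN>",
}

fake_env = {}  # stand-in for os.environ in this example
export_credentials(mlflow_credentials, fake_env)
print(sorted(fake_env))
```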

It also works with DATABRICKS_HOST, DATABRICKS_USERNAME and DATABRICKS_PASSWORD as credentials, as described here: https://docs.databricks.com/applications/mlflow/access-hosted-tracking-server.html
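The two working credential sets reported in this thread can be summarised as a small check, for example to fail early with a clear message before a run starts. This is a hypothetical helper that only encodes the combinations confirmed here (host plus token, or host plus username and password); it is not an exhaustive list of what Databricks accepts.

```python
from collections.abc import Mapping


def has_databricks_auth(env: Mapping[str, str]) -> bool:
    """Return True if env holds one of the credential sets confirmed in this thread.

    Hypothetical helper covering only:
      - DATABRICKS_HOST + DATABRICKS_TOKEN
      - DATABRICKS_HOST + DATABRICKS_USERNAME + DATABRICKS_PASSWORD
    """
    if not env.get("DATABRICKS_HOST"):
        return False
    if env.get("DATABRICKS_TOKEN"):
        return True
    return bool(env.get("DATABRICKS_USERNAME") and env.get("DATABRICKS_PASSWORD"))


print(has_databricks_auth({"DATABRICKS_HOST": "https://<MY_HOST>.cloud.databricks.com/",
                           "DATABRICKS_TOKEN": "<MY_TOKEN>"}))
```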

And yes, the databricks://<PROFILE> approach already works without the fix! 👍

Thanks for the quick fix and the great work on this project!

Projects
Status: ✅ Done
2 participants