[AIR, callbacks] _MLflowLoggerUtil incompatible with DB MLflow backend #29749
Labels
bug
Something that is supposed to be working; but isn't
observability
Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling
P1
Issue that should be fixed within a few weeks
Comments
Was able to reproduce with a
Debugging now.
Figured it out: it's a bug with MLflow. @tbukic, as a temporary workaround, installing the MLflow nightly should fix your issue. @amogkam, should we add a fix to our code like below?
Thanks for looking into this, @bveeramani. Yes, that change sgtm.
Great job, @bveeramani, thank you!
amogkam pushed a commit that referenced this issue Nov 1, 2022
WeichenXu123 pushed a commit to WeichenXu123/ray that referenced this issue Dec 19, 2022
See ray-project#29749. Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
What happened + What you expected to happen
Line 199 of the MLflow integration internals seems to cause problems when MLflow is used in scenario 5.
The call to the MLflow API fails with an error with the text:
Skipping the assignment of the mlflow.runName tag seems to allow creation of the experiment. Without delving into the MLflow code, my best guess (based on the MLflow documentation) is that MLflow automatically creates the mlflow.runName column in the database, and the call then fails on a unique key constraint.
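To illustrate the kind of failure guessed at above, here is a minimal sketch using an in-memory SQLite database. The `tags` table, its columns, and the `insert_tags` helper are hypothetical and are not MLflow's actual schema or code; the sketch only shows how writing the same tag key twice for one run trips a primary key constraint in a database backend.

```python
import sqlite3


def insert_tags(tags):
    """Insert (key, value) tag pairs for one run into a hypothetical tags table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per (run, tag key), enforced by the primary key.
    conn.execute(
        "CREATE TABLE tags (run_uuid TEXT, key TEXT, value TEXT, "
        "PRIMARY KEY (run_uuid, key))"
    )
    try:
        conn.executemany(
            "INSERT INTO tags (run_uuid, key, value) VALUES ('run-1', ?, ?)",
            tags,
        )
        return "ok"
    except sqlite3.IntegrityError as exc:
        # Raised when the same (run_uuid, key) pair is inserted twice.
        return f"rejected: {exc}"
    finally:
        conn.close()


# One mlflow.runName tag is accepted.
single = insert_tags([("mlflow.runName", "trial_0")])

# The same tag key written twice for the same run is rejected.
duplicate = insert_tags(
    [("mlflow.runName", "trial_0"), ("mlflow.runName", "trial_0")]
)
```

A file-based store that simply overwrites a tag file would not raise in this situation, which would be consistent with the observation that filesystem setups work.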
Commenting out line 199 solves the issue locally. I guess it can be solved in production by changing the Docker image, but ideally it is a one-line fix to make and release.
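The local workaround can be sketched roughly as follows. This is not the Ray code at line 199 and not the fix that was eventually merged; `split_run_name` is a hypothetical helper, and it only assumes that the tags are passed around as a plain dict keyed by tag name.

```python
RUN_NAME_TAG = "mlflow.runName"


def split_run_name(tags):
    """Return (run_name, remaining_tags) without mutating the input dict."""
    remaining = dict(tags)
    # Pull the run name out so it is not written as an explicit tag.
    run_name = remaining.pop(RUN_NAME_TAG, None)
    return run_name, remaining


original = {"mlflow.runName": "trial_0", "trial_name": "trial_0"}
run_name, tags = split_run_name(original)
```

The remaining tags can then be sent to the tracking server without the explicit mlflow.runName entry, which is the effect of commenting out the line.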
Versions / Dependencies
The runtime environment for the example is a local Ray instance on Ubuntu 22.04 on WSL2.
The MLflow tracking server is deployed on a private cloud, running 1.30.0, reporting to a PostgreSQL database and using S3 object storage for artifacts.
Which on my PC resolves to:
Reproduction script
Any code can be used when you have a proper MLflow setup. Unfortunately, mine is deployed on my company's VPN and I can't share it. We're using setup #5, but I'm pretty sure any setup with a database back end will fail. Setups that use the local filesystem instead of a database work fine because there is no primary key constraint on
Below is a simplified version of the code I used for debugging purposes, but other examples fail as well. E.g. this.
Issue Severity
Medium: It is a significant difficulty but I can work around it.