Added qubole mode
fixed issues

minor change

fixed doc

Changed qubole conda path

updated doc
AgrawalAmey committed Jan 13, 2020
1 parent 6e60e79 commit b99f2f4
Showing 7 changed files with 428 additions and 3 deletions.
11 changes: 11 additions & 0 deletions .gitignore
```diff
@@ -95,3 +95,14 @@ example/tutorial/R/*.nb.html
 
 # travis_wait command logs
 travis_wait*.log
+
+.history
+
+mlflow/java/scoring/bin/*
+mlflow/java/client/bin/*
+mlflow/java/.settings/*
+mlflow/java/client/.classpath
+mlflow/java/scoring/.classpath
+mlflow/java/scoring/.settings/
+mlflow/java/client/.settings/
+
```
79 changes: 79 additions & 0 deletions examples/qubole/readme.md
@@ -0,0 +1,79 @@
# Running MLflow in Qubole Mode

When run in `"qubole"` mode, the MLflow project is launched as a `ShellCommand` on QDS (Qubole Data Service).

## Setting up the cluster

Install the `mlflow` package on the cluster using the node bootstrap script:

```sh
source /usr/lib/environs/a-2019.03-py-3.7.3/bin/activate /usr/lib/environs/a-2019.03-py-3.7.3/
conda install mlflow
```

## Start the tracking server

To run a long-lived, shared MLflow tracking server, launch an EC2 instance:

1. Create an EC2 instance from the "Anaconda with Python 3" AMI. A t2.micro (free-tier) instance is enough for a test environment. This AMI comes with conda and many of the other required packages pre-installed.
2. SSH into the instance, e.g. `ssh -i ~/.ssh/<key>.pem ubuntu@<hostname>.<region>.compute.amazonaws.com`, and install `mlflow`:
   ```sh
   wget https://github.com/qubole/mlflow/releases/download/v1.5.0-q/mlflow-1.5.0-py3-none-any.whl
   pip install mlflow-1.5.0-py3-none-any.whl
   ```
3. Open port 5000 for the MLflow server (see, for example, "How to open a web server port on EC2 instance"). Opening port 5000 to the Internet lets anyone access your server, so it is recommended to open the port only within an AWS VPC that your Qubole clusters can access.
4. Configure your AWS and S3 credentials on the instance via the AWS CLI; for more information, refer to "Configuring the AWS CLI". The recommended configuration for MLflow remote tracking is to use the `--default-artifact-root` option to store your artifacts in an S3 bucket.
5. Start the tracking server:
   ```sh
   mlflow server --default-artifact-root s3://<bucket-name> --host 0.0.0.0
   ```
   For more information, refer to MLflow > Running a Tracking Server.
6. Test connectivity of your tracking server: go to `http://<mlflow-server-dns>:5000`; it should look similar to the screenshot below.

![](https://docs.databricks.com/_static/images/mlflow/mlflow-web-ui.png)
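
You can also check reachability from a shell; a minimal sketch, using the same placeholder hostname:

```sh
# A HEAD request to the UI should return HTTP 200 if the server is up
curl -I http://<mlflow-server-dns>:5000
```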

## Run the job

### Set the tracking server variable

Set the environment variable `MLFLOW_TRACKING_URI` so that runs are logged to your tracking server.
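
For example, pointing at the tracking server started above (the hostname is a placeholder):

```sh
export MLFLOW_TRACKING_URI=http://<mlflow-server-dns>:5000
```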

### Create the cluster spec file

Running the remote job requires a backend spec file (here `backend-spec.json`) to be passed via `--backend-config`, with the following structure:

```json
{
  "aws": {
    "s3_experiment_bucket": "<bucket-name>",
    "s3_experiment_base_path": "<directory>"
  },
  "qubole": {
    "api_token": "<qubole-api-token>",
    "api_url": "https://api.qubole.com/api/",
    "version": "v1.2",
    "poll_interval": 5,
    "skip_ssl_cert_check": false,
    "cloud_name": "AWS"
  },
  "cluster": {
    "label": "mlflow-test"
  },
  "command": {
    "name": "mlflow-test",
    "tags": ["mlflow"],
    "notify": false
  }
}
```
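
Before launching a run, you can optionally check that the spec file parses as valid JSON, e.g. with Python's built-in `json.tool`:

```sh
python -m json.tool backend-spec.json
```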

### Example

A toy example can be launched with the following command:

```sh
mlflow run git@github.com:agrawalamey/mlflow-example.git -P alpha=0.5 -b qubole --backend-config backend-spec.json
```
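
Putting the pieces together, a full invocation might look like the following sketch (the tracking server hostname is a placeholder):

```sh
export MLFLOW_TRACKING_URI=http://<mlflow-server-dns>:5000
mlflow run git@github.com:agrawalamey/mlflow-example.git \
    -P alpha=0.5 -b qubole --backend-config backend-spec.json
```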
15 changes: 12 additions & 3 deletions mlflow/projects/__init__.py
```diff
@@ -19,6 +19,7 @@
 import docker
 
 import mlflow.projects.databricks
+import mlflow.projects.qubole
 import mlflow.tracking as tracking
 import mlflow.tracking.fluent as fluent
 from mlflow.entities import RunStatus, SourceType
```
```diff
@@ -201,8 +202,15 @@ def _run(uri, experiment_id, entry_point="main", version=None, parameters=None,
             kube_config['kube-job-template']
         )
         return submitted_run
-
-    supported_backends = ["local", "databricks", "kubernetes"]
+    elif backend == "qubole":
+        tracking.MlflowClient().set_tag(active_run.info.run_id, MLFLOW_PROJECT_BACKEND,
+                                        "qubole")
+        from mlflow.projects.qubole import run_qubole
+        return run_qubole(
+            remote_run=active_run,
+            uri=uri, entry_point=entry_point, work_dir=work_dir, parameters=parameters,
+            experiment_id=experiment_id, cluster_spec=backend_config)
+    supported_backends = ["local", "databricks", "kubernetes", "qubole"]
     raise ExecutionException("Got unsupported execution mode %s. Supported "
                              "values: %s" % (backend, supported_backends))
```

```diff
@@ -276,7 +284,8 @@ def run(uri, entry_point="main", version=None, parameters=None,
 
     if backend == "databricks":
         mlflow.projects.databricks.before_run_validations(mlflow.get_tracking_uri(), backend_config)
-
+    elif backend == "qubole":
+        mlflow.projects.qubole.before_run_validations(mlflow.get_tracking_uri(), cluster_spec_dict)
     experiment_id = _resolve_experiment_id(experiment_name=experiment_name,
                                            experiment_id=experiment_id)
```

