Added qubole mode
fixed issues

minor change

fixed doc

Changed qubole conda path

updated doc
AgrawalAmey committed Jan 13, 2020
1 parent 6e60e79 commit b99f2f4
Showing 7 changed files with 428 additions and 3 deletions.
11 changes: 11 additions & 0 deletions .gitignore
```diff
@@ -95,3 +95,14 @@ example/tutorial/R/*.nb.html
 
 # travis_wait command logs
 travis_wait*.log
+
+.history
+
+mlflow/java/scoring/bin/*
+mlflow/java/client/bin/*
+mlflow/java/.settings/*
+mlflow/java/client/.classpath
+mlflow/java/scoring/.classpath
+mlflow/java/scoring/.settings/
+mlflow/java/client/.settings/
+
```
79 changes: 79 additions & 0 deletions examples/qubole/readme.md
@@ -0,0 +1,79 @@
# Running MLflow in Qubole Mode

When run in `"qubole"` mode, the MLflow project is launched as a `ShellCommand` on QDS (Qubole Data Service).

## Setting up the cluster

Install the `mlflow` package on the cluster using the node bootstrap script:

```sh
source /usr/lib/environs/a-2019.03-py-3.7.3/bin/activate /usr/lib/environs/a-2019.03-py-3.7.3/
conda install mlflow
```

## Start the tracking server

To run a long-lived, shared MLflow tracking server, launch an EC2 instance:

1. Create an EC2 instance from the "Anaconda with Python 3" AMI. A t2.micro (free-tier) instance is enough for a test environment. This AMI comes with conda and many of the other required packages pre-installed.
2. SSH into the instance, e.g. `ssh -i ~/.ssh/<key>.pem ubuntu@<hostname>.<region>.compute.amazonaws.com`, and install `mlflow`:
   ```sh
   wget https://github.com/qubole/mlflow/releases/download/v1.5.0-q/mlflow-1.5.0-py3-none-any.whl
   pip install mlflow-1.5.0-py3-none-any.whl
   ```
3. Open port 5000 for the MLflow server (see, for example, "How to open a web server port on EC2 instance"). Opening port 5000 to the Internet lets anyone access your server, so it is recommended to open the port only within an AWS VPC that your Qubole clusters can access.
4. Configure your AWS and S3 credentials on the instance via the AWS CLI; for more information, refer to "Configuring the AWS CLI". The recommended configuration for MLflow remote tracking is to use the `--default-artifact-root` option to store your artifacts in an S3 bucket.
5. Start the tracking server:
   ```sh
   mlflow server --default-artifact-root s3://<bucket-name> --host 0.0.0.0
   ```
   For more information, refer to MLflow > Running a Tracking Server.
6. Test connectivity of your tracking server: go to `http://<mlflow-server-dns>:5000`; it should look similar to the screenshot below.

![](https://docs.databricks.com/_static/images/mlflow/mlflow-web-ui.png)
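
You can also check reachability from a shell; a minimal sketch, using the same placeholder hostname:

```sh
# A HEAD request to the UI should return HTTP 200 if the server is up
curl -I http://<mlflow-server-dns>:5000
```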

## Run the job

### Set the tracking server variable

Set the environment variable `MLFLOW_TRACKING_URI` so that runs are logged to your tracking server.
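
For example, pointing at the tracking server started above (the hostname is a placeholder):

```sh
export MLFLOW_TRACKING_URI=http://<mlflow-server-dns>:5000
```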

### Create the cluster spec file

Running the remote job requires a backend spec file (here `backend-spec.json`) to be passed via `--backend-config`, with the following structure:

```json
{
  "aws": {
    "s3_experiment_bucket": "<bucket-name>",
    "s3_experiment_base_path": "<directory>"
  },
  "qubole": {
    "api_token": "<qubole-api-token>",
    "api_url": "https://api.qubole.com/api/",
    "version": "v1.2",
    "poll_interval": 5,
    "skip_ssl_cert_check": false,
    "cloud_name": "AWS"
  },
  "cluster": {
    "label": "mlflow-test"
  },
  "command": {
    "name": "mlflow-test",
    "tags": ["mlflow"],
    "notify": false
  }
}
```
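
Before launching a run, you can optionally check that the spec file parses as valid JSON, e.g. with Python's built-in `json.tool`:

```sh
python -m json.tool backend-spec.json
```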

### Example

A toy example can be launched with the following command:

```sh
mlflow run git@github.com:agrawalamey/mlflow-example.git -P alpha=0.5 -b qubole --backend-config backend-spec.json
```
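
Putting the pieces together, a full invocation might look like the following sketch (the tracking server hostname is a placeholder):

```sh
export MLFLOW_TRACKING_URI=http://<mlflow-server-dns>:5000
mlflow run git@github.com:agrawalamey/mlflow-example.git \
    -P alpha=0.5 -b qubole --backend-config backend-spec.json
```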
15 changes: 12 additions & 3 deletions mlflow/projects/__init__.py
```diff
@@ -19,6 +19,7 @@
 import docker
 
 import mlflow.projects.databricks
+import mlflow.projects.qubole
 import mlflow.tracking as tracking
 import mlflow.tracking.fluent as fluent
 from mlflow.entities import RunStatus, SourceType
```
```diff
@@ -201,8 +202,15 @@ def _run(uri, experiment_id, entry_point="main", version=None, parameters=None,
             kube_config['kube-job-template']
         )
         return submitted_run
-
-    supported_backends = ["local", "databricks", "kubernetes"]
+    elif backend == "qubole":
+        tracking.MlflowClient().set_tag(active_run.info.run_id, MLFLOW_PROJECT_BACKEND,
+                                        "qubole")
+        from mlflow.projects.qubole import run_qubole
+        return run_qubole(
+            remote_run=active_run,
+            uri=uri, entry_point=entry_point, work_dir=work_dir, parameters=parameters,
+            experiment_id=experiment_id, cluster_spec=backend_config)
+    supported_backends = ["local", "databricks", "kubernetes", "qubole"]
     raise ExecutionException("Got unsupported execution mode %s. Supported "
                              "values: %s" % (backend, supported_backends))
```

```diff
@@ -276,7 +284,8 @@ def run(uri, entry_point="main", version=None, parameters=None,
 
     if backend == "databricks":
         mlflow.projects.databricks.before_run_validations(mlflow.get_tracking_uri(), backend_config)
-
+    elif backend == "qubole":
+        mlflow.projects.qubole.before_run_validations(mlflow.get_tracking_uri(), cluster_spec_dict)
     experiment_id = _resolve_experiment_id(experiment_name=experiment_name,
                                            experiment_id=experiment_id)
```

