Misc DFP Documentation & fixes (#368)
* Fills in `examples/digital_fingerprinting/production/README.md` with a bit more detail
* Fix handling of command line enum values
* Add date window filtering to `DFPFileBatcherStage`, since this stage is already performing date calculations, and occurs prior to the downloading of any remote data. Previously this was a part of the `S3SourceStage`.
* Add a `--start_time` flag to `dfp_azure_pipeline.py` and `dfp_duo_pipeline.py` allowing an explicit time window.
* `morpheus_training` service in docker compose renamed to `morpheus_pipeline` to reflect that it is used for both training and inference
* Set nvidia runtime in docker compose yaml for users who don't have it as their default runtime.
* MLflow version restricted to `<1.29` to avoid a bug (per @pdmack)
* Fix casing of class names in headings for Sphinx builds
* Spelling fixes in docs & help strings
* Fix launch script for `morpheus_pipeline`
* `from-azure` and `from-duo` added to the CLI
* Remove usage of `only_new_batches` argument which no longer exists
* CI work-around for https://gitlab.com/karolinepauls/pytest-kafka/-/issues/10
* Update diagrams to reflect recent code changes
* Support `_` as a time separator in `iso_date_regex`
* Adds the following to `docs/source/developer_guide/guides/5_digital_fingerprinting.md`:
  * Section on how to define a new data source schema
  * Explanation of Starter & Production examples
  * docker-compose commands for building & starting services in the production example
  * Helm Chart info for production example
  * Fix headings for stage classes
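
The `iso_date_regex` change above can be sketched as follows. This is a hedged illustration, not the actual Morpheus pattern: the group names and the surrounding filename format are assumptions; the point is only that `[T_]` accepts either separator between the date and time components.

```python
import re

# Hypothetical sketch of an ISO-like date regex that accepts either 'T' or '_'
# between the date and time components (the real Morpheus pattern differs).
iso_date_regex = re.compile(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
    r"[T_](?P<hour>\d{2})[:_](?P<minute>\d{2})[:_](?P<second>\d{2})"
)

for name in ("AUTH_LOG-2022-08-01T00:05:06.json",
             "AUTH_LOG-2022-08-01_00_05_06.json"):
    match = iso_date_regex.search(name)
    print(name, "->", match.groupdict() if match else "no match")
```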


Fixes #345

Authors:
  - David Gardner (https://github.com/dagardner-nv)
  - Pete MacKinnon (https://github.com/pdmack)
  - Eli Fajardo (https://github.com/efajardo-nv)

Approvers:
  - Eli Fajardo (https://github.com/efajardo-nv)
  - Devin Robison (https://github.com/drobison00)

URL: #368

dagardner-nv authored Sep 29, 2022
1 parent 12c8b1d commit 3a2d9c0
Showing 30 changed files with 777 additions and 193 deletions.
16 changes: 13 additions & 3 deletions docs/source/_static/omni-style.css
@@ -137,6 +137,16 @@ h4
text-transform: uppercase;
}

h3 code
{
text-transform: none;
}

h4 code
{
text-transform: none;
}

/* Paragraph Formatting */

p
@@ -218,7 +228,7 @@ html.writer-html5 .rst-content table.docutils th>p

/* cell text */
html.writer-html5 .rst-content table.docutils td>p,
html.writer-html5 .rst-content table.docutils th>p
html.writer-html5 .rst-content table.docutils th>p
{
font-size: var(--body-font-size);
line-height: var(--body-line-height);
@@ -230,7 +240,7 @@ html.writer-html5 .rst-content table.docutils th>p
.rst-content table.field-list td p:first-child,
.wy-table th p:first-child,
.rst-content table.docutils th p:first-child,
.rst-content table.field-list th p:first-child
.rst-content table.field-list th p:first-child
{
margin-top: 0px;
}
@@ -241,7 +251,7 @@ html.writer-html5 .rst-content table.docutils th>p
.rst-content table.field-list td p:last-child,
.wy-table th p:last-child,
.rst-content table.docutils th p:last-child,
.rst-content table.field-list th p:last-child
.rst-content table.field-list th p:last-child
{
margin-bottom: 0px;
}
440 changes: 363 additions & 77 deletions docs/source/developer_guide/guides/5_digital_fingerprinting.md

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions docs/source/developer_guide/guides/img/dfp_input_config.png
4 changes: 2 additions & 2 deletions docs/source/developer_guide/guides/img/dfp_output_config.png
6 changes: 4 additions & 2 deletions examples/digital_fingerprinting/production/Dockerfile
@@ -58,8 +58,10 @@ FROM base as jupyter
RUN source activate morpheus \
&& mamba install -y -c conda-forge \
ipywidgets \
jupyterlab \
nb_conda_kernels
nb_conda_kernels \
&& pip install jupyter_contrib_nbextensions==0.5.1 \
&& jupyter contrib nbextension install --user \
&& pip install jupyterlab_nvdashboard==0.7.0

# Launch jupyter
CMD ["jupyter-lab", "--ip=0.0.0.0", "--no-browser", "--allow-root"]
127 changes: 122 additions & 5 deletions examples/digital_fingerprinting/production/README.md
@@ -1,17 +1,134 @@
# "Production" Digital Fingerprinting Pipeline

This example is designed to show what a full-scale, production-ready DFP deployment in Morpheus would look like. It contains all of the necessary components, such as a model store, to allow multiple Morpheus pipelines to communicate at a scale that can handle the workload of an entire company.

Key Differences:
* Multiple pipelines are specialized to perform either training or inference
* Requires setting up a model store to allow the training and inference pipelines to communicate
* Organized into a docker-compose deployment for easy startup
* Contains a Jupyter notebook service to ease development and debugging
* Can be deployed to Kubernetes using provided Helm charts
* Uses many customized stages to maximize performance

## Build the Morpheus container
This is necessary to get the latest changes needed for DFP. From the root of the Morpheus repo:
```bash
./docker/build_container_release.sh
```


## Building and Running via `docker-compose`
### Build
```bash
cd examples/digital_fingerprinting/production
export MORPHEUS_CONTAINER_VERSION="$(git describe --tags --abbrev=0)-runtime"
docker-compose build
```

### Running the services
#### Jupyter Server
From the `examples/digital_fingerprinting/production` dir run:
```bash
docker-compose up jupyter
```

Once the build is complete and the service has started, you will be presented with a message that should look something like this:
```
jupyter | To access the server, open this file in a browser:
jupyter | file:///root/.local/share/jupyter/runtime/jpserver-7-open.html
jupyter | Or copy and paste one of these URLs:
jupyter | http://localhost:8888/lab?token=<token>
jupyter | or http://127.0.0.1:8888/lab?token=<token>
```

Copy and paste the URL into a web browser. There are four notebooks included with the DFP example:
* `dfp_azure_training.ipynb` - Training pipeline for Azure Active Directory data
* `dfp_azure_inference.ipynb` - Inference pipeline for Azure Active Directory data
* `dfp_duo_training.ipynb` - Training pipeline for Duo Authentication
* `dfp_duo_inference.ipynb` - Inference pipeline for Duo Authentication

> **Note:** The token in the URL is a one-time use token; a new one is generated with each invocation.

#### Morpheus Pipeline
By default the `morpheus_pipeline` service will run the training pipeline for Duo data. From the `examples/digital_fingerprinting/production` dir run:
```bash
docker-compose up morpheus_pipeline
```

If instead you wish to run a different pipeline, from the `examples/digital_fingerprinting/production` dir run:
```bash
docker-compose run morpheus_pipeline bash
```

From the prompt within the `morpheus_pipeline` container you can run either the `dfp_azure_pipeline.py` or `dfp_duo_pipeline.py` pipeline scripts.
```bash
python dfp_azure_pipeline.py --help
python dfp_duo_pipeline.py --help
```

Both scripts are capable of running either a training or inference pipeline for their respective data sources. The command line options for both are the same:
| Flag | Type | Description |
| ---- | ---- | ----------- |
| `--train_users` | One of: `all`, `generic`, `individual`, `none` | Indicates whether to train a model per user or a single generic model for all users. Selecting `none` runs the inference pipeline. |
| `--skip_user` | TEXT | User IDs to skip. Mutually exclusive with `only_user` |
| `--only_user` | TEXT | Only users specified by this option will be included. Mutually exclusive with `skip_user` |
| `--start_time` | TEXT | The start of the time window; if undefined, the start time will be `now() - duration` |
| `--duration` | TEXT | The duration to run starting from now [default: 60d] |
| `--cache_dir` | TEXT | The location to cache data such as S3 downloads and pre-processed data [env var: `DFP_CACHE_DIR`; default: `./.cache/dfp`] |
| `--log_level` | One of: `CRITICAL`, `FATAL`, `ERROR`, `WARN`, `WARNING`, `INFO`, `DEBUG` | Specify the logging level to use. [default: `WARNING`] |
| `--sample_rate_s` | INTEGER | Minimum time step, in seconds, between object logs. [env var: `DFP_SAMPLE_RATE_S`; default: 0] |
| `-f`, `--input_file` | TEXT | List of files to process. Can specify multiple arguments for multiple files. Also accepts glob (*) wildcards and schema prefixes such as `s3://`. For example, to make a local cache of an s3 bucket, use `filecache::s3://mybucket/*`. See [fsspec documentation](https://filesystem-spec.readthedocs.io/en/latest/api.html?highlight=open_files#fsspec.open_files) for list of possible options. |
| `--tracking_uri` | TEXT | The MLflow tracking URI to connect to the tracking backend. [default: `http://localhost:5000`] |
| `--help` | | Show this message and exit. |
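
As a rough sketch of how `--start_time` and `--duration` interact (an assumption based on the flag descriptions above, not the actual pipeline code), when `--start_time` is omitted the window is anchored at `now() - duration`:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

def compute_time_window(duration: timedelta,
                        start_time: Optional[datetime] = None
                        ) -> Tuple[datetime, datetime]:
    # When no explicit start is given, fall back to now() - duration,
    # mirroring the documented default behavior of the CLI flags.
    end_time = datetime.now(timezone.utc)
    if start_time is None:
        start_time = end_time - duration
    return start_time, end_time

start, end = compute_time_window(timedelta(days=60))
print(end - start)
```

Passing an explicit `start_time` pins the left edge of the window while the right edge remains the current time.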


#### Optional MLflow Service
Starting either the `morpheus_pipeline` or the `jupyter` service will start the `mlflow` service in the background. For debugging purposes it can be helpful to view the logs of the running MLflow service.

From the `examples/digital_fingerprinting/production` dir run:
```bash
docker-compose up mlflow
```

By default, an MLflow dashboard will be available at:
```
http://localhost:5000
```
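
A quick way to check that the dashboard is reachable from Python — a sketch using only the standard library, assuming the default URI shown above:

```python
import urllib.request

def mlflow_is_up(uri: str = "http://localhost:5000", timeout: float = 5.0) -> bool:
    # Returns True when something answers with HTTP 200 at the given URI,
    # False when the service is down or unreachable.
    try:
        with urllib.request.urlopen(uri, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

print(mlflow_is_up())
```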

## Kubernetes deployment

The Morpheus project also maintains Helm charts and container images for Kubernetes deployment of Morpheus and MLflow (both for serving and for the Triton plugin). These are located in the NVIDIA GPU Cloud (NGC) [public catalog](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/morpheus/collections/morpheus_).

### MLflow Helm chart

MLflow for this production digital fingerprinting use case can be installed from NGC using the same instructions as for the [MLflow Triton Plugin from the Morpheus Quick Start Guide](../../../docs/source/morpheus_quickstart_guide.md#install-morpheus-mlflow-triton-plugin). The chart and image can be used both for the Triton plugin and for the MLflow server.

### Production DFP Helm chart

The [Morpheus SDK Client](../../../docs/source/morpheus_quickstart_guide.md#install-morpheus-sdk-client) is deployed in _almost_ the same way as specified in the Quick Start Guide; however, the command arguments differ for this production DFP use case.

#### Notebooks

```
helm install --set ngc.apiKey="$API_KEY",sdk.args="cd /workspace/examples/digital_fingerprinting/production/morpheus && jupyter-lab --ip='*' --no-browser --allow-root --ServerApp.allow_origin='*'" <sdk-release-name> morpheus-sdk-client/
```

Make note of the Jupyter token by examining the logs of the SDK pod:
```
kubectl logs sdk-cli-<sdk-release-name>
```

You should see something similar to this:

```
Or copy and paste one of these URLs:
http://localhost:8888/lab?token=d16c904468fdf666c5030e18fb82f840e531178bf716e575
or http://127.0.0.1:8888/lab?token=d16c904468fdf666c5030e18fb82f840e531178bf716e575
```

Open your browser to the reachable address and NodePort exposed by the pod (default value of 30888) and use the generated token to log in to the notebook.

#### Unattended

```
helm install --set ngc.apiKey="$API_KEY",sdk.args="cd /workspace/examples/digital_fingerprinting/production/morpheus && ./launch.sh --train_users=generic --duration=1d" <sdk-release-name> morpheus-sdk-client/
```
2 changes: 1 addition & 1 deletion examples/digital_fingerprinting/production/conda_env.yml
@@ -28,4 +28,4 @@ dependencies:
- librdkafka
- mlflow
- papermill
- s3fs
- s3fs==2022.8.2
16 changes: 14 additions & 2 deletions examples/digital_fingerprinting/production/docker-compose.yml
@@ -41,6 +41,12 @@ services:
target: jupyter
args:
- MORPHEUS_CONTAINER_VERSION=${MORPHEUS_CONTAINER_VERSION:-v22.09.00-runtime}
deploy:
resources:
reservations:
devices:
- driver: nvidia
capabilities: [gpu]
image: dfp_morpheus_jupyter
container_name: jupyter
ports:
@@ -58,7 +64,7 @@ services:
cap_add:
- sys_nice

morpheus_training:
morpheus_pipeline:
# restart: always
build:
context: ./
@@ -67,7 +73,13 @@
args:
- MORPHEUS_CONTAINER_VERSION=${MORPHEUS_CONTAINER_VERSION:-v22.09.00-runtime}
image: dfp_morpheus
container_name: morpheus_training
container_name: morpheus_pipeline
deploy:
resources:
reservations:
devices:
- driver: nvidia
capabilities: [gpu]
networks:
- frontend
- backend
@@ -24,7 +24,7 @@ RUN apt update && \
rm -rf /var/cache/apt/* /var/lib/apt/lists/*

# Install python packages
RUN pip install mlflow boto3 pymysql pyyaml
RUN pip install "mlflow<1.29.0" boto3 pymysql pyyaml

# We run on port 5000
EXPOSE 5000