Add MLFlow Tracker based integration test
Signed-off-by: Kim, Vinnam <vinnam.kim@intel.com>
vinnamkim committed Dec 12, 2023
1 parent 9a1dc19 commit 7cb9a1f
Showing 15 changed files with 423 additions and 15 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -37,3 +37,6 @@ dmypy.json
src/**/*.c
src/**/*.html
src/**/*.so

# For regression test data storage
for_developers/regression_test/postgres_data/**/*
7 changes: 7 additions & 0 deletions for_developers/regression_test/Dockerfile
@@ -0,0 +1,7 @@
ARG http_proxy
ARG https_proxy
ARG no_proxy

FROM python:3.10-slim-bullseye

RUN pip install --no-cache-dir mlflow==2.8.1 psycopg2-binary==2.9.9
77 changes: 77 additions & 0 deletions for_developers/regression_test/README.md
@@ -0,0 +1,77 @@
# How to run the integration tests

## How to deploy MLFlow Tracking Server with POSTGRES Database

1. Build MLFlow Tracking Server docker image

```console
for_developers/regression_test$ ./build.sh
```

2. Create DB data storage directory

```console
for_developers/regression_test$ mkdir postgres_data
```

3. Deploy MLFlow Tracking Server

You should set a proper value for the password environment variable, `POSTGRES_PASSWORD=<SET_MY_PASSWORD>`.

```console
for_developers/regression_test$ USER=$(id -u) POSTGRES_PASSWORD=<SET_MY_PASSWORD> docker compose up -d
```

## How to execute the regression tests

1. Prerequisite

You must [deploy the MLFlow Tracking server](#how-to-deploy-mlflow-tracking-server-with-postgres-database) before executing the regression tests.
After the deployment, you can access your MLFlow Tracking server instance at `http://<server-ip>:5000` (e.g., `http://localhost:5000` if you launched the server on your local machine).
You should see the following screen in your web browser:

| ![Dashboard](images/mlflow_dashboard.png) |
| :---------------------------------------: |
| MLFlow Dashboard |

By using the MLFlow Tracking server, we can easily save, load, and visualize the regression test results.
Additionally, our MLFlow Tracking server stores its data in a PostgreSQL database backend.
This means that we can build our own web front-end for the regression tests in the future.

| ![Metric and Tags](images/mlflow_metrics_and_tags.png) |
| :-------------------------------------------------------------------: |
| Test results and environment will be recorded in the metrics and tags |

| ![Filtering MLFlow Run](images/mlflow_filtering.png) |
| :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| The MLFlow Tracking server UI provides filtering functionality. In this example, we filter for the regression test results. |

2. Launch the regression tests

The regression tests are launched with the PyTest framework.
The regression test workflow is implemented in `tests/regression`.

```console
pytest tests/regression --mlflow-tracking-uri http://<server-ip>:5000 --dataset-root-dir <dataset-root-dir> --user-name '<user-name>'
```

There are three CLI arguments you should pass to the test command:

- `--mlflow-tracking-uri http://<server-ip>:5000`: The MLFlow Tracking server URI where the integration test results will be stored.

- `--dataset-root-dir <dataset-root-dir>`: The local directory path where the integration tests look for the input datasets, laid out as follows:

```console
<dataset-root-dir>
├── classification
│   └── multiclass_CUB_small
│       ├── 1
│       │   ├── train
│       │   ├── val
│       │   └── test
│       ...
├── detection
...
```

- `--user-name <user-name>`: The name of the user who launched the integration test.
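Before kicking off a long run, the layout above can be sanity-checked with a few lines of Python. The helper below is a hypothetical sketch (not part of this commit); it assumes the `<task>/<dataset>/<seed>/{train,val,test}` structure shown above.

```python
from pathlib import Path


def check_dataset_layout(root: Path) -> list[str]:
    """Return a list of problems found under <dataset-root-dir> (empty if OK)."""
    problems: list[str] = []
    # Each third-level directory is one seed folder that must hold the three splits.
    for seed_dir in root.glob("*/*/*"):
        if not seed_dir.is_dir():
            continue
        for split in ("train", "val", "test"):
            if not (seed_dir / split).is_dir():
                problems.append(f"missing {seed_dir / split}")
    return problems
```

Running such a check against `<dataset-root-dir>` before `pytest tests/regression ...` turns a mid-run crash into an immediate, readable error list.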
9 changes: 9 additions & 0 deletions for_developers/regression_test/build.sh
@@ -0,0 +1,9 @@
#!/bin/bash
http_proxy=${http_proxy:-}
https_proxy=${https_proxy:-}
no_proxy=${no_proxy:-}

docker build -t mlflow-tracker:v2.8.1 \
--build-arg http_proxy="$http_proxy" \
--build-arg https_proxy="$https_proxy" \
--build-arg no_proxy="$no_proxy" .
19 changes: 19 additions & 0 deletions for_developers/regression_test/docker-compose.yml
@@ -0,0 +1,19 @@
version: "3.9"

services:
  postgres-db:
    image: postgres
    restart: always
    user: $USER
    environment:
      - POSTGRES_USER=admin
      - POSTGRES_PASSWORD=$POSTGRES_PASSWORD
    volumes:
      - ./postgres_data:/var/lib/postgresql/data

  mlflow:
    image: mlflow-tracker:v2.8.1
    restart: always
    ports:
      - 5000:5000
    command: "mlflow server --host 0.0.0.0 --backend-store-uri postgresql+psycopg2://admin:$POSTGRES_PASSWORD@postgres-db:5432"
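One caveat about the `--backend-store-uri` above: if `POSTGRES_PASSWORD` contains URI-reserved characters such as `@`, `:`, or `/`, it must be percent-encoded before being embedded. A small stdlib-only sketch (the function name is ours, not part of the commit):

```python
from urllib.parse import quote


def backend_store_uri(user: str, password: str, host: str, port: int = 5432) -> str:
    """Build the SQLAlchemy-style PostgreSQL URI with a safely encoded password."""
    # safe="" also encodes "/" so a slash in the password cannot break the URI
    return f"postgresql+psycopg2://{quote(user)}:{quote(password, safe='')}@{host}:{port}"
```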
(3 binary image files could not be displayed in the diff view.)
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -49,6 +49,8 @@ dev = [
"pytest-mock",
"pytest-csv",
"pytest-cov",
"mlflow==2.8.1", # For regression test
"py-cpuinfo=9.0.0", # For regression test
]
docs = [
"furo",
23 changes: 11 additions & 12 deletions src/otx/cli/train.py
@@ -6,7 +6,7 @@

from __future__ import annotations

from typing import TYPE_CHECKING
from typing import TYPE_CHECKING, Any

from hydra import compose, initialize
from jsonargparse import ArgumentParser
@@ -16,9 +16,11 @@

if TYPE_CHECKING:
from jsonargparse._actions import _ActionSubCommands
from pytorch_lightning import Trainer

register_configs()


def add_train_parser(subcommands_action: _ActionSubCommands) -> None:
"""Add subparser for train command.
Expand All @@ -30,10 +32,12 @@ def add_train_parser(subcommands_action: _ActionSubCommands) -> None:
"""
parser = ArgumentParser()
parser.add_argument("overrides", help="overrides values", default=[], nargs="+")
subcommands_action.add_subcommand("train", parser, help="Training subcommand for OTX")
subcommands_action.add_subcommand(
"train", parser, help="Training subcommand for OTX"
)


def otx_train(overrides: list[str]) -> None:
def otx_train(overrides: list[str]) -> tuple[Trainer, dict[str, Any]]:
"""Main entry point for training.
:param overrides: Override List values.
@@ -43,17 +47,12 @@ def otx_train(overrides: list[str]) -> None:
# (e.g. ask for tags if none are provided in cfg, print cfg tree, etc.)
# utils.extras(cfg)
with initialize(config_path="../config", version_base="1.3", job_name="otx_train"):
cfg = compose(config_name="train", overrides=overrides, return_hydra_config=True)
cfg = compose(
config_name="train", overrides=overrides, return_hydra_config=True
)
configure_hydra_outputs(cfg)

# train the model
from otx.core.engine.train import train
metric_dict, _ = train(cfg)

# # safely retrieve metric value for hydra-based hyperparameter optimization
# metric_value = utils.get_metric_value(
# metric_dict=metric_dict, metric_name=cfg.get("optimized_metric")
# )

# # return optimized metric
# return metric_value
return train(cfg)
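The deleted comment block referred to a `utils.get_metric_value` helper for hydra-based hyperparameter optimization. For readers wondering what such a helper does, here is a hypothetical sketch (not the OTX implementation) of safely pulling one metric out of the returned `metric_dict`:

```python
from __future__ import annotations

from typing import Any


def get_metric_value(metric_dict: dict[str, Any], metric_name: str | None) -> float | None:
    """Safely retrieve a metric value for hyperparameter optimization."""
    if not metric_name:
        # No optimized metric configured; nothing to return.
        return None
    if metric_name not in metric_dict:
        msg = f"Metric '{metric_name}' not found in {sorted(metric_dict)}"
        raise KeyError(msg)
    value = metric_dict[metric_name]
    # torchmetrics values are 0-dim tensors; plain numbers pass through unchanged.
    return float(value.item() if hasattr(value, "item") else value)
```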
4 changes: 2 additions & 2 deletions src/otx/config/callbacks/classification.yaml
@@ -4,12 +4,12 @@ defaults:
model_checkpoint:
dirpath: ${base.output_dir}/checkpoints
filename: "epoch_{epoch:03d}"
monitor: "accuracy"
monitor: "val/accuracy"
mode: "max"
save_last: True
auto_insert_metric_name: False

early_stopping:
monitor: "accuracy"
monitor: "val/accuracy"
patience: 100
mode: "max"
2 changes: 1 addition & 1 deletion src/otx/core/model/module/classification.py
@@ -49,7 +49,7 @@ def on_test_epoch_end(self) -> None:

def _log_metrics(self, meter: Accuracy, key: str) -> None:
results = meter.compute()
self.log("accuracy", results.item(), sync_dist=True, prog_bar=True)
self.log(f"{key}/accuracy", results.item(), sync_dist=True, prog_bar=True)

def validation_step(self, inputs: MulticlassClsBatchDataEntity, batch_idx: int) -> None:
"""Perform a single validation step on a batch of data from the validation set.
2 changes: 2 additions & 0 deletions tests/regression/__init__.py
@@ -0,0 +1,2 @@
# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
156 changes: 156 additions & 0 deletions tests/regression/conftest.py
@@ -0,0 +1,156 @@
# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

from __future__ import annotations

import logging
import platform
import subprocess
from datetime import datetime, timedelta, timezone
from pathlib import Path
from urllib.parse import urlparse

import pytest
from cpuinfo import get_cpu_info
from otx import __version__

import mlflow

log = logging.getLogger(__name__)


def pytest_addoption(parser: pytest.Parser) -> None:
parser.addoption(
"--user-name",
type=str,
required=True,
help="Name of the user who launched the regression tests, "
'e.g., `--user-name "John Doe"`.',
)
parser.addoption(
"--dataset-root-dir",
type=Path,
required=True,
help="Dataset root directory path for the regression tests",
)
parser.addoption(
"--mlflow-tracking-uri",
type=str,
required=True,
help="URI for MLFlow Tracking server to store the regression test results.",
)
parser.addoption(
"--num-repeat",
type=int,
default=1,
help="The number of repetitions for each test case with different seed (default=1).",
)


@pytest.fixture(scope="module", autouse=True)
def fxt_user_name(request: pytest.FixtureRequest) -> str:
"""User name to sign off the regression test execution.
This should be given by the PyTest CLI option.
"""
user_name = request.config.getoption("--user-name")
msg = f"user_name: {user_name}"
log.info(msg)
return user_name


@pytest.fixture(scope="module", autouse=True)
def fxt_dataset_root_dir(request: pytest.FixtureRequest) -> Path:
"""Dataset root directory path.
This should be given by the PyTest CLI option.
"""
dataset_root_dir = request.config.getoption("--dataset-root-dir")
msg = f"dataset_root_dir: {dataset_root_dir}"
log.info(msg)
return dataset_root_dir


@pytest.fixture(scope="module", autouse=True)
def fxt_mlflow_tracking_uri(request: pytest.FixtureRequest) -> str:
"""MLFLow tracking server URI.
This should be given by the PyTest CLI option.
"""
mlflow_tracking_uri = urlparse(
request.config.getoption("--mlflow-tracking-uri"),
).geturl()
msg = f"fxt_mlflow_tracking_uri: {mlflow_tracking_uri}"
log.info(msg)
return mlflow_tracking_uri


@pytest.fixture(scope="module", autouse=True)
def fxt_num_repeat(request: pytest.FixtureRequest) -> int:
"""The number of repetition for each test case.
The random seed will be set for [0, fxt_num_repeat - 1]. Default is one.
"""
num_repeat = request.config.getoption("--num-repeat")
msg = f"fxt_num_repeat: {fxt_num_repeat}"
log.info(msg)
return num_repeat


@pytest.fixture(scope="module", autouse=True)
def fxt_mlflow_experiment_name(fxt_user_name) -> str:
"""MLFlow Experiment name (unique key).
The MLFlow Experiment name is a unique key, just like the experiment ID.
Every MLFlow Run belongs to an MLFlow Experiment.
"""
tz = timezone(offset=timedelta(hours=9), name="Seoul")
date = datetime.now(tz=tz).date()
return f"OTX: {__version__}, Signed-off-by: {fxt_user_name}, Date: {date}"


@pytest.fixture(scope="module", autouse=True)
def fxt_tags(fxt_user_name) -> dict[str, str]:
"""Tag fields to record the machine and user executing this regression test."""
return {
"user_name": fxt_user_name,
"machine_name": platform.node(),
"cpu_info": get_cpu_info()["brand_raw"],
"accelerator_info": subprocess.check_output(
["nvidia-smi", "-L"], # noqa: S603, S607
)
.decode()
.strip(),
}
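`fxt_tags` shells out to `nvidia-smi`, which raises `FileNotFoundError` on a machine without the NVIDIA driver. A more defensive variant could degrade gracefully; the sketch below is our own assumption, not the committed fixture (it omits `cpu_info` to stay stdlib-only):

```python
from __future__ import annotations

import platform
import subprocess


def collect_tags(user_name: str) -> dict[str, str]:
    """Collect user/machine tags; fall back to 'none' when nvidia-smi is absent."""
    try:
        accelerator = subprocess.check_output(["nvidia-smi", "-L"]).decode().strip()
    except (FileNotFoundError, subprocess.CalledProcessError):
        accelerator = "none"
    return {
        "user_name": user_name,
        "machine_name": platform.node(),
        "accelerator_info": accelerator,
    }
```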


@pytest.fixture(scope="module", autouse=True)
def fxt_mlflow_experiment(
fxt_mlflow_experiment_name: str,
fxt_mlflow_tracking_uri: str,
fxt_tags: dict[str, str],
) -> None:
"""Set MLFlow Experiment
If there is a MLFlow Experiment which has the same name with the given name,
it will use that MLFlow Experiment. Otherwise, it will create a new one and use it.
"""
mlflow.set_tracking_uri(fxt_mlflow_tracking_uri)
exp = mlflow.get_experiment_by_name(name=fxt_mlflow_experiment_name)
exp_id = (
mlflow.create_experiment(
name=fxt_mlflow_experiment_name,
tags=fxt_tags,
)
if exp is None
else exp.experiment_id
)
mlflow.set_experiment(experiment_id=exp_id)


@pytest.fixture(scope="module", autouse=True)
def fxt_recipe_dir() -> Path:
"""OTX recipe directory."""
import otx.recipe as otx_recipe

return Path(otx_recipe.__file__).parent
(1 changed file failed to load in the diff view.)
