Merge remote-tracking branch 'upstream/v2' into enhance/perf-classification

vinnamkim committed Jan 2, 2024
2 parents 86b8334 + af344e9 commit 70582d9
Showing 73 changed files with 2,445 additions and 26 deletions.
1,381 changes: 1,381 additions & 0 deletions for_developers/images/product_design/core_design_drawing.drawio

Large diffs are not rendered by default.

95 changes: 93 additions & 2 deletions for_developers/product_design.md
@@ -4,7 +4,7 @@

1. **Provide a deep learning model training framework that encompasses various tasks solvable by deep learning models.** Our product aims to be convenient for developers, allowing them to easily expand support for tasks and models.

2. **Empower users to seamlessly and intuitively navigate the OpenVINO-based workflow, from model training to inference.** To ensure a user-friendly experience throughout, our product should have user-friendly CLI and Python API entrypoints.

3. **Support Intel devices for accelerating deep learning model training:** We offer best practices for training from scratch or fine-tuning popular model architectures using Intel devices.

@@ -14,7 +14,89 @@ To realize our product missions, we establish the following design principles and

### Core

TBD @wonjuleee @vinnamkim
1. **Static Typing System for Task-Data-Model to Accelerate Task and Model Development Cycles**

_"Advocates of static typing argue that ... a better design time developer experience..."_[^1]

Our framework should support various deep learning tasks simultaneously.
To achieve this mission, it must be developer-friendly to extend along three dimensions: task, data, and model.
The key insight is that, while these dimensions may seem independent, they can be tied together by a single dimension, the task, on which both the data and the model depend.
We believe that well-designed abstractions with static typing can help developers in this regard.
Therefore, we introduce the **Task-Data-Model** abstraction with static typing.

| ![Task-Data-Model](./images/product_design/task_data_model.png) |
| :-------------------------------------------------------------: |
| Figure 1. Task-Data-Model abstractions with static typing |

As shown in Figure 1, there are abstractions (grey-colored boxes) that form the base for each task:

1. `OTXTaskType`: A Python enum defining the deep learning tasks supported by this framework.
2. `OTXDataEntity`: A Python dataclass object representing a single data sample for each `OTXTaskType`.
3. `OTXBatchDataEntity`: A Python dataclass object representing a batch of `OTXDataEntity` samples.
4. `OTXModel`: A PyTorch module (`nn.Module`) defined for each `OTXTaskType`.
Its `forward()` function is strictly typed: 1) `OTXBatchDataEntity` as input and 2) `OTXBatchLossEntity` or `OTXBatchPredEntity` as output.

On the other hand, we can see concrete embodiments of these abstractions (blue-colored boxes) for the detection task (`OTXTaskType.DETECTION`):

1. `DetDataEntity` and `DetBatchDataEntity`: Python dataclass objects with `bboxes` and `labels` fields for the detection task.
`DetBatchDataEntity` is the input of `OTXDetectionModel`'s `forward()` function.
2. `DetBatchPredEntity`: A Python dataclass object that adds a `scores` field on top of `DetBatchDataEntity`.
It is the output of `OTXDetectionModel`'s `forward()` function in inference mode.

As a result, a developer who wants to add a new model to OTX for the detection task only needs to handle the strictly typed input and output (`DetBatchDataEntity` and `DetBatchPredEntity`).

Conversely, when a new task is required, we can incorporate it by deriving embodiments from the base abstractions: `OTXTaskType`, `OTXDataEntity`, `OTXBatchDataEntity`, and `OTXModel`.
A minimal sketch of this typing scheme follows.
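
As an illustration only (a minimal sketch, not the actual OTX source; names follow Figure 1, while the bodies are simplified assumptions):

```python
# Minimal sketch of the Task-Data-Model static typing scheme (hypothetical).
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum

import torch
from torch import nn


class OTXTaskType(str, Enum):
    """Deep learning tasks supported by the framework (abridged)."""

    DETECTION = "DETECTION"


@dataclass
class DetBatchDataEntity:
    """Batched detection samples: the strictly typed model input."""

    images: list[torch.Tensor]
    bboxes: list[torch.Tensor]
    labels: list[torch.Tensor]


@dataclass
class DetBatchPredEntity(DetBatchDataEntity):
    """Detection predictions: the input entity plus confidence scores."""

    scores: list[torch.Tensor]


class OTXDetectionModel(nn.Module):
    """`forward()` is strictly typed: a batch entity in, a prediction entity out."""

    def forward(self, inputs: DetBatchDataEntity) -> DetBatchPredEntity:
        raise NotImplementedError
```

With this contract in place, a type checker can verify that any detection model plugged into the framework consumes and produces exactly these entities.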

2. **Single Engine but Capable of Incorporating Other Public Model Training Frameworks**

_"Don't Reinvent the Wheel"_[^2]

We choose PyTorch Lightning[^3] as the primary engine for our training framework.
However, numerous other open-source model training frameworks exist.
These frameworks are often tightly coupled to their own engines, which hinders the reuse of their valuable data pipelines and model implementations.
This goes against the well-known software engineering principle, _"Don't Reinvent the Wheel."_

Nevertheless, our `OTXModel` class features special abstract member functions that enable us to reuse model implementations from any framework.
These functions, namely `create_model()`, `customize_inputs()`, and `customize_outputs()`, convert our own data entity classes to and from the format required by a model imported from an external framework.

Let's consider an example of importing a model from MMDetection[^4]:

| ![Reuse Model](./images/product_design/reuse_model.png) |
| :-----------------------------------------------------: |
| Figure 2. Import RTMDetTiny[^5] model from MMDetection |

In Figure 2, `MMDetCompatibleModel` implements `create_model()` to generate the `RTMDetTiny` model and registers it as a child module.
At this point, the `RTMDetTiny` model requires MMDetection's `DataSample` as input and produces a plain Python dictionary as output.
Neither type conforms to our framework's data entities.
To resolve this issue, `MMDetCompatibleModel` implements `customize_inputs()` and `customize_outputs()` functions to convert them into our format.
With this design, we can explicitly support major frameworks. We plan to continually expand the supported frameworks using this approach.
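
A rough sketch of this adapter pattern, reusing the simplified `DetBatchDataEntity`/`DetBatchPredEntity` entities from the sketch above (not the actual `MMDetCompatibleModel` source):

```python
# Hypothetical adapter: reusing an external framework's model behind OTX's typed interface.
from torch import nn


class MMDetCompatibleModel(nn.Module):
    """Wraps an MMDetection model so that `forward()` keeps OTX's strict typing."""

    def __init__(self) -> None:
        super().__init__()
        # Registered as a child module so its parameters are trained and checkpointed.
        self.model = self.create_model()

    def create_model(self) -> nn.Module:
        """Build the external model, e.g. RTMDetTiny from an MMDetection config."""
        raise NotImplementedError

    def customize_inputs(self, inputs: DetBatchDataEntity) -> dict:
        """Convert an OTX entity into the external format (e.g. MMDetection `DataSample`s)."""
        raise NotImplementedError

    def customize_outputs(self, outputs: dict, inputs: DetBatchDataEntity) -> DetBatchPredEntity:
        """Convert the external model's raw output dictionary back into an OTX entity."""
        raise NotImplementedError

    def forward(self, inputs: DetBatchDataEntity) -> DetBatchPredEntity:
        return self.customize_outputs(self.model(**self.customize_inputs(inputs)), inputs)
```

Because the conversions live in the wrapper, the rest of the framework never sees MMDetection-specific types.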

3. **Support Various Data Formats without Worrying about Customizing the Input Data Pipeline**

_"Much of the time for an AI project is likely to be spent on data-related tasks ..."_[^6]

When developing a new deep learning model, one of the most tedious tasks is often creating a data pipeline for the given dataset.
Developers must study the data schema of the provided dataset and parse it into a data entity consumable by both the data augmentation pipeline and the model.

To simplify and enhance this process in the model development lifecycle, we introduce Datumaro[^7], a member of the OpenVINO™ open-source ecosystem.
Datumaro allows the import of various data formats from around the world.
Please refer to the following figure.

| ![Support Various Data Format](./images/product_design/support_various_data_format.png) |
| :-------------------------------------------------------------------------------------: |
| Figure 3. Only the root directory path is provided by the user to make a data pipeline |

In Figure 3, popular data formats such as COCO, VOC, YOLO, and Cityscapes can be imported with Datumaro.
While Figure 3 illustrates the entire data pipeline, the only input required from the user is `data_root`.
This simplicity allows users to enjoy a convenient model training workflow without worrying about the input data pipeline.

Another advantage highlighted in Figure 3 is the support for multiple data augmentation pipelines.
As shown, a data augmentation pipeline from an external framework can be incorporated; we provide the necessary on-the-fly conversions before and after it.
As a result, by using or adjusting the configuration templates we provide, users can employ not only the default TorchVision[^8] pipeline but also pipelines from MMPretrain, MMDetection, and others.
This flexibility lets developers freely compose models and data augmentation pipelines from various frameworks, accelerating the creation of custom model training templates for specific problem domains.
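
For example, the import step can be as small as the following sketch (using Datumaro's public `Dataset.import_from` API; the path and format name are placeholders):

```python
# Minimal sketch: importing a dataset given only its root directory path.
import datumaro as dm

data_root = "data/my_coco_dataset"  # placeholder path supplied by the user

# Datumaro can also auto-detect the format; here it is passed explicitly.
dataset = dm.Dataset.import_from(data_root, format="coco")

for item in dataset:
    # Each DatasetItem carries the media and annotations consumed downstream.
    print(item.id, len(item.annotations))
```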

Authors: @wonjuleee @vinnamkim

### Entrypoint

@@ -23,3 +105,12 @@
TBD @samet-akcay @harimkang
### Intel Device Support

TBD

[^1]: Meijer, Erik, and Peter Drayton. "Static typing where possible, dynamic typing when needed: The end of the cold war between programming languages." OOPSLA, 2004.
[^2]: https://en.wikipedia.org/wiki/Reinventing_the_wheel#In_software_development
[^3]: https://lightning.ai/pytorch-lightning
[^4]: https://github.com/open-mmlab/mmdetection
[^5]: https://github.com/open-mmlab/mmdetection/tree/main/configs/rtmdet
[^6]: https://www.netapp.com/media/16928-wp-7299.pdf
[^7]: https://github.com/openvinotoolkit/datumaro
[^8]: https://pytorch.org/vision/0.16/transforms.html
4 changes: 4 additions & 0 deletions src/otx/config/base/action_detection.yaml
@@ -0,0 +1,4 @@
defaults:
  - default

task: ACTION_DETECTION
15 changes: 15 additions & 0 deletions src/otx/config/callbacks/action_detection.yaml
@@ -0,0 +1,15 @@
defaults:
  - default

model_checkpoint:
  dirpath: ${base.output_dir}/checkpoints
  filename: "epoch_{epoch:03d}"
  monitor: "val/map"
  mode: "max"
  save_last: True
  auto_insert_metric_name: False

early_stopping:
  monitor: "val/map"
  patience: 100
  mode: "max"
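
These fields map onto PyTorch Lightning's standard callbacks. A rough Python equivalent, assuming Lightning 2.x's `ModelCheckpoint`/`EarlyStopping` classes and a logged `val/map` metric (a sketch, not how OTX actually wires them up):

```python
# Rough Python equivalent of the callback config above.
from lightning.pytorch.callbacks import EarlyStopping, ModelCheckpoint

model_checkpoint = ModelCheckpoint(
    dirpath="outputs/checkpoints",  # resolved from ${base.output_dir} at runtime
    filename="epoch_{epoch:03d}",
    monitor="val/map",  # track validation mean average precision
    mode="max",
    save_last=True,
    auto_insert_metric_name=False,
)

early_stopping = EarlyStopping(monitor="val/map", patience=100, mode="max")
```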
File renamed without changes.
16 changes: 16 additions & 0 deletions src/otx/config/data/mmaction_detection.yaml
@@ -0,0 +1,16 @@
defaults:
  - default

data_format: ava

mem_cache_img_max_size: ${as_int_tuple:500,500}

train_subset:
  batch_size: 64
  transform_lib_type: MMACTION
val_subset:
  batch_size: 64
  transform_lib_type: MMACTION
test_subset:
  batch_size: 64
  transform_lib_type: MMACTION
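
`${as_int_tuple:500,500}` relies on a custom OmegaConf resolver. Assuming it simply casts its arguments to a tuple of ints, it could be registered as in this sketch (the real registration lives elsewhere in the codebase):

```python
# Hypothetical registration of the `as_int_tuple` resolver referenced above.
from omegaconf import OmegaConf

OmegaConf.register_new_resolver(
    "as_int_tuple",
    lambda *args: tuple(int(arg) for arg in args),
)

cfg = OmegaConf.create({"mem_cache_img_max_size": "${as_int_tuple:500,500}"})
assert tuple(cfg.mem_cache_img_max_size) == (500, 500)
```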
File renamed without changes.
11 changes: 11 additions & 0 deletions src/otx/config/model/mmaction_detection.yaml
@@ -0,0 +1,11 @@
defaults:
  - default

_target_: otx.core.model.module.action_detection.OTXActionDetLitModule

otx_model:
  _target_: otx.core.model.entity.action_detection.MMActionCompatibleModel
  config: ???

# compile model for faster training with pytorch 2.0
torch_compile: false
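
Configs like this are typically consumed through Hydra's instantiation utilities. A sketch of that flow, assuming the `defaults` list is normally merged by Hydra's config composition and that the mandatory `config` field (`???`) is filled in at runtime:

```python
# Sketch: instantiating a `_target_`-style config such as the one above.
from hydra.utils import instantiate
from omegaconf import OmegaConf

cfg = OmegaConf.load("src/otx/config/model/mmaction_detection.yaml")
del cfg["defaults"]  # normally handled by Hydra's composition, removed for this sketch
cfg.otx_model.config = OmegaConf.load("path/to/mmaction_model_config.yaml")  # placeholder

# Recursively builds OTXActionDetLitModule, including its MMActionCompatibleModel child.
lit_module = instantiate(cfg)
```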
src/otx/core/data/dataset/{action.py → action_classification.py}
@@ -1,7 +1,7 @@
# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#
"""Module for OTXActionDataset."""
"""Module for OTXActionClsDataset."""

from __future__ import annotations

@@ -11,7 +11,7 @@
from datumaro import Label

from otx.core.data.dataset.base import OTXDataset
from otx.core.data.entity.action import ActionClsBatchDataEntity, ActionClsDataEntity
from otx.core.data.entity.action_classification import ActionClsBatchDataEntity, ActionClsDataEntity
from otx.core.data.entity.base import ImageInfo


103 changes: 103 additions & 0 deletions src/otx/core/data/dataset/action_detection.py
@@ -0,0 +1,103 @@
# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#
"""Module for OTXActionDetDataset."""

from __future__ import annotations

import pickle
from pathlib import Path
from typing import Callable

import numpy as np
import torch
from datumaro import Bbox, Image
from datumaro.components.annotation import AnnotationType
from torchvision import tv_tensors

from otx.core.data.dataset.base import OTXDataset
from otx.core.data.entity.action_detection import ActionDetBatchDataEntity, ActionDetDataEntity
from otx.core.data.entity.base import ImageInfo


class OTXActionDetDataset(OTXDataset[ActionDetDataEntity]):
    """OTXDataset class for the action detection task."""

    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)
        self.num_classes = len(self.dm_subset.categories()[AnnotationType.label])

    def _get_item_impl(self, idx: int) -> ActionDetDataEntity | None:
        item = self.dm_subset.get(id=self.ids[idx], subset=self.dm_subset.name)
        img = item.media_as(Image)
        img_data, img_shape = self._get_img_data_and_shape(img)

        bbox_anns = [ann for ann in item.annotations if isinstance(ann, Bbox)]
        bboxes = (
            np.stack([ann.points for ann in bbox_anns], axis=0).astype(np.float32)
            if len(bbox_anns) > 0
            else np.zeros((0, 4), dtype=np.float32)
        )

        entity = ActionDetDataEntity(
            image=img_data,
            img_info=ImageInfo(
                img_idx=idx,
                img_shape=img_shape,
                ori_shape=img_shape,
                pad_shape=img_shape,
                scale_factor=(1.0, 1.0),
            ),
            bboxes=tv_tensors.BoundingBoxes(
                bboxes,
                format=tv_tensors.BoundingBoxFormat.XYXY,
                canvas_size=img_shape,
            ),
            labels=torch.nn.functional.one_hot(
                torch.as_tensor([ann.label for ann in bbox_anns]),
                self.num_classes,
            ).to(torch.float),
            frame_path=item.media.path,
            proposals=self._get_proposals(
                item.media.path,
                self.dm_subset.infos().get(f"{self.dm_subset.name}_proposals", None),
            ),
        )

        return self._apply_transforms(entity)

    @staticmethod
    def _get_proposals(frame_path: str, proposal_file: str | None) -> np.ndarray:
        """Get proposals from the frame path and the proposal file name.

        The Datumaro AVA dataset expects a data structure like:
        - data_root/
            - frames/
                - video0/
                    - video0_0001.jpg
                    - video0_0002.jpg
            - annotations/
                - train.csv
                - val.csv
                - train.pkl
                - val.pkl
        """
        if proposal_file is None:
            return np.array([[0, 0, 1, 1]], dtype=np.float32)

        annotation_dir = Path(frame_path).parent.parent.parent
        proposal_file_path = annotation_dir / "annotations" / proposal_file
        if not proposal_file_path.exists():
            return np.array([[0, 0, 1, 1]], dtype=np.float32)
        with Path.open(proposal_file_path, "rb") as f:
            info = pickle.load(f)  # noqa: S301
        # Proposal keys look like "video0,0001": the frame stem split at its last "_".
        key = ",".join(Path(frame_path).stem.rsplit("_", 1))
        return info[key][:, :4] if key in info else np.array([[0, 0, 1, 1]], dtype=np.float32)

    @property
    def collate_fn(self) -> Callable:
        """Collation function to collect `ActionDetDataEntity` into `ActionDetBatchDataEntity`."""
        return ActionDetBatchDataEntity.collate_fn
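
For illustration, a hypothetical proposal file that `_get_proposals()` could consume might be written as follows; the `"<video>,<frame>"` key format and the `[:, :4]` slice follow the code above, while the fifth (confidence) column is an assumption:

```python
# Hypothetical sketch: writing a proposal file matching the expected layout.
import pickle
from pathlib import Path

import numpy as np

proposals = {
    # Key derived from the frame stem "video0_0001" -> "video0,0001".
    "video0,0001": np.array([[0.1, 0.2, 0.8, 0.9, 0.97]], dtype=np.float32),
}
Path("data_root/annotations").mkdir(parents=True, exist_ok=True)
with Path("data_root/annotations/train.pkl").open("wb") as f:
    pickle.dump(proposals, f)
```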
File renamed without changes.
87 changes: 87 additions & 0 deletions src/otx/core/data/entity/action_detection.py
@@ -0,0 +1,87 @@
# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#
"""Module for OTX action data entities."""

from __future__ import annotations

from dataclasses import dataclass
from typing import TYPE_CHECKING

from otx.core.data.entity.base import (
    OTXBatchDataEntity,
    OTXBatchPredEntity,
    OTXDataEntity,
    OTXPredEntity,
)
from otx.core.data.entity.utils import register_pytree_node
from otx.core.types.task import OTXTaskType

if TYPE_CHECKING:
    from torch import LongTensor
    from torchvision import tv_tensors


@register_pytree_node
@dataclass
class ActionDetDataEntity(OTXDataEntity):
    """Data entity for the action detection task.

    Args:
        bboxes: 2D bounding boxes for actors.
        labels: One-hot vector of the video's action labels.
        frame_path: File path of the data media, used to fetch proper meta information.
        proposals: Pre-computed actor proposals.
    """

    bboxes: tv_tensors.BoundingBoxes
    labels: LongTensor
    frame_path: str
    proposals: tv_tensors.BoundingBoxes

    @property
    def task(self) -> OTXTaskType:
        """OTX task type definition."""
        return OTXTaskType.ACTION_DETECTION


@dataclass
class ActionDetPredEntity(ActionDetDataEntity, OTXPredEntity):
    """Data entity representing the action detection model's output prediction."""


@dataclass
class ActionDetBatchDataEntity(OTXBatchDataEntity[ActionDetDataEntity]):
    """Batch data entity for the action detection task.

    Args:
        bboxes (list[tv_tensors.BoundingBoxes]): A list of bounding boxes per video.
        labels (list[LongTensor]): A list of one-hot action labels per video.
        proposals (list[tv_tensors.BoundingBoxes]): A list of actor proposals per video.
    """

    bboxes: list[tv_tensors.BoundingBoxes]
    labels: list[LongTensor]
    proposals: list[tv_tensors.BoundingBoxes]

    @property
    def task(self) -> OTXTaskType:
        """OTX task type definition."""
        return OTXTaskType.ACTION_DETECTION

    @classmethod
    def collate_fn(cls, entities: list[ActionDetDataEntity]) -> ActionDetBatchDataEntity:
        """Collection function to collect `ActionDetDataEntity` into `ActionDetBatchDataEntity`."""
        batch_data = super().collate_fn(entities)
        return ActionDetBatchDataEntity(
            batch_size=batch_data.batch_size,
            images=batch_data.images,
            imgs_info=batch_data.imgs_info,
            bboxes=[entity.bboxes for entity in entities],
            labels=[entity.labels for entity in entities],
            proposals=[entity.proposals for entity in entities],
        )


@dataclass
class ActionDetBatchPredEntity(ActionDetBatchDataEntity, OTXBatchPredEntity):
    """Data entity representing model output predictions for the action detection task."""
10 changes: 9 additions & 1 deletion src/otx/core/data/factory.py
@@ -122,12 +122,20 @@ def create(
        )

        if task == OTXTaskType.ACTION_CLASSIFICATION:
            from .dataset.action import OTXActionClsDataset
            from .dataset.action_classification import OTXActionClsDataset

            return OTXActionClsDataset(
                dm_subset=dm_subset,
                transforms=transforms,
                mem_cache_img_max_size=cfg_data_module.mem_cache_img_max_size,
            )

        if task == OTXTaskType.ACTION_DETECTION:
            from .dataset.action_detection import OTXActionDetDataset

            return OTXActionDetDataset(
                dm_subset=dm_subset,
                transforms=transforms,
                mem_cache_img_max_size=cfg_data_module.mem_cache_img_max_size,
            )
        raise NotImplementedError(task)