Merge remote-tracking branch 'upstream/v2' into enhance/perf-classification

vinnamkim committed Jan 2, 2024
2 parents 86b8334 + af344e9 commit 70582d9
Showing 73 changed files with 2,445 additions and 26 deletions.
1,381 changes: 1,381 additions & 0 deletions for_developers/images/product_design/core_design_drawing.drawio

Large diffs are not rendered by default.

95 changes: 93 additions & 2 deletions for_developers/product_design.md
@@ -4,7 +4,7 @@

1. **Provide a deep learning model training framework that encompasses various tasks solvable by deep learning models.** Our product aims to be convenient for developers, allowing them to easily expand support for tasks and models.

2. **Empower users to seamlessly and intuitively navigate the OpenVINO-based workflow, from model training to inference.** To ensure a user-friendly experience throughout, our product should have user-friendly CLI and Python API entrypoints.

3. **Support Intel devices for accelerating deep learning model training:** We offer best practices for training from scratch or fine-tuning popular model architectures using Intel devices.

@@ -14,7 +14,89 @@ To realize our product missions, we establish the following design principles and

### Core

TBD @wonjuleee @vinnamkim
1. **Static Typing System for Task-Data-Model to Accelerate Task and Model Development Cycles**

_"Advocates of static typing argue that ... a better design time developer experience..."_[^1]

Our framework should support various deep learning tasks simultaneously.
To achieve this mission, it must be developer-friendly to extend along three dimensions: task, data, and model.
The key insight is that, while these dimensions may seem independent, they can be tied together by a single dimension, the task, on which both the data and the model depend.
We believe that well-designed abstractions with static typing can help developers in this regard.
Therefore, we introduce the **Task-Data-Model** abstraction with static typing.

| ![Task-Data-Model](./images/product_design/task_data_model.png) |
| :-------------------------------------------------------------: |
| Figure 1. Task-Data-Model abstractions with static typing |

As shown in Figure 1, there are abstractions (grey-colored boxes) that form the base for each task:

1. `OTXTaskType`: A Python enum defining the deep learning tasks supported by this framework.
2. `OTXDataEntity`: A Python dataclass object representing a single data sample for each `OTXTaskType`.
3. `OTXBatchDataEntity`: A Python dataclass object representing a batch of `OTXDataEntity` samples.
4. `OTXModel`: A PyTorch module (`nn.Module`) defined for each `OTXTaskType`.
Its `forward()` function is strictly typed: 1) `OTXBatchDataEntity` as input and 2) `OTXBatchLossEntity` or `OTXBatchPredEntity` as output.

On the other hand, we can see concrete embodiments of these abstractions (blue-colored boxes) for the detection task (`OTXTaskType.DETECTION`):

1. `DetDataEntity` and `DetBatchDataEntity`: Python dataclass objects with `bboxes` and `labels` fields for the detection task.
`DetBatchDataEntity` is the input of `OTXDetectionModel`'s `forward()` function.
2. `DetBatchPredEntity`: A Python dataclass object that adds a `scores` field on top of `DetBatchDataEntity`.
It is the output of `OTXDetectionModel`'s `forward()` function in inference mode.

As a result, a developer who wants to add a new model to OTX for the detection task only needs to handle the strictly typed input and output (`DetBatchDataEntity` and `DetBatchPredEntity`).

Conversely, when a new task is required, we can incorporate it by deriving embodiments from the base abstractions: `OTXTaskType`, `OTXDataEntity`, `OTXBatchDataEntity`, and `OTXModel`.
A minimal sketch of this typing scheme follows.
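
As an illustration only (a minimal sketch, not the actual OTX source; names follow Figure 1, while the bodies are simplified assumptions):

```python
# Minimal sketch of the Task-Data-Model static typing scheme (hypothetical).
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum

import torch
from torch import nn


class OTXTaskType(str, Enum):
    """Deep learning tasks supported by the framework (abridged)."""

    DETECTION = "DETECTION"


@dataclass
class DetBatchDataEntity:
    """Batched detection samples: the strictly typed model input."""

    images: list[torch.Tensor]
    bboxes: list[torch.Tensor]
    labels: list[torch.Tensor]


@dataclass
class DetBatchPredEntity(DetBatchDataEntity):
    """Detection predictions: the input entity plus confidence scores."""

    scores: list[torch.Tensor]


class OTXDetectionModel(nn.Module):
    """`forward()` is strictly typed: a batch entity in, a prediction entity out."""

    def forward(self, inputs: DetBatchDataEntity) -> DetBatchPredEntity:
        raise NotImplementedError
```

With this contract in place, a type checker can verify that any detection model plugged into the framework consumes and produces exactly these entities.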

2. **Single Engine but Capable of Incorporating Other Public Model Training Frameworks**

_"Don't Reinvent the Wheel"_[^2]

We choose PyTorch Lightning[^3] as the primary engine for our training framework.
However, numerous other open-source model training frameworks exist.
These frameworks are often tightly coupled to their own engines, which hinders the reuse of their valuable data pipelines and model implementations.
This goes against the well-known software engineering principle, _"Don't Reinvent the Wheel."_

Nevertheless, our `OTXModel` class features special abstract member functions that enable us to reuse model implementations from any framework.
These functions, namely `create_model()`, `customize_inputs()`, and `customize_outputs()`, convert our own data entity classes to and from the format required by a model imported from an external framework.

Let's consider an example of importing a model from MMDetection[^4]:

| ![Reuse Model](./images/product_design/reuse_model.png) |
| :-----------------------------------------------------: |
| Figure 2. Import RTMDetTiny[^5] model from MMDetection |

In Figure 2, `MMDetCompatibleModel` implements `create_model()` to generate the `RTMDetTiny` model and registers it as a child module.
At this point, the `RTMDetTiny` model requires MMDetection's `DataSample` as input and produces a plain Python dictionary as output.
Neither type conforms to our framework's data entities.
To resolve this issue, `MMDetCompatibleModel` implements `customize_inputs()` and `customize_outputs()` functions to convert them into our format.
With this design, we can explicitly support major frameworks. We plan to continually expand the supported frameworks using this approach.
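
A rough sketch of this adapter pattern, reusing the simplified `DetBatchDataEntity`/`DetBatchPredEntity` entities from the sketch above (not the actual `MMDetCompatibleModel` source):

```python
# Hypothetical adapter: reusing an external framework's model behind OTX's typed interface.
from torch import nn


class MMDetCompatibleModel(nn.Module):
    """Wraps an MMDetection model so that `forward()` keeps OTX's strict typing."""

    def __init__(self) -> None:
        super().__init__()
        # Registered as a child module so its parameters are trained and checkpointed.
        self.model = self.create_model()

    def create_model(self) -> nn.Module:
        """Build the external model, e.g. RTMDetTiny from an MMDetection config."""
        raise NotImplementedError

    def customize_inputs(self, inputs: DetBatchDataEntity) -> dict:
        """Convert an OTX entity into the external format (e.g. MMDetection `DataSample`s)."""
        raise NotImplementedError

    def customize_outputs(self, outputs: dict, inputs: DetBatchDataEntity) -> DetBatchPredEntity:
        """Convert the external model's raw output dictionary back into an OTX entity."""
        raise NotImplementedError

    def forward(self, inputs: DetBatchDataEntity) -> DetBatchPredEntity:
        return self.customize_outputs(self.model(**self.customize_inputs(inputs)), inputs)
```

Because the conversions live in the wrapper, the rest of the framework never sees MMDetection-specific types.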

3. **Support Various Data Formats without Worrying about Customizing the Input Data Pipeline**

_"Much of the time for an AI project is likely to be spent on data-related tasks ..."_[^6]

When developing a new deep learning model, one of the most tedious tasks is often creating a data pipeline for the given dataset.
Developers must study the data schema of the provided dataset and parse it into a data entity consumable by both the data augmentation pipeline and the model.

To simplify and enhance this process in the model development lifecycle, we introduce Datumaro[^7], a member of the OpenVINO™ open-source ecosystem.
Datumaro allows the import of various data formats from around the world.
Please refer to the following figure.

| ![Support Various Data Format](./images/product_design/support_various_data_format.png) |
| :-------------------------------------------------------------------------------------: |
| Figure 3. Only the root directory path is provided by the user to make a data pipeline |

In Figure 3, popular data formats such as COCO, VOC, YOLO, and Cityscapes can be imported with Datumaro.
While Figure 3 illustrates the entire data pipeline, the only input required from the user is `data_root`.
This simplicity allows users to enjoy a convenient model training workflow without worrying about the input data pipeline.

Another advantage highlighted in Figure 3 is the support for multiple data augmentation pipelines.
As shown, a data augmentation pipeline from an external framework can be incorporated; we provide the necessary on-the-fly conversions before and after it.
As a result, by using or adjusting the configuration templates we provide, users can employ not only the default TorchVision[^8] pipeline but also pipelines from MMPretrain, MMDetection, and others.
This flexibility lets developers freely compose models and data augmentation pipelines from various frameworks, accelerating the creation of custom model training templates for specific problem domains.
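
For example, the import step can be as small as the following sketch (using Datumaro's public `Dataset.import_from` API; the path and format name are placeholders):

```python
# Minimal sketch: importing a dataset given only its root directory path.
import datumaro as dm

data_root = "data/my_coco_dataset"  # placeholder path supplied by the user

# Datumaro can also auto-detect the format; here it is passed explicitly.
dataset = dm.Dataset.import_from(data_root, format="coco")

for item in dataset:
    # Each DatasetItem carries the media and annotations consumed downstream.
    print(item.id, len(item.annotations))
```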

Authors: @wonjuleee @vinnamkim

### Entrypoint

@@ -23,3 +105,12 @@
TBD @samet-akcay @harimkang
### Intel Device Support

TBD

[^1]: Meijer, Erik, and Peter Drayton. "Static typing where possible, dynamic typing when needed: The end of the cold war between programming languages." OOPSLA, 2004.
[^2]: https://en.wikipedia.org/wiki/Reinventing_the_wheel#In_software_development
[^3]: https://lightning.ai/pytorch-lightning
[^4]: https://github.com/open-mmlab/mmdetection
[^5]: https://github.com/open-mmlab/mmdetection/tree/main/configs/rtmdet
[^6]: https://www.netapp.com/media/16928-wp-7299.pdf
[^7]: https://github.com/openvinotoolkit/datumaro
[^8]: https://pytorch.org/vision/0.16/transforms.html
4 changes: 4 additions & 0 deletions src/otx/config/base/action_detection.yaml
@@ -0,0 +1,4 @@
defaults:
  - default

task: ACTION_DETECTION
15 changes: 15 additions & 0 deletions src/otx/config/callbacks/action_detection.yaml
@@ -0,0 +1,15 @@
defaults:
  - default

model_checkpoint:
  dirpath: ${base.output_dir}/checkpoints
  filename: "epoch_{epoch:03d}"
  monitor: "val/map"
  mode: "max"
  save_last: True
  auto_insert_metric_name: False

early_stopping:
  monitor: "val/map"
  patience: 100
  mode: "max"
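
These fields map onto PyTorch Lightning's standard callbacks. A rough Python equivalent, assuming Lightning 2.x's `ModelCheckpoint`/`EarlyStopping` classes and a logged `val/map` metric (a sketch, not how OTX actually wires them up):

```python
# Rough Python equivalent of the callback config above.
from lightning.pytorch.callbacks import EarlyStopping, ModelCheckpoint

model_checkpoint = ModelCheckpoint(
    dirpath="outputs/checkpoints",  # resolved from ${base.output_dir} at runtime
    filename="epoch_{epoch:03d}",
    monitor="val/map",  # track validation mean average precision
    mode="max",
    save_last=True,
    auto_insert_metric_name=False,
)

early_stopping = EarlyStopping(monitor="val/map", patience=100, mode="max")
```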
File renamed without changes.
16 changes: 16 additions & 0 deletions src/otx/config/data/mmaction_detection.yaml
@@ -0,0 +1,16 @@
defaults:
  - default

data_format: ava

mem_cache_img_max_size: ${as_int_tuple:500,500}

train_subset:
  batch_size: 64
  transform_lib_type: MMACTION
val_subset:
  batch_size: 64
  transform_lib_type: MMACTION
test_subset:
  batch_size: 64
  transform_lib_type: MMACTION
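
`${as_int_tuple:500,500}` relies on a custom OmegaConf resolver. Assuming it simply casts its arguments to a tuple of ints, it could be registered as in this sketch (the real registration lives elsewhere in the codebase):

```python
# Hypothetical registration of the `as_int_tuple` resolver referenced above.
from omegaconf import OmegaConf

OmegaConf.register_new_resolver(
    "as_int_tuple",
    lambda *args: tuple(int(arg) for arg in args),
)

cfg = OmegaConf.create({"mem_cache_img_max_size": "${as_int_tuple:500,500}"})
assert tuple(cfg.mem_cache_img_max_size) == (500, 500)
```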
File renamed without changes.
11 changes: 11 additions & 0 deletions src/otx/config/model/mmaction_detection.yaml
@@ -0,0 +1,11 @@
defaults:
  - default

_target_: otx.core.model.module.action_detection.OTXActionDetLitModule

otx_model:
  _target_: otx.core.model.entity.action_detection.MMActionCompatibleModel
  config: ???

# compile model for faster training with pytorch 2.0
torch_compile: false
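
Configs like this are typically consumed through Hydra's instantiation utilities. A sketch of that flow, assuming the `defaults` list is normally merged by Hydra's config composition and that the mandatory `config` field (`???`) is filled in at runtime:

```python
# Sketch: instantiating a `_target_`-style config such as the one above.
from hydra.utils import instantiate
from omegaconf import OmegaConf

cfg = OmegaConf.load("src/otx/config/model/mmaction_detection.yaml")
del cfg["defaults"]  # normally handled by Hydra's composition, removed for this sketch
cfg.otx_model.config = OmegaConf.load("path/to/mmaction_model_config.yaml")  # placeholder

# Recursively builds OTXActionDetLitModule, including its MMActionCompatibleModel child.
lit_module = instantiate(cfg)
```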
src/otx/core/data/dataset/{action.py → action_classification.py}
@@ -1,7 +1,7 @@
# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#
"""Module for OTXActionDataset."""
"""Module for OTXActionClsDataset."""

from __future__ import annotations

@@ -11,7 +11,7 @@
from datumaro import Label

from otx.core.data.dataset.base import OTXDataset
from otx.core.data.entity.action import ActionClsBatchDataEntity, ActionClsDataEntity
from otx.core.data.entity.action_classification import ActionClsBatchDataEntity, ActionClsDataEntity
from otx.core.data.entity.base import ImageInfo


103 changes: 103 additions & 0 deletions src/otx/core/data/dataset/action_detection.py
@@ -0,0 +1,103 @@
# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#
"""Module for OTXActionDetDataset."""

from __future__ import annotations

import pickle
from pathlib import Path
from typing import Callable

import numpy as np
import torch
from datumaro import Bbox, Image
from datumaro.components.annotation import AnnotationType
from torchvision import tv_tensors

from otx.core.data.dataset.base import OTXDataset
from otx.core.data.entity.action_detection import ActionDetBatchDataEntity, ActionDetDataEntity
from otx.core.data.entity.base import ImageInfo


class OTXActionDetDataset(OTXDataset[ActionDetDataEntity]):
    """OTXDataset class for the action detection task."""

    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)
        self.num_classes = len(self.dm_subset.categories()[AnnotationType.label])

    def _get_item_impl(self, idx: int) -> ActionDetDataEntity | None:
        item = self.dm_subset.get(id=self.ids[idx], subset=self.dm_subset.name)
        img = item.media_as(Image)
        img_data, img_shape = self._get_img_data_and_shape(img)

        bbox_anns = [ann for ann in item.annotations if isinstance(ann, Bbox)]
        bboxes = (
            np.stack([ann.points for ann in bbox_anns], axis=0).astype(np.float32)
            if len(bbox_anns) > 0
            else np.zeros((0, 4), dtype=np.float32)
        )

        entity = ActionDetDataEntity(
            image=img_data,
            img_info=ImageInfo(
                img_idx=idx,
                img_shape=img_shape,
                ori_shape=img_shape,
                pad_shape=img_shape,
                scale_factor=(1.0, 1.0),
            ),
            bboxes=tv_tensors.BoundingBoxes(
                bboxes,
                format=tv_tensors.BoundingBoxFormat.XYXY,
                canvas_size=img_shape,
            ),
            labels=torch.nn.functional.one_hot(
                torch.as_tensor([ann.label for ann in bbox_anns]),
                self.num_classes,
            ).to(torch.float),
            frame_path=item.media.path,
            proposals=self._get_proposals(
                item.media.path,
                self.dm_subset.infos().get(f"{self.dm_subset.name}_proposals", None),
            ),
        )

        return self._apply_transforms(entity)

    @staticmethod
    def _get_proposals(frame_path: str, proposal_file: str | None) -> np.ndarray:
        """Get proposals from the frame path and the proposal file name.

        The Datumaro AVA dataset expects a data structure like:
        - data_root/
            - frames/
                - video0/
                    - video0_0001.jpg
                    - video0_0002.jpg
            - annotations/
                - train.csv
                - val.csv
                - train.pkl
                - val.pkl
        """
        if proposal_file is None:
            return np.array([[0, 0, 1, 1]], dtype=np.float32)

        annotation_dir = Path(frame_path).parent.parent.parent
        proposal_file_path = annotation_dir / "annotations" / proposal_file
        if not proposal_file_path.exists():
            return np.array([[0, 0, 1, 1]], dtype=np.float32)
        with Path.open(proposal_file_path, "rb") as f:
            info = pickle.load(f)  # noqa: S301
        # Proposal keys look like "video0,0001": the frame stem split at its last "_".
        key = ",".join(Path(frame_path).stem.rsplit("_", 1))
        return info[key][:, :4] if key in info else np.array([[0, 0, 1, 1]], dtype=np.float32)

    @property
    def collate_fn(self) -> Callable:
        """Collation function to collect `ActionDetDataEntity` into `ActionDetBatchDataEntity`."""
        return ActionDetBatchDataEntity.collate_fn
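
For illustration, a hypothetical proposal file that `_get_proposals()` could consume might be written as follows; the `"<video>,<frame>"` key format and the `[:, :4]` slice follow the code above, while the fifth (confidence) column is an assumption:

```python
# Hypothetical sketch: writing a proposal file matching the expected layout.
import pickle
from pathlib import Path

import numpy as np

proposals = {
    # Key derived from the frame stem "video0_0001" -> "video0,0001".
    "video0,0001": np.array([[0.1, 0.2, 0.8, 0.9, 0.97]], dtype=np.float32),
}
Path("data_root/annotations").mkdir(parents=True, exist_ok=True)
with Path("data_root/annotations/train.pkl").open("wb") as f:
    pickle.dump(proposals, f)
```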
File renamed without changes.
87 changes: 87 additions & 0 deletions src/otx/core/data/entity/action_detection.py
@@ -0,0 +1,87 @@
# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#
"""Module for OTX action data entities."""

from __future__ import annotations

from dataclasses import dataclass
from typing import TYPE_CHECKING

from otx.core.data.entity.base import (
    OTXBatchDataEntity,
    OTXBatchPredEntity,
    OTXDataEntity,
    OTXPredEntity,
)
from otx.core.data.entity.utils import register_pytree_node
from otx.core.types.task import OTXTaskType

if TYPE_CHECKING:
    from torch import LongTensor
    from torchvision import tv_tensors


@register_pytree_node
@dataclass
class ActionDetDataEntity(OTXDataEntity):
    """Data entity for the action detection task.

    Args:
        bboxes: 2D bounding boxes for actors.
        labels: One-hot vector of the video's action labels.
        frame_path: File path of the data media, used to fetch proper meta information.
        proposals: Pre-computed actor proposals.
    """

    bboxes: tv_tensors.BoundingBoxes
    labels: LongTensor
    frame_path: str
    proposals: tv_tensors.BoundingBoxes

    @property
    def task(self) -> OTXTaskType:
        """OTX task type definition."""
        return OTXTaskType.ACTION_DETECTION


@dataclass
class ActionDetPredEntity(ActionDetDataEntity, OTXPredEntity):
    """Data entity representing the action detection model's output prediction."""


@dataclass
class ActionDetBatchDataEntity(OTXBatchDataEntity[ActionDetDataEntity]):
    """Batch data entity for the action detection task.

    Args:
        bboxes (list[tv_tensors.BoundingBoxes]): A list of bounding boxes per video.
        labels (list[LongTensor]): A list of one-hot action labels per video.
        proposals (list[tv_tensors.BoundingBoxes]): A list of actor proposals per video.
    """

    bboxes: list[tv_tensors.BoundingBoxes]
    labels: list[LongTensor]
    proposals: list[tv_tensors.BoundingBoxes]

    @property
    def task(self) -> OTXTaskType:
        """OTX task type definition."""
        return OTXTaskType.ACTION_DETECTION

    @classmethod
    def collate_fn(cls, entities: list[ActionDetDataEntity]) -> ActionDetBatchDataEntity:
        """Collection function to collect `ActionDetDataEntity` into `ActionDetBatchDataEntity`."""
        batch_data = super().collate_fn(entities)
        return ActionDetBatchDataEntity(
            batch_size=batch_data.batch_size,
            images=batch_data.images,
            imgs_info=batch_data.imgs_info,
            bboxes=[entity.bboxes for entity in entities],
            labels=[entity.labels for entity in entities],
            proposals=[entity.proposals for entity in entities],
        )


@dataclass
class ActionDetBatchPredEntity(ActionDetBatchDataEntity, OTXBatchPredEntity):
    """Data entity representing model output predictions for the action detection task."""
10 changes: 9 additions & 1 deletion src/otx/core/data/factory.py
@@ -122,12 +122,20 @@ def create(
        )

        if task == OTXTaskType.ACTION_CLASSIFICATION:
            from .dataset.action import OTXActionClsDataset
            from .dataset.action_classification import OTXActionClsDataset

            return OTXActionClsDataset(
                dm_subset=dm_subset,
                transforms=transforms,
                mem_cache_img_max_size=cfg_data_module.mem_cache_img_max_size,
            )

        if task == OTXTaskType.ACTION_DETECTION:
            from .dataset.action_detection import OTXActionDetDataset

            return OTXActionDetDataset(
                dm_subset=dm_subset,
                transforms=transforms,
                mem_cache_img_max_size=cfg_data_module.mem_cache_img_max_size,
            )
        raise NotImplementedError(task)