Instance segmentation is commonly used in applications such as self-driving cars.

|
We integrate two prominent instance segmentation models within OpenVINO™ Training Extensions: `Mask R-CNN <https://arxiv.org/abs/1703.06870>`_ and `RTMDet <https://arxiv.org/abs/2212.07784>`_.

Mask R-CNN, a widely adopted method, builds upon the Faster R-CNN architecture, known for its two-stage object detection mechanism. In the initial stage, it proposes regions of interest, while in the subsequent stage, it predicts the class and bounding box offsets for each proposal. Distinguishing itself, Mask R-CNN incorporates an additional branch dedicated to predicting object masks concurrently with the existing branches for bounding box regression and object classification.

On the other hand, RTMDet leverages the architecture of `RTMNet <https://arxiv.org/abs/2212.07784>`_, a lightweight, one-stage model designed for both object detection and instance segmentation tasks. RTMNet prioritizes efficiency, making it particularly suitable for **real-time applications**. RTMDet-Inst extends the capabilities of RTMNet to encompass instance segmentation by integrating a mask prediction branch.
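At inference time, the mask branch of such models predicts a fixed-size probability map per region of interest (28x28 in the original Mask R-CNN), which is resized to the detected box and thresholded (commonly at 0.5) to produce the final binary instance mask. A minimal, dependency-free sketch of that paste step (the ``paste_mask`` helper and its defaults are ours for illustration, not part of the OTX API):

```python
import numpy as np

def paste_mask(mask_prob, box, img_h, img_w, thresh=0.5):
    """Resize a fixed-size RoI mask to its box and threshold it,
    as Mask R-CNN does when producing the final binary mask.
    Nearest-neighbour resizing keeps the sketch dependency-free."""
    x1, y1, x2, y2 = box
    bh, bw = max(y2 - y1, 1), max(x2 - x1, 1)
    m = mask_prob.shape[0]
    # Map every pixel of the box back to a cell of the m x m RoI grid.
    ys = np.clip((np.arange(bh) + 0.5) * m / bh, 0, m - 1).astype(int)
    xs = np.clip((np.arange(bw) + 0.5) * m / bw, 0, m - 1).astype(int)
    binary = mask_prob[np.ix_(ys, xs)] >= thresh
    full = np.zeros((img_h, img_w), dtype=bool)
    full[y1:y2, x1:x2] = binary
    return full

roi = np.full((28, 28), 0.9)                # a confidently predicted square mask
inst = paste_mask(roi, (10, 20, 40, 60), 100, 100)
print(inst.sum())                           # 40 * 30 = 1200 foreground pixels
```

Production implementations use bilinear interpolation rather than nearest-neighbour resizing, but the overall flow is the same.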


For supervised training we use the following algorithm components:
Models
******
We support the following ready-to-use model recipes:

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------------+-----------------+
| Model Recipe                                                                                                                                                                                                  | Name                       | Complexity (GFLOPs) | Model size (MB) |
+===============================================================================================================================================================================================================+============================+=====================+=================+
| `Instance Segmentation MaskRCNN EfficientNetB2B <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/recipe/instance_segmentation/maskrcnn_efficientnetb2b.yaml>`_ | MaskRCNN-EfficientNetB2B | 68.48 | 13.27 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------------+-----------------+
| `Instance Segmentation MaskRCNN ResNet50 <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/recipe/instance_segmentation/maskrcnn_r50.yaml>`_ | MaskRCNN-ResNet50 | 533.80 | 177.90 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------------+-----------------+
| `Instance Segmentation MaskRCNN SwinT <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/recipe/instance_segmentation/maskrcnn_swint.yaml>`_ | MaskRCNN-SwinT | | |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------------+-----------------+
| `Instance Segmentation RTMDet-Inst Tiny <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/recipe/instance_segmentation/rtmdet_inst_tiny.yaml>`_ | RTMDet-Ins-tiny | | |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------------+-----------------+

The table above can also be obtained with the following command:

.. code-block:: shell

   (otx) ...$ otx find --task INSTANCE_SEGMENTATION
MaskRCNN-SwinT leverages the `Swin Transformer <https://arxiv.org/abs/2103.14030>`_ architecture as its backbone network for feature extraction. This choice, while yielding superior accuracy, comes with a longer training time and higher computational requirements.

In contrast, the MaskRCNN-ResNet50 model adopts the more conventional ResNet-50 backbone network, striking a balance between accuracy and computational efficiency.

Meanwhile, MaskRCNN-EfficientNetB2B employs the `EfficientNet-B2 <https://arxiv.org/abs/1905.11946>`_ architecture as its backbone, offering a compromise between accuracy and speed during training, making it a favorable option when minimizing training time and computational resources is essential.

Recently, we have updated RTMDet-Ins-tiny, integrating work from `RTMNet <https://arxiv.org/abs/2212.07784>`_ to prioritize real-time instance segmentation inference. While this model is tailored for real-time applications due to its lightweight design, it may not match the accuracy of its counterparts and may require more extensive training data.

Our experiments indicate that MaskRCNN-SwinT and MaskRCNN-ResNet50 outperform MaskRCNN-EfficientNetB2B in terms of accuracy. However, if reducing training time is paramount, transitioning to MaskRCNN-EfficientNetB2B is recommended. Conversely, for applications where inference speed is crucial, RTMDet-Ins-tiny presents an optimal solution.

In the table below, the `mAP <https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient>`_ metric on some academic datasets using our :ref:`supervised pipeline <instance_segmentation_supervised_pipeline>` is presented. The results were obtained on our recipes without any changes. We use a 1024x1024 image resolution; for other hyperparameters, please refer to the related recipe. We trained each model on a single NVIDIA GeForce RTX 3090.

+---------------------------+--------------+------------+-----------------+
| MaskRCNN-ResNet50 | N/A | N/A | N/A |
+---------------------------+--------------+------------+-----------------+
| MaskRCNN-SwinT | N/A | N/A | N/A |
+---------------------------+--------------+------------+-----------------+
| RTMDet-Ins-tiny | N/A | N/A | N/A |
+---------------------------+--------------+------------+-----------------+
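For context on the metric: in COCO-style evaluation, a predicted instance counts as a true positive when its mask IoU with a ground-truth instance exceeds a threshold, and mAP averages precision over thresholds and classes. A minimal sketch of the underlying mask-IoU computation (illustrative only, not the OTX implementation):

```python
import numpy as np

def mask_iou(a, b):
    """Intersection-over-union between two binary instance masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

a = np.zeros((8, 8), dtype=bool); a[2:6, 2:6] = True   # 16-pixel square
b = np.zeros((8, 8), dtype=bool); b[4:8, 4:8] = True   # overlaps a on a 2x2 patch
print(round(mask_iou(a, b), 3))                        # 0.143  (= 4 / 28)
```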
Dataset preparation
*******************


1. Clone a repository with the `WGISD dataset <https://github.com/thsant/wgisd>`_.

.. code-block:: shell

   git clone https://github.com/thsant/wgisd.git
   cd wgisd
   git checkout 6910edc5ae3aae8c20062941b1641821f0c30127

This dataset contains images of grapevines with annotations for different varieties of grapes.