Update segmentation documentation #3425

Merged
merged 13 commits into from
May 3, 2024
@@ -14,16 +14,17 @@ The output of semantic segmentation is typically an image where each pixel is co

|

We solve this task by utilizing `FCN Head <https://arxiv.org/pdf/1411.4038.pdf>`_ with implementation from `MMSegmentation <https://mmsegmentation.readthedocs.io/en/latest/_modules/mmseg/models/decode_heads/fcn_head.html>`_ on the multi-level image features obtained by the feature extractor backbone (`Lite-HRNet <https://arxiv.org/abs/2104.06403>`_).
We solve this task by utilizing segmentation decoder heads on the multi-level image features obtained by the feature extractor backbone.
For supervised training we use the following algorithm components:

.. _semantic_segmentation_supervised_pipeline:

- ``Augmentations``: Besides basic augmentations like random flip, random rotate and random crop, we use an image mixing technique with different `photometric distortions <https://mmsegmentation.readthedocs.io/en/latest/api.html#mmseg.datasets.pipelines.PhotoMetricDistortion>`_.

- ``Optimizer``: We use the `Adam <https://arxiv.org/abs/1412.6980>`_ optimizer with weight decay set to zero and gradient clipping with a maximum quadratic norm of 40.
- ``Optimizer``: We use the `Adam <https://arxiv.org/abs/1412.6980>`_ and `AdamW <https://arxiv.org/abs/1711.05101>`_ optimizers.

- ``Learning rate schedule``: To schedule the training process, we use **ReduceLROnPlateau** with a linear learning rate warmup for 100 iterations. This method monitors a target metric (in our case, a metric on the validation set) and reduces the learning rate if no improvement is seen for a ``patience`` number of epochs.
- ``Learning rate schedule``: For the `Lite-HRNet <https://arxiv.org/abs/2104.06403>`_ family, we use **ReduceLROnPlateau** with a linear learning rate warmup for 100 iterations. This method monitors a target metric (in our case, a metric on the validation set) and reduces the learning rate if no improvement is seen for a ``patience`` number of epochs.
For `SegNext <https://arxiv.org/abs/2209.08575>`_ and `DinoV2 <https://arxiv.org/abs/2304.07193>`_ models, we use the `PolynomialLR <https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.PolynomialLR.html>`_ scheduler.

- ``Loss function``: We use standard `Cross Entropy Loss <https://en.wikipedia.org/wiki/Cross_entropy>`_ to train a model.
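
The learning rate schedules named in the list above can be sketched in plain Python. This is a minimal illustration, not the OTX or PyTorch implementation: the names ``warmup_lr``, ``polynomial_lr`` and ``PlateauScheduler`` are hypothetical, and the default ``factor``, ``patience`` and ``power`` values are assumptions chosen for the example. Only the 100 warmup iterations come from the text.

```python
from typing import Optional


def warmup_lr(base_lr: float, iteration: int, warmup_iters: int = 100) -> float:
    """Linear warmup: ramp the learning rate up to base_lr over warmup_iters iterations."""
    if iteration >= warmup_iters:
        return base_lr
    return base_lr * (iteration + 1) / warmup_iters


def polynomial_lr(base_lr: float, step: int, total_steps: int, power: float = 1.0) -> float:
    """Polynomial decay from base_lr down to zero over total_steps."""
    progress = min(step, total_steps) / total_steps
    return base_lr * (1.0 - progress) ** power


class PlateauScheduler:
    """Simplified plateau scheduler: shrink the lr when the metric stops improving."""

    def __init__(self, lr: float, factor: float = 0.1, patience: int = 2) -> None:
        self.lr = lr
        self.factor = factor      # assumed multiplicative reduction factor
        self.patience = patience  # assumed number of tolerated non-improving epochs
        self.best: Optional[float] = None
        self.bad_epochs = 0

    def step(self, metric: float) -> float:
        """Record one validation metric (higher is better) and return the current lr."""
        if self.best is None or metric > self.best:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        # Reduce once the metric has failed to improve for more than `patience` epochs.
        if self.bad_epochs > self.patience:
            self.lr *= self.factor
            self.bad_epochs = 0
        return self.lr
```

In this sketch the plateau scheduler is stepped once per validation epoch, while the warmup and polynomial functions are evaluated per iteration. How the actual recipes combine warmup with the plateau schedule is defined by the recipe configuration, not by this example.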

@@ -39,14 +40,6 @@ For the dataset handling inside OpenVINO™ Training Extensions, we use `Dataset
To this end, we support the `Common Semantic Segmentation <https://github.com/openvinotoolkit/datumaro/blob/develop/docs/source/docs/data-formats/formats/common_semantic_segmentation.md>`_ data format.
Once your dataset is organized in a supported format, starting training is simple. We just need to pass a path to the root folder and the desired model recipe to start training:

.. note::

Due to some internal limitations, the dataset should always consist of a "background" label. If your dataset doesn't have a background label, rename the first label to "background" in the ``meta.json`` file.


.. note::

Currently, metrics with models trained with our OTX dataset adapter can differ from popular benchmarks. To avoid this and train the model on exactly the same segmentation masks as intended by the authors, please, set the parameter ``use_otx_adapter`` to ``False``.

******
Models
@@ -55,31 +48,32 @@ Models

We support the following ready-to-use model recipes:

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------+---------------------+-----------------+
| Recipe ID | Name | Complexity (GFLOPs) | Model size (MB) |
+======================================================================================================================================================================================+========================+=====================+=================+
| `Custom_Semantic_Segmentation_Lite-HRNet-s-mod2_OCR <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/recipe/semantic_segmentation/litehrnet_s.yaml>`_ | Lite-HRNet-s-mod2 | 1.44 | 3.2 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------+---------------------+-----------------+
| `Custom_Semantic_Segmentation_Lite-HRNet-18-mod2_OCR <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/recipe/semantic_segmentation/litehrnet_18.yaml>`_ | Lite-HRNet-18-mod2 | 2.82 | 4.3 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------+---------------------+-----------------+
| `Custom_Semantic_Segmentation_Lite-HRNet-x-mod3_OCR <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/recipe/semantic_segmentation/litehrnet_x.yaml>`_ | Lite-HRNet-x-mod3 | 9.20 | 5.7 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------+---------------------+-----------------+
| `Custom_Semantic_Segmentation_SegNext_T <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/recipe/semantic_segmentation/segnext_t.yaml>`_ | SegNext-t | 6.07 | 4.23 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------+---------------------+-----------------+
| `Custom_Semantic_Segmentation_SegNext_S <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/recipe/semantic_segmentation/segnext_s.yaml>`_ | SegNext-s | 15.35 | 13.9 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------+---------------------+-----------------+
| `Custom_Semantic_Segmentation_SegNext_B <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/recipe/semantic_segmentation/segnext_b.yaml>`_ | SegNext-b | 32.08 | 27.56 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------+---------------------+-----------------+

All of these models are members of the same `Lite-HRNet <https://arxiv.org/abs/2104.06403>`_ backbones family. They differ in the trade-off between accuracy and inference/training speed. ``Lite-HRNet-x-mod3`` is the recipe with heavy-size architecture for accurate predictions but it requires long training.
Whereas the ``Lite-HRNet-s-mod2`` is the lightweight architecture for fast inference and training. It is the best choice for the scenario of a limited amount of data. The ``Lite-HRNet-18-mod2`` model is the middle-sized architecture for the balance between fast inference and training time.

Use the `SegNext <https://arxiv.org/abs/2209.08575>`_ model, which can achieve superior performance while preserving fast inference and fast training.

In the table below the `Dice score <https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient>`_ on some academic datasets using our :ref:`supervised pipeline <semantic_segmentation_supervised_pipeline>` is presented. We use 512x512 image crop resolution, for other hyperparameters, please, refer to the related recipe. We trained each model with single Nvidia GeForce RTX3090.
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+-----------------+-----------------+-----------------+
| Recipe Path | Complexity (GFLOPs) | Model size (M) | FPS (GPU) | iter time (sec) |
+======================================================================================================================================================================================+=====================+=================+=================+=================+
| `Lite-HRNet-s-mod2 <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/recipe/semantic_segmentation/litehrnet_s.yaml>`_ | 1.44 | 3.2 | 37.68 | 0.151 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+-----------------+-----------------+-----------------+
| `Lite-HRNet-18-mod2 <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/recipe/semantic_segmentation/litehrnet_18.yaml>`_ | 2.63 | 4.3 | 31.17 | 0.176 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+-----------------+-----------------+-----------------+
| `Lite-HRNet-x-mod3 <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/recipe/semantic_segmentation/litehrnet_x.yaml>`_ | 9.20 | 5.7 | 15.07 | 0.347 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+-----------------+-----------------+-----------------+
| `SegNext_T <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/recipe/semantic_segmentation/segnext_t.yaml>`_ | 6.07 | 4.23 | 104.90 | 0.126 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+-----------------+-----------------+-----------------+
| `SegNext_S <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/recipe/semantic_segmentation/segnext_s.yaml>`_ | 15.35 | 13.9 | 85.67 | 0.134 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+-----------------+-----------------+-----------------+
| `SegNext_B <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/recipe/semantic_segmentation/segnext_b.yaml>`_ | 32.08 | 27.56 | 61.91 | 0.215 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+-----------------+-----------------+-----------------+
| `DinoV2 <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/recipe/semantic_segmentation/dino_v2.yaml>`_ | 124.01 | 24.4 | 3.52 | 0.116 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+-----------------+-----------------+-----------------+

All of these models differ in the trade-off between accuracy and inference/training speed. For example, ``SegNext_B`` is a recipe with a heavy architecture that gives more accurate predictions but requires longer training.
In contrast, ``Lite-HRNet-s-mod2`` is a lightweight architecture for fast inference and training. It is the best choice when the amount of data is limited. The ``Lite-HRNet-18-mod2`` and ``SegNext_S`` models are middle-sized architectures that balance inference speed and training time.
``DinoV2`` is a state-of-the-art model that produces universal features suitable for all image-level and pixel-level visual tasks. This model doesn't require fine-tuning the whole backbone, only the segmentation decode head. Because of that, it trains faster while preserving high accuracy.
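
As a rough illustration of the speed side of this trade-off, the sketch below picks the recipe with the highest GPU FPS that fits a compute budget. The GFLOPs and FPS figures are copied from the table above. The ``Recipe`` type and the ``fastest_within_budget`` helper are hypothetical names introduced only for this example, and the selection deliberately ignores accuracy, so treat it as a starting point rather than a recommendation.

```python
from typing import NamedTuple, Optional


class Recipe(NamedTuple):
    name: str
    gflops: float  # complexity, as listed in the table
    fps: float     # GPU frames per second, as listed in the table


RECIPES = [
    Recipe("Lite-HRNet-s-mod2", 1.44, 37.68),
    Recipe("Lite-HRNet-18-mod2", 2.63, 31.17),
    Recipe("Lite-HRNet-x-mod3", 9.20, 15.07),
    Recipe("SegNext_T", 6.07, 104.90),
    Recipe("SegNext_S", 15.35, 85.67),
    Recipe("SegNext_B", 32.08, 61.91),
    Recipe("DinoV2", 124.01, 3.52),
]


def fastest_within_budget(max_gflops: float) -> Optional[Recipe]:
    """Return the highest-FPS recipe whose complexity fits the budget, or None."""
    candidates = [r for r in RECIPES if r.gflops <= max_gflops]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.fps)
```

For example, ``fastest_within_budget(10.0)`` considers the three Lite-HRNet recipes and ``SegNext_T``, and returns the one with the highest listed FPS.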

The table below presents the `Dice score <https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient>`_ on some academic datasets using our :ref:`supervised pipeline <semantic_segmentation_supervised_pipeline>`. We use a 512x512 (560x560 for DinoV2) image crop resolution; for other hyperparameters, please refer to the related recipe. We trained each model on a single Nvidia GeForce RTX3090.

+-----------------------+--------------------------------------------------------------+-----------------------------------------------------+----------------------------------------------------------------------+-----------------------------------------------------------------+--------+
| Model name | `DIS5K <https://xuebinqin.github.io/dis/index.html>`_ | `Cityscapes <https://www.cityscapes-dataset.com/>`_ | `Pascal-VOC 2012 <http://host.robots.ox.ac.uk/pascal/VOC/voc2012/>`_ | `KITTI full <https://www.cvlibs.net/datasets/kitti/index.php>`_ | Mean |
| Model name | `DIS5K <https://xuebinqin.github.io/dis/index.html>`_ | `Cityscapes <https://www.cityscapes-dataset.com/>`_ | `Pascal-VOC 2012 <http://host.robots.ox.ac.uk/pascal/VOC/voc2012/>`_ | `KITTI <https://www.cvlibs.net/datasets/kitti/index.php>`_ | Mean |
+=======================+==============================================================+=====================================================+======================================================================+=================================================================+========+
| Lite-HRNet-s-mod2 | 79.95 | 62.38 | 58.26 | 36.06 | 59.16 |
+-----------------------+--------------------------------------------------------------+-----------------------------------------------------+----------------------------------------------------------------------+-----------------------------------------------------------------+--------+
@@ -93,6 +87,8 @@ In the table below the `Dice score <https://en.wikipedia.org/wiki/S%C3%B8rensen%
+-----------------------+--------------------------------------------------------------+-----------------------------------------------------+----------------------------------------------------------------------+-----------------------------------------------------------------+--------+
| SegNext-b | 87.92 | 76.94 | 85.01 | 55.49 | 73.45 |
+-----------------------+--------------------------------------------------------------+-----------------------------------------------------+----------------------------------------------------------------------+-----------------------------------------------------------------+--------+
| DinoV2 | 87.92 | 76.94 | 85.01 | 55.49 | 73.45 |
+-----------------------+--------------------------------------------------------------+-----------------------------------------------------+----------------------------------------------------------------------+-----------------------------------------------------------------+--------+
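
For reference, the Dice score for one class is ``2 * |A ∩ B| / (|A| + |B|)``, where ``A`` is the set of pixels predicted as that class and ``B`` is the set of ground-truth pixels of that class. The sketch below computes it on flattened label masks. The function name ``dice_score`` and the convention of returning 1.0 when the class is absent from both masks are assumptions made for this example; how the reported numbers are averaged over classes and images is not shown here. The table reports percentages, while the function returns a fraction in [0, 1].

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int], class_id: int) -> float:
    """Dice coefficient of one class between two flattened label masks."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    pred_count = sum(1 for p in pred if p == class_id)
    target_count = sum(1 for t in target if t == class_id)
    # Pixels where both masks agree on this class.
    intersection = sum(1 for p, t in zip(pred, target) if p == class_id and t == class_id)
    if pred_count + target_count == 0:
        # Assumed convention: a class absent from both masks counts as a perfect match.
        return 1.0
    return 2.0 * intersection / (pred_count + target_count)
```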

.. note::

2 changes: 1 addition & 1 deletion src/otx/algo/segmentation/base_model.py
@@ -67,7 +67,7 @@ def forward(
            - Otherwise, returns the model outputs after interpolation.
        """
        enc_feats = self.backbone(inputs)
        outputs = self.decode_head(enc_feats)
        outputs = self.decode_head(inputs=enc_feats)

        if mode == "tensor":
            return outputs
20 changes: 20 additions & 0 deletions src/otx/algo/segmentation/litehrnet.py
@@ -7,9 +7,13 @@

from typing import TYPE_CHECKING, Any, ClassVar

from torch.onnx import OperatorExportTypes

from otx.algo.segmentation.backbones import LiteHRNet
from otx.algo.segmentation.heads import FCNHead
from otx.algo.utils.support_otx_v1 import OTXv1Helper
from otx.core.exporter.base import OTXModelExporter
from otx.core.exporter.native import OTXNativeModelExporter
from otx.core.model.segmentation import TorchVisionCompatibleModel

from .base_model import BaseSegmModel
@@ -550,3 +554,19 @@
        }
        optim_config.update(ignored_scope)
        return optim_config

    @property
    def _exporter(self) -> OTXModelExporter:
        """Creates OTXModelExporter object that can export the model."""
        return OTXNativeModelExporter(
            task_level_export_parameters=self._export_parameters,
            input_size=self.image_size,
            mean=self.mean,
            std=self.scale,
            resize_mode="standard",
            pad_value=0,
            swap_rgb=False,
            via_onnx=False,
            onnx_export_configuration={"operator_export_type": OperatorExportTypes.ONNX_ATEN_FALLBACK},
            output_names=None,
        )