Add explanation for XAI & minor doc fixes #1923

Merged
1 change: 0 additions & 1 deletion .github/workflows/daily.yml
@@ -10,7 +10,6 @@ jobs:
  Daily-Tests:
    runs-on: [self-hosted, linux, x64, dev]
    timeout-minutes: 1440
-    if: github.ref == 'refs/heads/develop'
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
@@ -9,3 +9,4 @@ Additional Features
   models_optimization
   hpo
   auto_configuration
+   xai
83 changes: 83 additions & 0 deletions docs/source/guide/explanation/additional_features/xai.rst
@@ -0,0 +1,83 @@
Explainable AI (XAI)
====================

**Explainable AI (XAI)** is a field of research that aims to make machine learning models more transparent and interpretable to humans.
The goal is to help users understand how and why AI systems make decisions and to provide insight into their inner workings. It allows us to detect, analyze, and prevent common mistakes, such as a lack of data diversity for certain objects.
XAI can help build trust in AI, ensure that a model is safe to deploy, and increase its adoption in various domains.

Most XAI tools generate a **saliency map** as part of the process. A saliency map is a visual representation, suitable for human comprehension, that highlights the parts of the image the network focused on the most.
It looks like a heatmap, where warm colors indicate the areas of the model's main focus.
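Producing such an overlay is straightforward. Below is a minimal sketch, not the OpenVINO™ Training Extensions implementation, of how a raw saliency map can be rendered as a heatmap with OpenCV; ``image`` and ``saliency`` are hypothetical inputs.

.. code-block:: python

    import cv2
    import numpy as np


    def overlay_saliency(image: np.ndarray, saliency: np.ndarray) -> np.ndarray:
        """Blend a single-channel saliency map over a BGR image as a heatmap."""
        # Resize the map to the image resolution and normalize it to [0, 255].
        saliency = cv2.resize(saliency.astype(np.float32), (image.shape[1], image.shape[0]))
        saliency = 255 * (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-12)
        # Warm (red) colors correspond to the areas of highest model attention.
        heatmap = cv2.applyColorMap(saliency.astype(np.uint8), cv2.COLORMAP_JET)
        return cv2.addWeighted(image, 0.5, heatmap, 0.5, 0.0)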


.. image:: ../../../../utils/images/xai_example.jpg
   :width: 600
   :alt: this image shows the result of XAI algorithm


We can generate saliency maps for a model trained in OpenVINO™ Training Extensions using the ``otx explain`` command. Learn more about its usage in the :doc:`../../tutorials/base/explain` tutorial.

*************************
Classification algorithms
*************************

.. image:: ../../../../utils/images/xai_cls.jpg
   :width: 600
   :alt: this image shows the comparison of XAI classification algorithms


For classification networks, the following algorithms are used to generate saliency maps:

- **Activation Map** - the most basic and naive approach. It takes the outputs of the model's feature extractor (backbone) and averages them over the channel dimension. The result depends heavily on the backbone and ignores the neck and head computations, but it is fast and usually reasonably good (see the sketch after this list).

- `Eigen-CAM <https://arxiv.org/abs/2008.00299>`_ uses Principal Component Analysis (PCA). It returns the first principal component of the feature extractor output, which most of the time corresponds to the dominant object. Like the Activation Map, the result depends heavily on the backbone and ignores the neck and head computations.

- `Recipro-CAM <https://arxiv.org/pdf/2209.14074>`_ uses Class Activation Mapping (CAM) to weight the activation map for each class, so it can generate a different saliency map per class. Recipro-CAM is a fast, gradient-free Reciprocal CAM method: it spatially masks the extracted feature maps to exploit the correlation between activation maps and network predictions for the target classes.
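To make the differences concrete, here is a schematic sketch of all three methods. It is not the actual OpenVINO™ Training Extensions implementation: ``features`` (a ``C x H x W`` array taken from the backbone) and ``head`` (a callable that maps a feature map to per-class scores) are hypothetical placeholders.

.. code-block:: python

    import numpy as np


    def activation_map(features: np.ndarray) -> np.ndarray:
        """Activation Map: average the backbone output over the channel dimension."""
        saliency = features.mean(axis=0)  # (H, W)
        return (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-12)


    def eigen_cam(features: np.ndarray) -> np.ndarray:
        """Eigen-CAM: project the activations onto their first principal component."""
        c, h, w = features.shape
        flat = features.reshape(c, h * w).T                  # (H*W, C)
        flat = flat - flat.mean(axis=0, keepdims=True)
        _, _, vt = np.linalg.svd(flat, full_matrices=False)  # vt[0] is the 1st component
        saliency = np.maximum(flat @ vt[0], 0.0).reshape(h, w)
        return saliency / (saliency.max() + 1e-12)


    def recipro_cam(features: np.ndarray, head) -> np.ndarray:
        """Recipro-CAM: mask the feature map down to one spatial cell at a time and
        re-infer the head, so the class scores become per-class saliency values."""
        c, h, w = features.shape
        num_classes = head(features).shape[-1]
        saliency = np.zeros((num_classes, h, w))
        for y in range(h):
            for x in range(w):
                masked = np.zeros_like(features)
                masked[:, y, x] = features[:, y, x]          # keep a single location
                saliency[:, y, x] = head(masked)             # H*W head re-inferences
        return saliency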


The table below compares the described algorithms:

+-------------------------------------------+----------------+----------------+-------------------------------------------------------------------------+
| Classification algorithm                  | Activation Map | Eigen-CAM      | Recipro-CAM                                                             |
+===========================================+================+================+=========================================================================+
| Need access to model internal state | Yes | Yes | Yes |
+-------------------------------------------+----------------+----------------+-------------------------------------------------------------------------+
| Gradient-free | Yes | Yes | Yes |
+-------------------------------------------+----------------+----------------+-------------------------------------------------------------------------+
| Single-shot                               | Yes            | Yes            | No (re-infers neck + head H*W times, where H*W is the feature map size) |
+-------------------------------------------+----------------+----------------+-------------------------------------------------------------------------+
| Class discrimination | No | No | Yes |
+-------------------------------------------+----------------+----------------+-------------------------------------------------------------------------+
| Execution speed | Fast | Fast | Medium |
+-------------------------------------------+----------------+----------------+-------------------------------------------------------------------------+


*************************
Detection algorithms
*************************

To generate a saliency map for the detection task, we use the **DetClassProbabilityMap** algorithm.
It is a naive approach for detection: it takes the raw classification head output and uses the per-class probability maps to calculate a region of interest for each class, so it creates a different saliency map for each class.
For now, this algorithm is implemented for single-stage detectors only.
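The following is a hedged sketch of the idea, assuming the raw classification head output is a ``(num_anchors * num_classes, H, W)`` array of logits; the names are illustrative rather than the actual OpenVINO™ Training Extensions API.

.. code-block:: python

    import numpy as np


    def det_class_probability_map(cls_scores: np.ndarray, num_classes: int) -> np.ndarray:
        """Turn raw single-stage-detector classification logits into per-class saliency."""
        ac, h, w = cls_scores.shape
        num_anchors = ac // num_classes
        probs = 1.0 / (1.0 + np.exp(-cls_scores))  # sigmoid over the logits
        probs = probs.reshape(num_anchors, num_classes, h, w)
        # For each spatial cell, keep the most confident anchor per class.
        return probs.max(axis=0)                   # (num_classes, H, W)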

.. image:: ../../../../utils/images/xai_det.jpg
   :width: 600
   :alt: this image shows the detailed description of XAI detection algorithm


The main limitation of this method is that, due to the design of the training loss, activation values drift towards the center of the object, which makes it hard to obtain clear explanations in areas near the image edges.

+-------------------------------------------+-------------------------------------------------------------------------+
| Detection algorithm | DetClassProbabilityMap |
+===========================================+=========================================================================+
| Need access to model internal state | Yes |
+-------------------------------------------+-------------------------------------------------------------------------+
| Gradient-free | Yes |
+-------------------------------------------+-------------------------------------------------------------------------+
| Single-shot | Yes |
+-------------------------------------------+-------------------------------------------------------------------------+
| Class discrimination | No |
+-------------------------------------------+-------------------------------------------------------------------------+
| Box discrimination | No |
+-------------------------------------------+-------------------------------------------------------------------------+
| Execution speed | Fast |
+-------------------------------------------+-------------------------------------------------------------------------+
@@ -399,7 +399,7 @@ The command below will evaluate the trained model on the provided dataset:
Explanation
***********

-``otx explain`` runs the explanation algorithm of a model on the specific dataset. It helps explain the model's decision-making process in a way that is easily understood by humans.
+``otx explain`` runs the explainable AI (XAI) algorithm of a model on the specific dataset. It helps explain the model's decision-making process in a way that is easily understood by humans.

With the ``--help`` command, you can list additional information, such as its parameters common to all model templates:

2 changes: 1 addition & 1 deletion docs/source/guide/tutorials/advanced/self_sl.rst
@@ -21,7 +21,7 @@ The process has been tested on the following configuration:
Setup virtual environment
*************************

-1. You can follow the installation process from a :doc:`quick start guide <../../../get_started/quick_start_guide/installation>`
+1. You can follow the installation process from a :doc:`quick start guide <../../get_started/quick_start_guide/installation>`
to create a universal virtual environment for OpenVINO™ Training Extensions.

2. Activate your virtual
4 changes: 2 additions & 2 deletions docs/source/guide/tutorials/advanced/semi_sl.rst
@@ -44,7 +44,7 @@ This tutorial explains how to train a model in semi-supervised learning mode and
Setup virtual environment
*************************

-1. You can follow the installation process from a :doc:`quick start guide <../../../get_started/quick_start_guide/installation>`
+1. You can follow the installation process from a :doc:`quick start guide <../../get_started/quick_start_guide/installation>`
to create a universal virtual environment for OpenVINO™ Training Extensions.

2. Activate your virtual
@@ -128,7 +128,7 @@ Enable via ``otx train``
***************************

1. To enable semi-supervised learning directly via ``otx train``, we need to add arguments ``--unlabeled-data-roots`` and ``--algo_backend.train_type``
-which is one of template-specific parameters (details are provided in `quick start guide <../../get_started/quick_start_guide/cli_commands.html#training>`__.)
+which is one of template-specific parameters (details are provided in `quick start guide <../../get_started/quick_start_guide/cli_commands.html#training>`__).

.. code-block::

6 changes: 3 additions & 3 deletions docs/source/guide/tutorials/base/demo.rst
@@ -8,7 +8,7 @@ It allows you to apply the model on the custom data or the online footage from a

This tutorial uses an object detection model for example, however for other tasks the functionality remains the same - you just need to replace the input dataset with your own.

-For visualization you use images from WGISD dataset from the :doc: `object detection tutorial <how_to_train/detection>`.
+For visualization you use images from WGISD dataset from the :doc:`object detection tutorial <how_to_train/detection>`.

1. Activate the virtual environment
created in the previous step.
@@ -69,8 +69,8 @@ You can check a list of camera devices by running the command line below on Linux

.. code-block::

-    sudo apt-get install v4l-utils
-    v4l2-ctl --list-devices
+    (demo) ...$ sudo apt-get install v4l-utils
+    (demo) ...$ v4l2-ctl --list-devices

The output will look like this:

20 changes: 18 additions & 2 deletions docs/source/guide/tutorials/base/explain.rst
@@ -26,9 +26,25 @@ at the path specified by ``--save-explanation-to``.

.. code-block::

-    otx explain --explain-data-roots otx-workspace-DETECTION/splitted_dataset/val/ --save-explanation-to outputs/explanation --load-weights outputs/weights.pth
+    otx explain --explain-data-roots otx-workspace-DETECTION/splitted_dataset/val/ \
+                --save-explanation-to outputs/explanation \
+                --load-weights outputs/weights.pth

-3. As a result we will get a folder with a pair of generated
3. To specify the algorithm used to create the saliency maps for classification,
we can set the ``--explain-algorithm`` parameter:

- ``activationmap`` - the Activation Map classification algorithm
- ``eigencam`` - the Eigen-CAM classification algorithm
- ``classwisesaliencymap`` - the Recipro-CAM classification algorithm; this is the default

For the detection task, only ``classwisesaliencymap`` is supported, so there is no need to specify it.
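For instance, assuming a classification workspace (the paths below are illustrative, following the pattern of the earlier example), Eigen-CAM saliency maps could be requested like this:

.. code-block::

    otx explain --explain-data-roots otx-workspace-CLASSIFICATION/splitted_dataset/val/ \
                --save-explanation-to outputs/explanation \
                --load-weights outputs/weights.pth \
                --explain-algorithm eigencam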

.. note::

    Learn more about Explainable AI and its algorithms in the :doc:`XAI explanation section <../../explanation/additional_features/xai>`


4. As a result we will get a folder with a pair of generated
images for each image in ``--explain-data-roots``:

- a saliency map, where red color indicates areas of higher model attention
@@ -56,6 +56,7 @@ with the following command:
    cd ..

|

.. image:: ../../../../../utils/images/flowers_example.jpg
   :width: 600

@@ -120,7 +121,7 @@ Let's prepare an OpenVINO™ Training Extensions classification workspace running

(otx) ...$ cd ./otx-workspace-CLASSIFICATION

-It will create **otx-workspace-CLASSIFICATION** with all necessery configs for MobileNet-V3-large-1x, prepared ``data.yaml`` to simplify CLI commands launch and splitted dataset named ``splitted_dataset``.
+It will create **otx-workspace-CLASSIFICATION** with all necessary configs for MobileNet-V3-large-1x, prepared ``data.yaml`` to simplify CLI commands launch and splitted dataset named ``splitted_dataset``.

3. To start training you need to call ``otx train``
command in our workspace:
1 change: 1 addition & 0 deletions docs/source/guide/tutorials/index.rst
@@ -6,6 +6,7 @@ This section reveals how to use ``CLI``, both base and advanced features.
It provides the end-to-end solution from installation to model deployment and demo visualization on specific example for each of the supported tasks.

.. toctree::
+   :titlesonly:
   :maxdepth: 3

   base/index
Binary file added docs/utils/images/xai_cls.jpg
Binary file added docs/utils/images/xai_det.jpg
Binary file added docs/utils/images/xai_example.jpg
@@ -1,5 +1,5 @@
# Description.
-model_template_id: Custom_Action_Classificaiton_MoViNet
+model_template_id: Custom_Action_Classification_MoViNet
name: MoViNet
task_type: ACTION_CLASSIFICATION
task_family: VISION
@@ -1,5 +1,5 @@
# Description.
-model_template_id: Custom_Action_Classificaiton_X3D
+model_template_id: Custom_Action_Classification_X3D
name: X3D
task_type: ACTION_CLASSIFICATION
task_family: VISION
@@ -172,18 +172,16 @@ def loss_single(
         pos_centerness = centerness[pos_inds]

         centerness_targets = self.centerness_target(pos_anchors, pos_bbox_targets)
-        pos_decode_bbox_pred = self.bbox_coder.decode(pos_anchors, pos_bbox_pred)
-        pos_decode_bbox_targets = self.bbox_coder.decode(pos_anchors, pos_bbox_targets)
+        if self.reg_decoded_bbox:
+            pos_bbox_pred = self.bbox_coder.decode(pos_anchors, pos_bbox_pred)

         if self.use_qfl:
-            quality[pos_inds] = bbox_overlaps(
-                pos_decode_bbox_pred.detach(), pos_decode_bbox_targets, is_aligned=True
-            ).clamp(min=1e-6)
+            quality[pos_inds] = bbox_overlaps(pos_bbox_pred.detach(), pos_bbox_targets, is_aligned=True).clamp(
+                min=1e-6
+            )

         # regression loss
-        loss_bbox = self.loss_bbox(
-            pos_decode_bbox_pred, pos_decode_bbox_targets, weight=centerness_targets, avg_factor=1.0
-        )
+        loss_bbox = self.loss_bbox(pos_bbox_pred, pos_bbox_targets, weight=centerness_targets, avg_factor=1.0)

         # centerness loss
         loss_centerness = self.loss_centerness(pos_centerness, centerness_targets, avg_factor=num_total_samples)
2 changes: 1 addition & 1 deletion otx/cli/manager/config_manager.py
@@ -26,7 +26,7 @@
"INSTANCE_SEGMENTATION": "Custom_Counting_Instance_Segmentation_MaskRCNN_ResNet50",
"ROTATED_DETECTION": "Custom_Rotated_Detection_via_Instance_Segmentation_MaskRCNN_ResNet50",
"SEGMENTATION": "Custom_Semantic_Segmentation_Lite-HRNet-18-mod2_OCR",
"ACTION_CLASSIFICATION": "Custom_Action_Classificaiton_X3D",
"ACTION_CLASSIFICATION": "Custom_Action_Classification_X3D",
"ACTION_DETECTION": "Custom_Action_Detection_X3D_FAST_RCNN",
"ANOMALY_CLASSIFICATION": "ote_anomaly_classification_padim",
"ANOMALY_DETECTION": "ote_anomaly_detection_padim",