Add explanation for XAI & minor doc fixes #1923

Merged
1 change: 0 additions & 1 deletion .github/workflows/daily.yml
@@ -10,7 +10,6 @@ jobs:
Daily-Tests:
runs-on: [self-hosted, linux, x64, dev]
timeout-minutes: 1440
if: github.ref == 'refs/heads/develop'
steps:
- name: Checkout repository
uses: actions/checkout@v3
@@ -9,3 +9,4 @@ Additional Features
models_optimization
hpo
auto_configuration
xai
95 changes: 95 additions & 0 deletions docs/source/guide/explanation/additional_features/xai.rst
@@ -0,0 +1,95 @@
Explainable AI (XAI)
====================

**Explainable AI (XAI)** is a field of research that aims to make machine learning models more transparent and interpretable to humans.
The goal is to help users understand how and why AI systems make decisions and to provide insight into their inner workings. It allows us to detect, analyze, and prevent common mistakes, for example, when a model uses irrelevant features to make a prediction.
XAI helps build trust in AI, verify that a model is safe to deploy, and increase its adoption in various domains.

Most XAI methods generate **saliency maps** as a result. A saliency map is a visual representation, suitable for human comprehension, that highlights the parts of the image that are most important from the model's point of view.
It looks like a heatmap, where warm-colored areas mark the places the model focuses on most.


.. figure:: ../../../../utils/images/xai_example.jpg
:width: 600
:alt: this image shows the result of XAI algorithm

These images are taken from the `D-RISE paper <https://arxiv.org/abs/2006.03204>`_.


We can generate saliency maps for a model trained in OpenVINO™ Training Extensions using the ``otx explain`` command. Learn more about its usage in the :doc:`../../tutorials/base/explain` tutorial.

*********************************
XAI algorithms for classification
*********************************

.. image:: ../../../../utils/images/xai_cls.jpg
:width: 600
:align: center
:alt: this image shows the comparison of XAI classification algorithms


For classification networks, the following algorithms are used to generate saliency maps:

- **Activation Map** - the most basic and naive approach. It takes the outputs of the model's feature extractor (backbone) and averages them over the channel dimension. The result depends entirely on the backbone, ignoring the neck and head computations, but it is fast and usually gives a reasonably good map (a minimal sketch is shown right after this list).

- `Eigen-Cam <https://arxiv.org/abs/2008.00299>`_ uses Principal Component Analysis (PCA). It returns the first principal component of the feature extractor output, which most of the time corresponds to the dominant object. Like the Activation Map, it depends only on the backbone and ignores the neck and head computations.

- `Recipro-CAM <https://arxiv.org/pdf/2209.14074>`_ is a fast, gradient-free Reciprocal CAM method. It spatially masks the extracted feature maps to exploit the correlation between activation maps and network predictions for target classes, which lets it weigh the activation map per class and generate a different saliency map for each class (see the sketch after the comparison table below).
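
To make the naive approach concrete, below is a minimal sketch (not OpenVINO™ Training Extensions code) of how an activation map can be computed with a generic PyTorch backbone. The model choice, input, and normalization are illustrative assumptions.

.. code-block:: python

    import torch
    import torch.nn.functional as F
    import torchvision.models as models

    # Any feature extractor that returns (N, C, H, W) feature maps will do.
    model = models.resnet18(weights=None)
    backbone = torch.nn.Sequential(*list(model.children())[:-2])  # drop pool + fc
    backbone.eval()

    image = torch.rand(1, 3, 224, 224)  # placeholder input image
    with torch.no_grad():
        features = backbone(image)      # (1, 512, 7, 7) feature maps

    saliency = features.mean(dim=1, keepdim=True)  # average over channels
    saliency = (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-8)
    saliency = F.interpolate(saliency, size=image.shape[-2:],
                             mode="bilinear", align_corners=False)  # to input size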


Below we show a comparison of the described algorithms. ``Access to the model internal state`` means the need to modify the model's outputs and dump internal features.
``Per-class explanation support`` means generating different saliency maps for different classes.

+-------------------------------------------+----------------+----------------+-------------------------------------------------------------------------+
| Classification algorithm | Activation Map | Eigen-Cam | Recipro-CAM |
+===========================================+================+================+=========================================================================+
| Need access to model internal state | Yes | Yes | Yes |
+-------------------------------------------+----------------+----------------+-------------------------------------------------------------------------+
| Gradient-free | Yes | Yes | Yes |
+-------------------------------------------+----------------+----------------+-------------------------------------------------------------------------+
| Single-shot                               | Yes            | Yes            | No (re-infer neck + head H*W times, where H*W is the feature map size)  |
+-------------------------------------------+----------------+----------------+-------------------------------------------------------------------------+
| Per-class explanation support | No | No | Yes |
+-------------------------------------------+----------------+----------------+-------------------------------------------------------------------------+
| Execution speed | Fast | Fast | Medium |
+-------------------------------------------+----------------+----------------+-------------------------------------------------------------------------+
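
The ``Single-shot`` row for Recipro-CAM can be illustrated with a minimal sketch of its masking loop. Here ``backbone`` and ``head`` are assumed to be callables split from a classifier, with ``head`` covering the neck, pooling, and classification layers; all names are hypothetical.

.. code-block:: python

    import torch

    @torch.no_grad()
    def recipro_cam(backbone, head, image, num_classes):
        feats = backbone(image)                  # (1, C, H, W)
        _, _, h, w = feats.shape
        saliency = torch.zeros(num_classes, h, w)
        for y in range(h):                       # re-infer the head H*W times
            for x in range(w):
                mask = torch.zeros(1, 1, h, w)
                mask[..., y, x] = 1.0            # keep a single spatial cell
                scores = head(feats * mask)      # (1, num_classes) logits
                saliency[:, y, x] = scores.softmax(dim=1).squeeze(0)
        flat = saliency.view(num_classes, -1)    # per-class min-max scaling
        mn = flat.min(dim=1, keepdim=True).values
        mx = flat.max(dim=1, keepdim=True).values
        return ((flat - mn) / (mx - mn + 1e-8)).view(num_classes, h, w)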


****************************
XAI algorithms for detection
****************************

For detection networks, the following algorithms are used to generate saliency maps:

- **Activation Map** - the same approach as for classification networks, using the outputs of the feature extractor. This algorithm is used to generate saliency maps for two-stage detectors.

- **DetClassProbabilityMap** - this approach takes the raw output of the classification head and uses the per-class probability maps to calculate regions of interest for each class, so it creates a different saliency map for each class. This algorithm is implemented for single-stage detectors only (a minimal sketch follows the limitation note below).

.. image:: ../../../../utils/images/xai_det.jpg
:width: 600
:align: center
:alt: this image shows the detailed description of XAI detection algorithm


The main limitation of this method is that, due to the training loss design of most single-stage detectors, activation values drift towards the center of the object while propagating through the network.
This prevents getting a clear explanation in the input image space using intermediate activations.
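
As an illustration only, here is a minimal sketch of the DetClassProbabilityMap idea. The classification-head output layout (``A`` anchors times ``K`` classes per spatial location) is an assumption; real single-stage detectors may arrange their outputs differently.

.. code-block:: python

    import torch

    @torch.no_grad()
    def det_class_probability_map(cls_logits, num_anchors, num_classes):
        # cls_logits: raw classification head output with shape (1, A*K, H, W)
        _, _, h, w = cls_logits.shape
        probs = cls_logits.view(num_anchors, num_classes, h, w).sigmoid()
        return probs.max(dim=0).values  # (K, H, W): best anchor per location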

Below we show a comparison of the described algorithms. ``Access to the model internal state`` means the need to modify the model's outputs and dump internal features.
``Per-class explanation support`` means generating different saliency maps for different classes. ``Per-box explanation support`` means generating a standalone saliency map for each detected prediction.


+-------------------------------------------+----------------------------+--------------------------------------------+
| Detection algorithm | Activation Map | DetClassProbabilityMap |
+===========================================+============================+============================================+
| Need access to model internal state | Yes | Yes |
+-------------------------------------------+----------------------------+--------------------------------------------+
| Gradient-free | Yes | Yes |
+-------------------------------------------+----------------------------+--------------------------------------------+
| Single-shot | Yes | Yes |
+-------------------------------------------+----------------------------+--------------------------------------------+
| Per-class explanation support | No | Yes |
+-------------------------------------------+----------------------------+--------------------------------------------+
| Per-box explanation support | No | No |
+-------------------------------------------+----------------------------+--------------------------------------------+
| Execution speed | Fast | Fast |
+-------------------------------------------+----------------------------+--------------------------------------------+
@@ -95,20 +95,25 @@ To see which public backbones are available for the task, the following command

$ otx find --backbone {torchvision, pytorchcv, mmcls, omz.mmcls}

.. In the table below the test mAP on some academic datasets using our :ref:`supervised pipeline <od_supervised_pipeline>` is presented.
.. The results were obtained on our templates without any changes.
.. For hyperparameters, please, refer to the related template.
.. We trained each model with a single Nvidia GeForce RTX3090.
In the table below, the test mAP on several academic datasets, obtained with our :ref:`supervised pipeline <od_supervised_pipeline>`, is presented.

.. +-----------+------------+-----------+-----------+
.. | Model name| COCO | PASCAL VOC| MinneApple|
.. +===========+============+===========+===========+
.. | YOLOX | N/A | N/A | 24.5 |
.. +-----------+------------+-----------+-----------+
.. | SSD | N/A | N/A | 31.2 |
.. +-----------+------------+-----------+-----------+
.. | ATSS | N/A | N/A | 42.5 |
.. +-----------+------------+-----------+-----------+
For the `COCO <https://cocodataset.org/#home>`__ dataset, the accuracy of the pretrained weights is shown. This means the weights are undertrained for COCO and do not achieve the best possible result.
That is because the purpose of pretrained models is to learn basic features from a dataset as large and diverse as COCO, and to use these weights to get good results on other custom datasets right from the start.

The results on `Pascal VOC <http://host.robots.ox.ac.uk/pascal/VOC/voc2012/>`_, `BCCD <https://public.roboflow.com/object-detection/bccd/3>`_, `MinneApple <https://rsn.umn.edu/projects/orchard-monitoring/minneapple>`_ and `WGISD <https://github.com/thsant/wgisd>`_ were obtained with our templates without any changes.
BCCD is an easy dataset with large, in-focus objects, while MinneApple and WGISD contain small objects that are hard to distinguish from the background.
For hyperparameters, please refer to the related template.
We trained each model on a single Nvidia GeForce RTX 3090.

+-----------+------------+-----------+-----------+-----------+-----------+
| Model name| COCO | PASCAL VOC| BCCD | MinneApple| WGISD |
+===========+============+===========+===========+===========+===========+
| YOLOX | 32.0 | 66.6 | 60.3 | 24.5 | 44.1 |
+-----------+------------+-----------+-----------+-----------+-----------+
| SSD | 13.5 | 50.0 | 54.2 | 31.2 | 45.9 |
+-----------+------------+-----------+-----------+-----------+-----------+
| ATSS | 32.5 | 68.7 | 61.5 | 42.5 | 57.5 |
+-----------+------------+-----------+-----------+-----------+-----------+



@@ -133,7 +138,7 @@ Overall, OpenVINO™ Training Extensions utilizes powerful techniques for improv

Please refer to the :doc:`tutorial <../../../tutorials/advanced/semi_sl>` on how to train with semi-supervised learning.

In the table below the mAP on toy data sample from `COCO <https://cocodataset.org/#home>`_ dataset using our pipeline is presented.
In the table below the mAP on toy data sample from `COCO <https://cocodataset.org/#home>`__ dataset using our pipeline is presented.

We sample 400 images that contain one of [person, car, bus] as labeled training images, and 4,000 images as unlabeled ones. For validation, 100 images are selected from val2017.

@@ -399,7 +399,7 @@ The command below will evaluate the trained model on the provided dataset:
Explanation
***********

``otx explain`` runs the explanation algorithm of a model on the specific dataset. It helps explain the model's decision-making process in a way that is easily understood by humans.
``otx explain`` runs the explainable AI (XAI) algorithm of a model on the specific dataset. It helps explain the model's decision-making process in a way that is easily understood by humans.

With the ``--help`` command, you can list additional information, such as its parameters common to all model templates:

2 changes: 1 addition & 1 deletion docs/source/guide/tutorials/advanced/self_sl.rst
@@ -21,7 +21,7 @@ The process has been tested on the following configuration:
Setup virtual environment
*************************

1. You can follow the installation process from a :doc:`quick start guide <../../../get_started/quick_start_guide/installation>`
1. You can follow the installation process from a :doc:`quick start guide <../../get_started/quick_start_guide/installation>`
to create a universal virtual environment for OpenVINO™ Training Extensions.

2. Activate your virtual
4 changes: 2 additions & 2 deletions docs/source/guide/tutorials/advanced/semi_sl.rst
@@ -44,7 +44,7 @@ This tutorial explains how to train a model in semi-supervised learning mode and
Setup virtual environment
*************************

1. You can follow the installation process from a :doc:`quick start guide <../../../get_started/quick_start_guide/installation>`
1. You can follow the installation process from a :doc:`quick start guide <../../get_started/quick_start_guide/installation>`
to create a universal virtual environment for OpenVINO™ Training Extensions.

2. Activate your virtual
@@ -128,7 +128,7 @@ Enable via ``otx train``
***************************

1. To enable semi-supervised learning directly via ``otx train``, we need to add arguments ``--unlabeled-data-roots`` and ``--algo_backend.train_type``
which is one of template-specific parameters (details are provided in `quick start guide <../../get_started/quick_start_guide/cli_commands.html#training>`__.)
which is one of template-specific parameters (details are provided in `quick start guide <../../get_started/quick_start_guide/cli_commands.html#training>`__).

.. code-block::

6 changes: 3 additions & 3 deletions docs/source/guide/tutorials/base/demo.rst
@@ -8,7 +8,7 @@ It allows you to apply the model on the custom data or the online footage from a

This tutorial uses an object detection model as an example; for other tasks the functionality remains the same - you just need to replace the input dataset with your own.

For visualization you use images from WGISD dataset from the :doc: `object detection tutorial <how_to_train/detection>`.
For visualization you use images from WGISD dataset from the :doc:`object detection tutorial <how_to_train/detection>`.

1. Activate the virtual environment
created in the previous step.
@@ -69,8 +69,8 @@ You can check a list of camera devices by running the command line below on Linu

.. code-block::

sudo apt-get install v4l-utils
v4l2-ctl --list-devices
(demo) ...$ sudo apt-get install v4l-utils
(demo) ...$ v4l2-ctl --list-devices

The output will look like this:

23 changes: 21 additions & 2 deletions docs/source/guide/tutorials/base/explain.rst
@@ -26,9 +26,28 @@ at the path specified by ``--save-explanation-to``.

.. code-block::

otx explain --explain-data-roots otx-workspace-DETECTION/splitted_dataset/val/ --save-explanation-to outputs/explanation --load-weights outputs/weights.pth
otx explain --explain-data-roots otx-workspace-DETECTION/splitted_dataset/val/ \
--save-explanation-to outputs/explanation \
--load-weights outputs/weights.pth

3. As a result we will get a folder with a pair of generated
3. To specify the algorithm of saliency map creation for classification,
we can define the ``--explain-algorithm`` parameter.

- ``activationmap`` - for the Activation Map classification algorithm
- ``eigencam`` - for the Eigen-Cam classification algorithm
- ``classwisesaliencymap`` - for the Recipro-CAM classification algorithm; this is the default method

For the detection task, we can choose between the following methods:

- ``activationmap`` - for the Activation Map detection algorithm
- ``classwisesaliencymap`` - for the DetClassProbabilityMap algorithm (works for single-stage detectors only)
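
For example, a run that selects the activation map algorithm for the detection model from this tutorial could look like the command below; the paths are the placeholders used in the previous step.

.. code-block::

    otx explain --explain-data-roots otx-workspace-DETECTION/splitted_dataset/val/ \
                --save-explanation-to outputs/explanation \
                --load-weights outputs/weights.pth \
                --explain-algorithm activationmap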

.. note::

Learn more about Explainable AI and its algorithms in the :doc:`XAI explanation section <../../explanation/additional_features/xai>`.


4. As a result we will get a folder with a pair of generated
images for each image in ``--explain-data-roots``:

- saliency map - where red color indicates higher attention of the model
@@ -56,6 +56,7 @@ with the following command:
cd ..

|

.. image:: ../../../../../utils/images/flowers_example.jpg
:width: 600

@@ -120,7 +121,7 @@ Let's prepare an OpenVINO™ Training Extensions classification workspace runnin

(otx) ...$ cd ./otx-workspace-CLASSIFICATION

It will create **otx-workspace-CLASSIFICATION** with all necessery configs for MobileNet-V3-large-1x, prepared ``data.yaml`` to simplify CLI commands launch and splitted dataset named ``splitted_dataset``.
It will create **otx-workspace-CLASSIFICATION** with all necessary configs for MobileNet-V3-large-1x, prepared ``data.yaml`` to simplify CLI commands launch and splitted dataset named ``splitted_dataset``.

3. To start training you need to call ``otx train``
command in our workspace:
10 changes: 6 additions & 4 deletions docs/source/guide/tutorials/base/how_to_train/detection.rst
@@ -60,7 +60,7 @@ Dataset preparation

.. code-block::

cd data
mkdir data ; cd data
git clone https://github.com/thsant/wgisd.git
cd wgisd
git checkout 6910edc5ae3aae8c20062941b1641821f0c30127
@@ -107,7 +107,7 @@ We can do that by running these commands:
.. code-block::

# format images folder
mkdir data images
mv data images

# format annotations folder
mv coco_annotations annotations
@@ -116,6 +116,8 @@ We can do that by running these commands:
mv annotations/train_bbox_instances.json annotations/instances_train.json
mv annotations/test_bbox_instances.json annotations/instances_val.json

cd ../..

*********
Training
*********
@@ -183,9 +185,9 @@ Let's prepare the object detection workspace running the following command:



.. note::
.. warning::

If you want to update your current workspace by running ``otx build`` with other parameters, it's better to delete the original workplace before that to prevent mistakes.
If you want to rebuild your current workspace by running ``otx build`` with other parameters, it's better to delete the original workplace before that to prevent mistakes.

Check ``otx-workspace-DETECTION/data.yaml`` to see which data subsets will be used for training and validation, and update it if necessary.

1 change: 1 addition & 0 deletions docs/source/guide/tutorials/index.rst
@@ -6,6 +6,7 @@ This section reveals how to use ``CLI``, both base and advanced features.
It provides the end-to-end solution from installation to model deployment and demo visualization on a specific example for each of the supported tasks.

.. toctree::
:titlesonly:
:maxdepth: 3

base/index
Binary file added docs/utils/images/xai_cls.jpg
Binary file added docs/utils/images/xai_det.jpg
Binary file added docs/utils/images/xai_example.jpg
@@ -1,5 +1,5 @@
# Description.
model_template_id: Custom_Action_Classificaiton_MoViNet
model_template_id: Custom_Action_Classification_MoViNet
name: MoViNet
task_type: ACTION_CLASSIFICATION
task_family: VISION
@@ -1,5 +1,5 @@
# Description.
model_template_id: Custom_Action_Classificaiton_X3D
model_template_id: Custom_Action_Classification_X3D
name: X3D
task_type: ACTION_CLASSIFICATION
task_family: VISION