Initial commit
Maxim Zhiltsov committed Sep 4, 2020
1 parent ff3f597 commit 0bf5d43
Showing 179 changed files with 20,410 additions and 0 deletions.
75 changes: 75 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,75 @@
## Table of Contents

- [Installation](#installation)
- [Usage](#usage)
- [Testing](#testing)
- [Design](#design-and-code-structure)

## Installation

### Prerequisites

- Python (3.5+)
- OpenVINO (optional)

``` bash
git clone https://github.com/opencv/cvat
```

Optionally, create a virtual environment:

``` bash
python -m pip install virtualenv
python -m virtualenv venv
. venv/bin/activate
```

Then install all dependencies:

``` bash
while read -r p; do pip install $p; done < requirements.txt
```
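
The loop above installs the requirements one line at a time, so each entry gets its own `pip` invocation. Where a POSIX shell is not available, a rough Python equivalent might look like the sketch below. The helper names `read_requirements` and `install_each` are hypothetical, and skipping blank lines and `#` comments is an added convenience that the shell loop does not have.

```python
import shlex
import subprocess
import sys


def read_requirements(text):
    """Return the non-empty, non-comment lines of a requirements file."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def install_each(requirements, dry_run=False):
    """Build (and, unless dry_run is set, run) one `pip install` per entry."""
    commands = []
    for requirement in requirements:
        # shlex.split approximates the word splitting of the unquoted `$p`.
        command = [sys.executable, "-m", "pip", "install", *shlex.split(requirement)]
        commands.append(command)
        if not dry_run:
            subprocess.check_call(command)
    return commands


# Typical use (assumes requirements.txt is in the current directory):
#     with open("requirements.txt") as f:
#         install_each(read_requirements(f.read()))
```

Passing `dry_run=True` returns the commands without running them, which is a cheap way to check what would be installed.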

If you're working inside the CVAT environment:
``` bash
. .env/bin/activate
while read -r p; do pip install $p; done < datumaro/requirements.txt
```

## Usage

> The directory containing Datumaro should be in the `PYTHONPATH`
> environment variable or `cvat/datumaro/` should be the current directory.
``` bash
datum --help
python -m datumaro --help
python datumaro/ --help
python datum.py --help
```
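
Whether the module is reachable can be checked from Python before running any of the commands above. The sketch below uses only the standard library; the helper name `is_importable` and the example path in the comment are hypothetical.

```python
import importlib.util
import sys


def is_importable(module_name, extra_dir=None):
    """Return True if `module_name` can be found on the import path.

    If `extra_dir` is given, it is prepended to `sys.path` first, which
    for the current process is similar to listing it in `PYTHONPATH`.
    """
    if extra_dir is not None and extra_dir not in sys.path:
        sys.path.insert(0, extra_dir)
    return importlib.util.find_spec(module_name) is not None


# Example: is_importable("datumaro", extra_dir="path/to/cvat/datumaro")
```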

``` python
import datumaro
```

## Testing

All Datumaro functionality is expected to be covered and checked by
unit tests. Tests are placed in the `tests/` directory.

To run the tests, use:

``` bash
python -m unittest discover -s tests
```
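
Test discovery picks up files named `test*.py` by default, so a new test module under `tests/` could look like the sketch below. The file name, the test class, and the `area` helper are hypothetical and only illustrate the layout; they are not part of Datumaro.

```python
# tests/test_example.py (hypothetical file name)
import unittest


def area(bbox):
    """Area of an (x, y, width, height) box; a stand-in for real code."""
    _, _, width, height = bbox
    return width * height


class AreaTest(unittest.TestCase):
    def test_area_of_unit_box(self):
        self.assertEqual(1, area((0, 0, 1, 1)))

    def test_area_of_empty_box(self):
        self.assertEqual(0, area((5, 5, 0, 3)))
```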

If you're working inside the CVAT environment, you can also use:

``` bash
python manage.py test datumaro/
```

## Design and code structure

- [Design document](docs/design.md)
- [Developer guide](docs/developer_guide.md)
22 changes: 22 additions & 0 deletions LICENSE
@@ -0,0 +1,22 @@
MIT License

Copyright (C) 2019-2020 Intel Corporation

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom
the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES
OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE
OR OTHER DEALINGS IN THE SOFTWARE.

205 changes: 205 additions & 0 deletions README.md
@@ -0,0 +1,205 @@
# Dataset Management Framework (Datumaro)

A framework to build, transform, and analyze datasets.

<!--lint disable fenced-code-flag-->
```
CVAT annotations -- ---> Annotation tool
\ /
COCO-like dataset -----> Datumaro ---> dataset ------> Model training
/ \
VOC-like dataset -- ---> Publication etc.
```
<!--lint enable fenced-code-flag-->

## Contents

- [Documentation](#documentation)
- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
- [Examples](#examples)
- [Contributing](#contributing)

## Documentation

- [User manual](docs/user_manual.md)
- [Design document](docs/design.md)
- [Contributing](CONTRIBUTING.md)

## Features

- Dataset format conversions:
  - COCO (`image_info`, `instances`, `person_keypoints`, `captions`, `labels`*)
    - [Format specification](http://cocodataset.org/#format-data)
    - [Dataset example](tests/assets/coco_dataset)
    - `labels` are our extension - like `instances` with only `category_id`
  - PASCAL VOC (`classification`, `detection`, `segmentation` (class, instances), `action_classification`, `person_layout`)
    - [Format specification](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/htmldoc/index.html)
    - [Dataset example](tests/assets/voc_dataset)
  - YOLO (`bboxes`)
    - [Format specification](https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data)
    - [Dataset example](tests/assets/yolo_dataset)
  - TF Detection API (`bboxes`, `masks`)
    - Format specifications: [bboxes](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/using_your_own_dataset.md), [masks](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/instance_segmentation.md)
    - [Dataset example](tests/assets/tf_detection_api_dataset)
  - MOT sequences
    - [Format specification](https://arxiv.org/pdf/1906.04567.pdf)
    - [Dataset example](tests/assets/mot_dataset)
  - CVAT
    - [Format specification](https://github.com/opencv/cvat/blob/develop/cvat/apps/documentation/xml_format.md)
    - [Dataset example](tests/assets/cvat_dataset)
  - LabelMe
    - [Format specification](http://labelme.csail.mit.edu/Release3.0)
    - [Dataset example](tests/assets/labelme_dataset)
- Dataset building operations:
  - Merging multiple datasets into one
  - Dataset filtering with custom conditions, for instance:
    - remove polygons of a certain class
    - remove images without a specific class
    - remove `occluded` annotations from images
    - keep only vertically-oriented images
    - remove small area bounding boxes from annotations
  - Annotation conversions, for instance:
    - polygons to instance masks and vice versa
    - apply a custom colormap for mask annotations
    - rename or remove dataset labels
- Dataset comparison
- Model integration:
  - Inference (OpenVINO and custom models)
  - Explainable AI ([RISE algorithm](https://arxiv.org/abs/1806.07421))

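To make the filtering operations above concrete, the sketch below applies two of them, removing small bounding boxes and keeping only vertically oriented images, to plain Python dictionaries. The item layout (`size`, `annotations`, `bbox`) and the function names are assumptions made for illustration; this is not Datumaro's API or its internal data model.

```python
def drop_small_boxes(item, min_area):
    """Return a copy of `item` without boxes smaller than `min_area`."""
    kept = [
        ann for ann in item["annotations"]
        if ann["bbox"][2] * ann["bbox"][3] >= min_area  # width * height
    ]
    return dict(item, annotations=kept)


def is_vertical(item):
    """True if the image is taller than it is wide."""
    height, width = item["size"]
    return height > width


def filter_dataset(items, min_area):
    """Keep vertical images only and strip their small boxes."""
    return [drop_small_boxes(item, min_area) for item in items if is_vertical(item)]


# Boxes are (x, y, width, height); sizes are (height, width).
items = [
    {"id": "a", "size": (640, 480), "annotations": [
        {"label": "cat", "bbox": (10, 10, 100, 50)},
        {"label": "dog", "bbox": (0, 0, 4, 4)},
    ]},
    {"id": "b", "size": (480, 640), "annotations": []},
]
filtered = filter_dataset(items, min_area=100)
```
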
> Check the [design document](docs/design.md) for a full list of features.

## Installation

Optionally, create a virtual environment:

``` bash
python -m pip install virtualenv
python -m virtualenv venv
. venv/bin/activate
```

Install Datumaro package:

``` bash
pip install 'git+https://github.com/opencv/cvat#egg=datumaro&subdirectory=datumaro'
```

## Usage

There are several options available:
- [A standalone command-line tool](#standalone-tool)
- [A python module](#python-module)

### Standalone tool

<!--lint disable fenced-code-flag-->
```
User
|
v
+------------------+
| CVAT |
+--------v---------+ +------------------+ +--------------+
| Datumaro module | ----> | Datumaro project | <---> | Datumaro CLI | <--- User
+------------------+ +------------------+ +--------------+
```
<!--lint enable fenced-code-flag-->

``` bash
datum --help
python -m datumaro --help
```

### Python module

Datumaro can be used in custom scripts as a library in the following way:

``` python
from datumaro.components.project import Project # project-related things
import datumaro.components.extractor # annotations and high-level interfaces
# etc.
project = Project.load('directory')
```

## Examples

<!--lint disable list-item-indent-->
<!--lint disable list-item-bullet-indent-->

- Convert [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html#data) to COCO, keeping only images that contain the `cat` class:
```bash
# Download VOC dataset:
# http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
datum convert --input-format voc --input-path <path/to/voc> \
--output-format coco --filter '/item[annotation/label="cat"]'
```

- Convert only non-occluded annotations from a CVAT-annotated project to TFrecord:
```bash
# export Datumaro dataset in CVAT UI, extract somewhere, go to the project dir
datum project extract --filter '/item/annotation[occluded="False"]' \
--mode items+anno --output-dir not_occluded
datum project export --project not_occluded \
--format tf_detection_api -- --save-images
```

- Annotate COCO, extract image subset, re-annotate it in CVAT, update old dataset:
```bash
# Download COCO dataset http://cocodataset.org/#download
# Put images to coco/images/ and annotations to coco/annotations/
datum project import --format coco --input-path <path/to/coco>
datum project export --filter '/image[images_I_dont_like]' --format cvat \
--output-dir reannotation
# import dataset and images to CVAT, re-annotate
# export Datumaro project, extract to 'reannotation-upd'
datum project merge reannotation-upd
datum project export --format coco
```

- Annotate instance polygons in CVAT, export as masks in COCO:
```bash
datum convert --input-format cvat --input-path <path/to/cvat.xml> \
--output-format coco -- --segmentation-mode masks
```

- Apply an OpenVINO detection model to some COCO-like dataset,
then compare annotations with ground truth and visualize in TensorBoard:
```bash
datum project import --format coco --input-path <path/to/coco>
# create model results interpretation script
datum model add mymodel openvino \
--weights model.bin --description model.xml \
--interpretation-script parse_results.py
datum model run --model mymodel --output-dir mymodel_inference/
datum project diff mymodel_inference/ --format tensorboard --output-dir diff
```

- Change colors in PASCAL VOC-like `.png` masks:
```bash
datum project import --format voc --input-path <path/to/voc/dataset>

# Create a color map file with desired colors:
#
# label : color_rgb : parts : actions
# cat:0,0,255::
# dog:255,0,0::
#
# Save as mycolormap.txt

datum project export --format voc_segmentation -- --label-map mycolormap.txt
# add "--apply-colormap=0" to save grayscale (indexed) masks
# check "--help" option for more info
# use "datum --loglevel debug" for extra conversion info
```

<!--lint enable list-item-bullet-indent-->
<!--lint enable list-item-indent-->
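
The `--filter` arguments in the examples above are written in an XPath-like syntax. As a rough illustration of how such an expression selects items or annotations, the sketch below runs two comparable queries with the standard-library `xml.etree.ElementTree` module. The XML layout and the `<dataset>` wrapper are assumptions, not Datumaro's actual item encoding, and `ElementTree` supports only a subset of XPath, so the first expression is rewritten with a `..` step.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML view of a two-item dataset.
DATASET = """
<dataset>
  <item>
    <id>1</id>
    <annotation><label>cat</label><occluded>False</occluded></annotation>
    <annotation><label>dog</label><occluded>True</occluded></annotation>
  </item>
  <item>
    <id>2</id>
    <annotation><label>dog</label><occluded>False</occluded></annotation>
  </item>
</dataset>
"""

root = ET.fromstring(DATASET)

# Comparable to '/item[annotation/label="cat"]': items with a `cat` annotation.
cat_items = root.findall("./item/annotation[label='cat']/..")

# Comparable to '/item/annotation[occluded="False"]': non-occluded annotations.
visible = root.findall("./item/annotation[occluded='False']")

cat_item_ids = [item.findtext("id") for item in cat_items]
visible_labels = [ann.findtext("label") for ann in visible]
```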

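The label map described in the last example is a plain-text file with one `label:color_rgb:parts:actions` entry per line. A minimal sketch of reading such a file is shown below. The function name `parse_label_map`, the comma-separated lists for parts and actions, and the handling of `#` comments are assumptions based only on the two sample lines above; this is not Datumaro's own parser.

```python
def parse_label_map(text):
    """Map each label to its RGB color, parts, and actions."""
    label_map = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (assumed convention)
        label, color, parts, actions = (field.strip() for field in line.split(":"))
        label_map[label] = {
            "color": tuple(int(c) for c in color.split(",")) if color else None,
            "parts": [p for p in parts.split(",") if p],
            "actions": [a for a in actions.split(",") if a],
        }
    return label_map


colors = parse_label_map("cat:0,0,255::\ndog:255,0,0::\n")
```
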
## Contributing

Feel free to [open an Issue](https://github.com/opencv/cvat/issues/new) if you
think something needs to be changed. You are welcome to participate in development;
instructions are available in our [developer manual](CONTRIBUTING.md).
8 changes: 8 additions & 0 deletions datum.py
@@ -0,0 +1,8 @@
#!/usr/bin/env python
import sys

from datumaro.cli.__main__ import main


if __name__ == '__main__':
    sys.exit(main())
4 changes: 4 additions & 0 deletions datumaro/__init__.py
@@ -0,0 +1,4 @@

# Copyright (C) 2019-2020 Intel Corporation
#
# SPDX-License-Identifier: MIT
12 changes: 12 additions & 0 deletions datumaro/__main__.py
@@ -0,0 +1,12 @@

# Copyright (C) 2019-2020 Intel Corporation
#
# SPDX-License-Identifier: MIT

import sys

from datumaro.cli.__main__ import main


if __name__ == '__main__':
    sys.exit(main())
4 changes: 4 additions & 0 deletions datumaro/cli/__init__.py
@@ -0,0 +1,4 @@

# Copyright (C) 2019-2020 Intel Corporation
#
# SPDX-License-Identifier: MIT