From 4672276b0cf72ea74f6eecf16b36696567aec992 Mon Sep 17 00:00:00 2001
From: Wonju Lee
Date: Tue, 11 Apr 2023 10:13:07 +0900
Subject: [PATCH] [Doc] add tutorials for level 3 and 4 (#920)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

### Summary
Please take a look at
http://10.225.20.174:7041/build/html/docs/level-up/basic_skills/03_dataset_import_export.html
and
http://10.225.20.174:7041/build/html/docs/level-up/basic_skills/04_detect_data_format.html#level-4-detect-data-format-from-an-unknown-dataset
for references.

### How to test

### Checklist
- [ ] I have added unit tests to cover my changes.
- [ ] I have added integration tests to cover my changes.
- [ ] I have added the description of my changes into [CHANGELOG](https://github.com/openvinotoolkit/datumaro/blob/develop/CHANGELOG.md).
- [ ] I have updated the [documentation](https://github.com/openvinotoolkit/datumaro/tree/develop/docs) accordingly

### License
- [ ] I submit _my code changes_ under the same [MIT License](https://github.com/openvinotoolkit/datumaro/blob/develop/LICENSE) that covers the project.
  Feel free to contact the maintainers if that's a concern.
- [ ] I have updated the license header for each file (see an example below).

```python
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT
```
---
 .../basic_skills/03_dataset_import_export.md  | 28 ------
 .../basic_skills/03_dataset_import_export.rst | 90 +++++++++++++++++++
 .../basic_skills/04_detect_data_format.md     | 28 ------
 .../basic_skills/04_detect_data_format.rst    | 41 +++++++++
 .../docs/level-up/basic_skills/index.rst      |  1 +
 5 files changed, 132 insertions(+), 56 deletions(-)
 delete mode 100644 docs/source/docs/level-up/basic_skills/03_dataset_import_export.md
 create mode 100644 docs/source/docs/level-up/basic_skills/03_dataset_import_export.rst
 delete mode 100644 docs/source/docs/level-up/basic_skills/04_detect_data_format.md
 create mode 100644 docs/source/docs/level-up/basic_skills/04_detect_data_format.rst

diff --git a/docs/source/docs/level-up/basic_skills/03_dataset_import_export.md b/docs/source/docs/level-up/basic_skills/03_dataset_import_export.md
deleted file mode 100644
index 5ce2609331..0000000000
--- a/docs/source/docs/level-up/basic_skills/03_dataset_import_export.md
+++ /dev/null
@@ -1,28 +0,0 @@
-# Data Import and Export
-
-Datumaro aims to refine data
-
-``` bash
-datum create -o
-datum import -p -f image_dir
-```
-
-or, if you work with Datumaro API:
-
-- for using with a project:
-
-  ```python
-  from datumaro.project import Project
-
-  project = Project.init()
-  project.import_source('source1', format='image_dir', url='directory/path/')
-  dataset = project.working_tree.make_dataset()
-  ```
-
-- for using as a dataset:
-
-  ```python
-  from datumaro import Dataset
-
-  dataset = Dataset.import_from('directory/path/', 'image_dir')
-  ```
diff --git a/docs/source/docs/level-up/basic_skills/03_dataset_import_export.rst b/docs/source/docs/level-up/basic_skills/03_dataset_import_export.rst
new file mode 100644
index 0000000000..c4e67b15dc
--- /dev/null
+++ b/docs/source/docs/level-up/basic_skills/03_dataset_import_export.rst
@@ -0,0 +1,90 @@
+===============================
+Level 3: Data Import and Export
+===============================
+
+Datumaro is a tool that supports public data formats across a wide range of tasks such as
+classification, detection, segmentation, pose estimation, or visual tracking.
+To facilitate this, Datumaro supports data import and export via both the Python API and the CLI.
+This makes it easier for users to work with various data formats using Datumaro.
+
+Prepare dataset
+===============
+
+For the segmentation task, we introduce Cityscapes, which collects road scenes from 50
+different cities and contains 5K fine-grained pixel-level annotations and 20K coarse annotations.
+A more detailed description is given :ref:`here `.
+The Cityscapes dataset is available for free `download `_.
+
+Convert data format
+===================
+
+Users sometimes need to compare, merge, or manage various kinds of public datasets in a unified
+system. To achieve this, Datumaro not only has `import` and `export` functionalities, but also
+provides `convert`, which combines the import and export into a single command line.
+We now convert the Cityscapes data into the MS-COCO format, which is described :ref:`here `.
+
+
+.. tabbed:: CLI
+
+    Without creating a project, we can do this with the single-line `convert` command in Datumaro
+
+    .. code-block:: bash
+
+        datum convert -if cityscapes -i -f coco_panoptic -o
+
+.. tabbed:: Python
+
+    With the Python API, we can import the data through `Dataset` as below.
+
+    .. code-block:: python
+
+        from datumaro.components.dataset import Dataset
+
+        data_path = '/path/to/cityscapes'
+        data_format = 'cityscapes'
+
+        dataset = Dataset.import_from(data_path, data_format)
+
+    We then export the imported dataset as
+
+    .. code-block:: python
+
+        output_path = '/path/to/output'
+
+        dataset.export(output_path, format='coco_panoptic')
+
+.. tabbed:: ProjectCLI
+
+    With the project-based CLI, we first need to create a project by
+
+    .. code-block:: bash
+
+        datum create -o
+
+    We now import Cityscapes data into the project through
+
+    .. code-block:: bash
+
+        datum import --format cityscapes -p
+
+    (Optional) When we import data, the change is automatically committed in the project.
+    This can be shown through `log` as
+
+    .. code-block:: bash
+
+        datum log -p
+
+    (Optional) We can check the imported dataset information such as subsets, number of data, or
+    categories through `info`.
+
+    .. code-block:: bash
+
+        datum info -p
+
+    Finally, we export the data within the project with MS-COCO format as
+
+    .. code-block:: bash
+
+        datum export --format coco -p -o -- --save-media
+
+For data with an unknown format, we can detect the format in the :ref:`next level `!
diff --git a/docs/source/docs/level-up/basic_skills/04_detect_data_format.md b/docs/source/docs/level-up/basic_skills/04_detect_data_format.md
deleted file mode 100644
index b6bfb43b17..0000000000
--- a/docs/source/docs/level-up/basic_skills/04_detect_data_format.md
+++ /dev/null
@@ -1,28 +0,0 @@
-# Detect Data Format from an Unknown Dataset
-
-Datumaro aims to refine data
-
-``` bash
-datum create -o
-datum import -p -f image_dir
-```
-
-or, if you work with Datumaro API:
-
-- for using with a project:
-
-  ```python
-  from datumaro.project import Project
-
-  project = Project.init()
-  project.import_source('source1', format='image_dir', url='directory/path/')
-  dataset = project.working_tree.make_dataset()
-  ```
-
-- for using as a dataset:
-
-  ```python
-  from datumaro import Dataset
-
-  dataset = Dataset.import_from('directory/path/', 'image_dir')
-  ```
diff --git a/docs/source/docs/level-up/basic_skills/04_detect_data_format.rst b/docs/source/docs/level-up/basic_skills/04_detect_data_format.rst
new file mode 100644
index 0000000000..5a30683f0f
--- /dev/null
+++ b/docs/source/docs/level-up/basic_skills/04_detect_data_format.rst
@@ -0,0 +1,41 @@
+===================================================
+Level 4: Detect Data Format from an Unknown Dataset
+===================================================
+
+Datumaro provides a function to detect the format of a dataset before importing data. This can be
+useful in cases where information about the original format of the data has been lost or is unclear.
+With this function, users can easily identify the format and proceed with appropriate data
+handling processes.
+
+Detect data format
+==================
+
+.. tabbed:: CLI
+
+    .. code-block:: bash
+
+        datum detect-format
+
+    The printed format can be used as the `format` argument when importing a dataset, as in the
+    :ref:`previous level `.
+
+.. tabbed:: Python
+
+    .. code-block:: python
+
+        from datumaro.components.environment import Environment
+
+        data_path = '/path/to/data'
+
+        env = Environment()
+
+        detected_formats = env.detect_dataset(data_path)
+
+
+    (Optional) With the detected format, we can import the dataset as below.
+
+    .. code-block:: python
+
+        from datumaro.components.dataset import Dataset
+
+        dataset = Dataset.import_from(data_path, detected_formats[0])
diff --git a/docs/source/docs/level-up/basic_skills/index.rst b/docs/source/docs/level-up/basic_skills/index.rst
index 794a8ea553..44a45c6a18 100644
--- a/docs/source/docs/level-up/basic_skills/index.rst
+++ b/docs/source/docs/level-up/basic_skills/index.rst
@@ -26,6 +26,7 @@ Basic Skills
     :text: Level 3: Dataset Import & Export
     :classes: btn-outline-primary btn-block
+    :badge:`ProjectCLI,badge-primary`
     :badge:`CLI,badge-info`
     :badge:`Python,badge-warning`
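
The Level 4 Python tab indexes `detected_formats[0]` directly, which raises an `IndexError` if nothing is detected. Below is a minimal sketch of a guard around that step. It assumes `Environment.detect_dataset` returns a possibly empty list of format names, as the tutorial's indexing implies. The helper names `pick_format` and `import_unknown` are hypothetical and not part of Datumaro.

```python
from typing import Sequence


def pick_format(detected_formats: Sequence[str]) -> str:
    """Return the first detected format name, or raise if nothing was detected.

    Hypothetical helper: it mirrors the tutorial's `detected_formats[0]`,
    but fails with a readable message instead of an IndexError.
    """
    if not detected_formats:
        raise ValueError("no dataset format detected; pass the format explicitly")
    return detected_formats[0]


def import_unknown(data_path: str):
    """Detect the format of the data at `data_path` and import it.

    Requires Datumaro; the imports are local so that `pick_format`
    stays usable without it installed.
    """
    from datumaro.components.dataset import Dataset
    from datumaro.components.environment import Environment

    # Assumption: detect_dataset returns a list of candidate format names.
    detected_formats = Environment().detect_dataset(data_path)
    return Dataset.import_from(data_path, pick_format(detected_formats))
```

With Datumaro installed, `import_unknown('/path/to/data')` would stand in for the two Python snippets of the Level 4 tab. When several candidates are detected, the sketch keeps the tutorial's choice of the first one.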