diff --git a/docs/source/docs/level-up/basic_skills/03_dataset_import_export.md b/docs/source/docs/level-up/basic_skills/03_dataset_import_export.md
deleted file mode 100644
index 5ce2609331..0000000000
--- a/docs/source/docs/level-up/basic_skills/03_dataset_import_export.md
+++ /dev/null
@@ -1,28 +0,0 @@
-# Data Import and Export
-
-Datumaro aims to refine data
-
-``` bash
-datum create -o
-datum import -p -f image_dir
-```
-
-or, if you work with Datumaro API:
-
-- for using with a project:
-
-  ```python
-  from datumaro.project import Project
-
-  project = Project.init()
-  project.import_source('source1', format='image_dir', url='directory/path/')
-  dataset = project.working_tree.make_dataset()
-  ```
-
-- for using as a dataset:
-
-  ```python
-  from datumaro import Dataset
-
-  dataset = Dataset.import_from('directory/path/', 'image_dir')
-  ```
diff --git a/docs/source/docs/level-up/basic_skills/03_dataset_import_export.rst b/docs/source/docs/level-up/basic_skills/03_dataset_import_export.rst
new file mode 100644
index 0000000000..c4e67b15dc
--- /dev/null
+++ b/docs/source/docs/level-up/basic_skills/03_dataset_import_export.rst
@@ -0,0 +1,90 @@
+===============================
+Level 3: Data Import and Export
+===============================
+
+Datumaro is a tool that supports public data formats across a wide range of tasks such as
+classification, detection, segmentation, pose estimation, or visual tracking.
+To facilitate this, Datumaro supports data import and export through both the Python API and the CLI.
+This makes it easier for users to work with various data formats using Datumaro.
+
+Prepare dataset
+===============
+
+For the segmentation task, we introduce the Cityscapes dataset, which collects road scenes from 50
+different cities and contains 5K fine-grained pixel-level annotations and 20K coarse annotations.
+A more detailed description is given :ref:`here `.
+The Cityscapes dataset is available for free `download `_.
+
+Convert data format
+===================
+
+Users sometimes need to compare, merge, or manage various kinds of public datasets in a unified
+system. To achieve this, Datumaro not only has `import` and `export` functionalities, but also
+provides `convert`, which combines the import and export into a single command.
+We now convert the Cityscapes data into the MS-COCO format, which is described :ref:`here `.
+
+
+.. tabbed:: CLI
+
+    Without creating a project, we can do this with the single-line `convert` command:
+
+    .. code-block:: bash
+
+        datum convert -if cityscapes -i -f coco_panoptic -o
+
+.. tabbed:: Python
+
+    With the Python API, we can import the data through `Dataset` as below.
+
+    .. code-block:: python
+
+        from datumaro.components.dataset import Dataset
+
+        data_path = '/path/to/cityscapes'
+        data_format = 'cityscapes'
+
+        dataset = Dataset.import_from(data_path, data_format)
+
+    We then export the imported dataset as
+
+    .. code-block:: python
+
+        output_path = '/path/to/output'
+
+        dataset.export(output_path, format='coco_panoptic')
+
+.. tabbed:: ProjectCLI
+
+    With the project-based CLI, we first need to create a project:
+
+    .. code-block:: bash
+
+        datum create -o
+
+    We now import the Cityscapes data into the project:
+
+    .. code-block:: bash
+
+        datum import --format cityscapes -p
+
+    (Optional) When we import data, the change is automatically committed to the project.
+    This can be shown through `log`:
+
+    .. code-block:: bash
+
+        datum log -p
+
+    (Optional) We can check the imported dataset information, such as subsets, number of data items, or
+    categories, through `info`.
+
+    .. code-block:: bash
+
+        datum info -p
+
+    Finally, we export the data in the project to the MS-COCO format:
+
+    .. code-block:: bash
+
+        datum export --format coco -p -o -- --save-media
+
+For data with an unknown format, we can detect the format in the :ref:`next level `!
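The single-command conversion from the CLI tab can also be driven from a script. The following is a minimal sketch, not part of this patch: it only assembles the `datum convert` arguments shown in that tab and runs them when `datum` is found on `PATH`. The helper names `build_convert_command` and `run_convert`, the `dry_run` flag, and both paths are hypothetical and exist only for illustration.

```python
import shlex
import shutil
import subprocess


def build_convert_command(input_path, output_path,
                          input_format="cityscapes",
                          output_format="coco_panoptic"):
    """Return the argv list for the `datum convert` call from the CLI tab."""
    return [
        "datum", "convert",
        "-if", input_format,   # format of the source dataset
        "-i", input_path,      # where the source dataset lives
        "-f", output_format,   # format to write
        "-o", output_path,     # where to write the converted dataset
    ]


def run_convert(input_path, output_path, dry_run=True):
    """Print the command; execute it only if dry_run is False and datum is installed."""
    command = build_convert_command(input_path, output_path)
    print(shlex.join(command))
    if dry_run or shutil.which("datum") is None:
        return None
    return subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    run_convert("/path/to/cityscapes", "/path/to/output")
```

Keeping the argument list in one function makes it possible to check the command in a dry run before touching any data.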
diff --git a/docs/source/docs/level-up/basic_skills/04_detect_data_format.md b/docs/source/docs/level-up/basic_skills/04_detect_data_format.md
deleted file mode 100644
index b6bfb43b17..0000000000
--- a/docs/source/docs/level-up/basic_skills/04_detect_data_format.md
+++ /dev/null
@@ -1,28 +0,0 @@
-# Detect Data Format from an Unknown Dataset
-
-Datumaro aims to refine data
-
-``` bash
-datum create -o
-datum import -p -f image_dir
-```
-
-or, if you work with Datumaro API:
-
-- for using with a project:
-
-  ```python
-  from datumaro.project import Project
-
-  project = Project.init()
-  project.import_source('source1', format='image_dir', url='directory/path/')
-  dataset = project.working_tree.make_dataset()
-  ```
-
-- for using as a dataset:
-
-  ```python
-  from datumaro import Dataset
-
-  dataset = Dataset.import_from('directory/path/', 'image_dir')
-  ```
diff --git a/docs/source/docs/level-up/basic_skills/04_detect_data_format.rst b/docs/source/docs/level-up/basic_skills/04_detect_data_format.rst
new file mode 100644
index 0000000000..5a30683f0f
--- /dev/null
+++ b/docs/source/docs/level-up/basic_skills/04_detect_data_format.rst
@@ -0,0 +1,41 @@
+===================================================
+Level 4: Detect Data Format from an Unknown Dataset
+===================================================
+
+Datumaro provides a function to detect the format of a dataset before importing data. This can be
+useful in cases where information about the original format of the data has been lost or is unclear.
+With this function, users can easily identify the format and proceed with appropriate data
+handling processes.
+
+Detect data format
+==================
+
+.. tabbed:: CLI
+
+    .. code-block:: bash
+
+        datum detect-format
+
+    The printed format can be used as the `format` argument when importing a dataset, as described in the
+    :ref:`previous level `.
+
+.. tabbed:: Python
+
+    .. code-block:: python
+
+        from datumaro.components.environment import Environment
+
+        data_path = '/path/to/data'
+
+        env = Environment()
+
+        detected_formats = env.detect_dataset(data_path)
+
+
+    (Optional) With the detected format, we can import the dataset as below.
+
+    .. code-block:: python
+
+        from datumaro.components.dataset import Dataset
+
+        dataset = Dataset.import_from(data_path, detected_formats[0])
diff --git a/docs/source/docs/level-up/basic_skills/index.rst b/docs/source/docs/level-up/basic_skills/index.rst
index 794a8ea553..44a45c6a18 100644
--- a/docs/source/docs/level-up/basic_skills/index.rst
+++ b/docs/source/docs/level-up/basic_skills/index.rst
@@ -26,6 +26,7 @@ Basic Skills
    :text: Level 3: Dataset Import & Export
    :classes: btn-outline-primary btn-block
+   :badge:`ProjectCLI,badge-primary`
    :badge:`CLI,badge-info`
    :badge:`Python,badge-warning`