bonhunko committed Apr 19, 2023
===========================
Level 10: Data Generation
===========================


Pre-training of deep learning models for vision tasks can increase model accuracy.
Training a model on a synthetic dataset is a popular pre-training approach,
since manual annotation is expensive.

Based on `previous research <https://arxiv.org/abs/2103.13023>`_,
Datumaro provides a generator for a fractal image dataset (FractalDB) that can be used to pre-train vision models.
Pre-training on FractalDB has been reported to improve the performance of Vision Transformer (ViT) models.
Note that the fractal patterns in FractalDB are computed mathematically using an iterated function system (IFS) with random parameters.
Since no real-world images are involved, there are no privacy issues to worry about.
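
The sketch below illustrates how an IFS turns random parameters into a fractal pattern.
It is a minimal, stdlib-only illustration and not Datumaro's implementation: the helper names
``sample_contractive_maps`` and ``render_ifs``, the uniform [-1, 1] coefficient range, the contraction check,
and choosing each map with probability proportional to ``|det|`` of its linear part are all assumptions made for this example.
It renders a single attractor as a binary image with the chaos game: start from a point, repeatedly apply a randomly
chosen affine map, and mark every visited pixel.

.. code-block:: python

    import random


    def sample_contractive_maps(num_maps, rng):
        """Sample random affine maps (a, b, c, d, e, f).

        Each map sends (x, y) to (a*x + b*y + e, c*x + d*y + f). Coefficients are drawn
        uniformly from [-1, 1]; maps whose linear part is not a contraction in the
        max-norm are rejected so that the orbit stays bounded.
        """
        maps = []
        while len(maps) < num_maps:
            a, b, c, d, e, f = (rng.uniform(-1.0, 1.0) for _ in range(6))
            if abs(a) + abs(b) < 1.0 and abs(c) + abs(d) < 1.0:
                maps.append((a, b, c, d, e, f))
        return maps


    def render_ifs(maps, width, height, num_points=50_000, rng=None):
        """Render the IFS attractor with the chaos game as a binary height x width image."""
        rng = rng or random.Random()
        # Choose each map with probability proportional to |det| of its linear part
        weights = [abs(a * d - b * c) + 1e-6 for a, b, c, d, _, _ in maps]
        chosen = rng.choices(maps, weights=weights, k=num_points)
        x, y = 0.0, 0.0
        points = []
        for i, (a, b, c, d, e, f) in enumerate(chosen):
            x, y = a * x + b * y + e, c * x + d * y + f
            if i >= 100:  # skip the transient before the orbit settles on the attractor
                points.append((x, y))
        # Normalize the visited points into pixel coordinates
        xs = [px for px, _ in points]
        ys = [py for _, py in points]
        min_x, min_y = min(xs), min(ys)
        span_x = (max(xs) - min_x) or 1.0
        span_y = (max(ys) - min_y) or 1.0
        image = [[0] * width for _ in range(height)]
        for px, py in points:
            col = int((px - min_x) / span_x * (width - 1))
            row = int((py - min_y) / span_y * (height - 1))
            image[row][col] = 1
        return image


    if __name__ == "__main__":
        rng = random.Random(0)
        maps = sample_contractive_maps(num_maps=4, rng=rng)
        image = render_ifs(maps, width=240, height=180, rng=rng)
        print(sum(map(sum, image)), "pixels set")

Rejecting non-contractive maps keeps the orbit bounded, so the points can be normalized into the image
without overflow; different random seeds yield different fractal shapes.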


.. tab-set::

    .. tab-item:: CLI

        We can generate the synthetic images with the following CLI command:

        .. code-block:: bash

            datum generate -o <path/to/data> --count GEN_IMG_COUNT --shape GEN_IMG_SHAPE

        ``GEN_IMG_COUNT`` is an integer that sets the number of images to generate (e.g. ``--count 300``).

        ``GEN_IMG_SHAPE`` is the shape (width height) of the generated images (e.g. ``--shape 240 180``).

    .. tab-item:: Python

        With the Python API, we can generate the synthetic images as below:

        .. code-block:: python

            from datumaro.plugins.synthetic_data import FractalImageGenerator

            # Write GEN_IMG_COUNT fractal images of shape GEN_IMG_SHAPE to <path/to/data>
            FractalImageGenerator(output_dir="<path/to/data>", count=GEN_IMG_COUNT, shape=GEN_IMG_SHAPE).generate_dataset()

        ``GEN_IMG_COUNT`` is an integer that sets the number of images to generate (e.g. ``count=300``).

        ``GEN_IMG_SHAPE`` is a tuple giving the shape of the generated images as (width, height) (e.g. ``shape=(240, 180)``).
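
When scripting several generation runs, it can help to make the mapping between the two interfaces explicit:
``shape=(240, 180)`` in Python corresponds to ``--shape 240 180`` on the command line.
The hypothetical helper ``build_generate_command`` below is a minimal, stdlib-only sketch (not part of Datumaro)
that builds the ``datum generate`` invocation using only the ``-o``, ``--count`` and ``--shape`` options shown above.
It does not run the command.

.. code-block:: python

    import shlex


    def build_generate_command(output_dir, count, shape):
        """Build the ``datum generate`` argument list from Python-style arguments.

        Hypothetical helper: ``shape`` is a (width, height) tuple, as in the Python API.
        """
        if not isinstance(count, int) or count <= 0:
            raise ValueError("count must be a positive integer")
        width, height = shape
        return [
            "datum", "generate",
            "-o", str(output_dir),
            "--count", str(count),
            # The CLI takes the shape as two separate tokens: width height
            "--shape", str(width), str(height),
        ]


    if __name__ == "__main__":
        cmd = build_generate_command("path/to/data", 300, (240, 180))
        print(shlex.join(cmd))  # prints: datum generate -o path/to/data --count 300 --shape 240 180

The resulting list can be passed to ``subprocess.run(cmd, check=True)`` to actually run the command,
assuming ``datum`` is available on your ``PATH``.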

Congratulations! You have completed all of the Datumaro level-up documents for intermediate skills.
