Is synthetic data from generative models ready for image recognition? (ICLR 2023, Spotlight)
By
Ruifei He,
Shuyang Sun,
Xin Yu,
Chuhui Xue,
Wenqing Zhang,
Philip Torr,
Song Bai,
Xiaojuan Qi.
Recent text-to-image generation models have shown promising results in generating high-fidelity photo-realistic images. Though the results are astonishing to human eyes, how applicable these generated images are for recognition tasks remains under-explored. In this work, we extensively study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks, and focus on two perspectives: synthetic data for improving classification models in data-scarce settings (i.e., zero-shot and few-shot), and synthetic data for large-scale model pre-training for transfer learning. We showcase the powerfulness and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data for recognition tasks.
- Clone our repo:
git clone https://github.com/CVMI-Lab/SyntheticData.git
- Install dependencies:
conda create -n SyntheticData python=3.7
conda activate SyntheticData
pip install -r requirements.txt
We generate sentences from label names of a specific dataset and save the generated sentences offline.
Set the target label space in the variable labels in src/LE.py, then run:
python3.7 src/LE.py 200 /path/to/save/dataset.pkl
where 200 is the number of sentences to generate for each label, and the second argument is the path where the generated sentences are saved.
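The exact layout of the saved file is determined by src/LE.py. As a rough illustration only, the sketch below assumes the pickle holds a dict mapping each label to a list of sentences. The templates and the function names generate_sentences and save_sentences are hypothetical, and the templates merely stand in for whatever sentence generator src/LE.py actually uses.

```python
import pickle
import random

# Hypothetical templates; a stand-in for the real sentence generator.
TEMPLATES = [
    "a photo of a {}",
    "a {} seen from above",
    "a close-up picture of a {}",
    "an image showing a {}",
]


def generate_sentences(labels, num_per_label, seed=0):
    """Return {label: [sentence, ...]} with num_per_label sentences per label."""
    rng = random.Random(seed)
    return {
        label: [rng.choice(TEMPLATES).format(label) for _ in range(num_per_label)]
        for label in labels
    }


def save_sentences(sentences, path):
    """Pickle the label-to-sentences mapping (assumed file layout) to path."""
    with open(path, "wb") as f:
        pickle.dump(sentences, f)


# Example usage (writes a file, so it is left commented out):
# save_sentences(generate_sentences(["forest", "river"], 200), "/path/to/save/dataset.pkl")
```

Saving the sentences offline in this way decouples prompt generation from the slower image generation step, so the same prompts can be reused across runs.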
We use GLIDE for text-to-image generation, and follow the official instructions for the generation process.
We use text generated from language enhancement as prompts for the text-to-image generation.
We provide a multi-GPU generation example in src/glide/glide_zsl.py. Run it with:
sh glide/gen_zsl.sh /path/to/save/dataset.pkl /path/to/save/dataset
We use CLIP to help filter out unreliable images:
# under dir: classifier-tuning
python3.7 src/select_glide_ims_by_clip.py /path/to/synthetic/dataset 10 # 10 is the number of classes for the given task
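The selection rule implemented in src/select_glide_ims_by_clip.py is not reproduced here. One plausible form of such a filter, shown as a minimal sketch, is to keep an image only if a zero-shot classifier assigns enough probability to the class the image was generated for. The function names, the threshold value, and the input format (precomputed image-text similarity scores, one per class) are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def filter_by_confidence(samples, threshold=0.5):
    """Keep images whose intended class receives probability >= threshold.

    samples: list of (image_path, target_class_index, logits), where logits
    are assumed to be precomputed image-text similarity scores, one per class.
    """
    kept = []
    for path, target, logits in samples:
        if softmax(logits)[target] >= threshold:
            kept.append(path)
    return kept


samples = [
    ("a.png", 0, [5.0, 1.0, 0.0]),  # confidently matches its target class 0
    ("b.png", 1, [5.0, 1.0, 0.0]),  # generated for class 1 but looks like class 0
]
print(filter_by_confidence(samples))
```

In this toy input only a.png survives, because b.png scores highest for a class other than the one it was generated for.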
Our code is adapted from the Wise-ft codebase. Here we provide an example for the Eurosat dataset.
The --model option can be set to RN50 or ViT-B/16.
Note that you need to download the validation/test data for each dataset and update the paths in src/classifier-tuning/src/dataset/transfer_datasets.py.
python3.7 src/ct_zsl.py \
--freeze-encoder \
--sl=0.5 \
--sl_T=2 \
--train-dataset=Eurosat \
--save=/path/to/save/results \
--epochs=30 \
--lr=2e-3 \
--wd=0.1 \
--batch-size=512 \
--warmup_length=0 \
--cache-dir=cache \
--model=RN50 \
--eval-datasets=Eurosat \
--template=eurosat_template \
--results-db=results.jsonl \
--data-location=/path/to/synthetic/data | tee results/${exp_name}/train-$now.log
We provide the code for our proposed Real Guidance strategy. First, obtain a set of few-shot images for the given task. You may need to revise the function get_few_shot_images_path_prompt_pairs() in src/glide/glide_fsl.py, which returns a list of (im_path, prompt) pairs.
Also, set the variable refer_img_iters to 15, 20, 35, 40, and 50 for 16, 8, 4, 2, and 1 shots, respectively, and choose batch_size and batch_size_time so that batch_size * batch_size_time * shot = 800.
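The constraint above is simple arithmetic, and a small helper can check a configuration before a long generation run. This is a sketch only: the shot-to-refer_img_iters table and the target of 800 come from the text above, while the helper name batch_size_time_for is hypothetical and not part of the repo.

```python
# Values copied from the instructions above: shot -> refer_img_iters.
REFER_IMG_ITERS = {16: 15, 8: 20, 4: 35, 2: 40, 1: 50}

# Required value of batch_size * batch_size_time * shot.
TARGET_PRODUCT = 800


def batch_size_time_for(shot, batch_size):
    """Return batch_size_time such that batch_size * batch_size_time * shot == 800.

    Raises ValueError if no integer batch_size_time satisfies the constraint.
    """
    per_shot, rem = divmod(TARGET_PRODUCT, shot)
    if rem:
        raise ValueError(f"shot={shot} does not divide {TARGET_PRODUCT}")
    times, rem = divmod(per_shot, batch_size)
    if rem:
        raise ValueError(
            f"batch_size={batch_size} does not divide {per_shot} for shot={shot}"
        )
    return times


# Example: 16 shots with batch_size 10 needs batch_size_time 5 (10 * 5 * 16 = 800).
print(REFER_IMG_ITERS[16], batch_size_time_for(16, 10))
```

For example, with 1 shot and batch_size 50 the helper returns 16, and a batch_size of 30 with 8 shots is rejected because 100 is not divisible by 30.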
We provide a multi-GPU generation example in src/glide/glide_fsl.py. Run it with:
sh glide/gen_fsl.sh /path/to/few-shot/images /path/to/save/dataset
Again, our code is adapted from the Wise-ft codebase. The following is an example:
python3.7 src/ct_fsl.py \
--freeze-encoder \
--sl=0.5 \
--sl_T=2 \
--train-dataset=Eurosat \
--save=/path/to/save/results \
--epochs=30 \
--lr=1e-3 \
--wd=0.1 \
--batch-size-real=32 \
--batch-size-syn=512 \
--loss-weight=1.0 \
--loss-weight-real=1.0 \
--warmup_length=0 \
--cache-dir=cache \
--model=RN50 \
--eval-datasets=Eurosat \
--template=eurosat_template \
--results-db=results.jsonl \
--data-location=/path/to/synthetic/data \
--data-location-real=/path/to/few-shot/data | tee results/${exp_name}/train-$now.log
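The command above passes separate batch sizes and loss weights for synthetic and real data. How src/ct_fsl.py combines them is not documented here; the sketch below assumes the simplest reading, namely that each step draws one synthetic batch and one real batch and sums their mean cross-entropy losses with the two weights. The function names are hypothetical, and only the parameter names loss_weight and loss_weight_real mirror the flags.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one sample given raw logits and a target class index."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def mean_loss(batch):
    """Average cross-entropy over a batch of (logits, target) pairs."""
    return sum(cross_entropy(logits, target) for logits, target in batch) / len(batch)


def mixed_loss(syn_batch, real_batch, loss_weight=1.0, loss_weight_real=1.0):
    """Assumed combination: weighted sum of the synthetic and real batch losses."""
    return loss_weight * mean_loss(syn_batch) + loss_weight_real * mean_loss(real_batch)


# Toy batches of (logits, target); real training would use model outputs.
syn_batch = [([2.0, 0.5], 0), ([0.1, 1.5], 1)]
real_batch = [([1.0, 1.0], 0)]
print(mixed_loss(syn_batch, real_batch))
```

Because each loss is averaged over its own batch, the small real batch is not swamped by the much larger synthetic batch under this reading, and the two weights control their relative contribution.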
For the pre-training setting, we adopt only the language enhancement strategy. To generate synthetic pre-training data, modify the files used in the zero-shot setting (src/LE.py, src/glide/glide_zsl.py) accordingly.
We recommend using the timm codebase for pre-training because of its solid implementation. For the concrete hyper-parameters, please refer to Sec. C.5.3 in the Appendix of our paper.
If you find this repo useful for your research, please consider citing our paper:
@article{he2022synthetic,
title={Is synthetic data from generative models ready for image recognition?},
author={He, Ruifei and Sun, Shuyang and Yu, Xin and Xue, Chuhui and Zhang, Wenqing and Torr, Philip and Bai, Song and Qi, Xiaojuan},
journal={arXiv preprint arXiv:2210.07574},
year={2022}
}
We thank the authors of the following open-source projects: GLIDE, CLIP, keytotext, Wise-ft, timm, Detectron2, DeiT, and MoCo.