This repo contains the implementation of Full-Glow: Fully conditional Glow for more realistic image generation (https://arxiv.org/abs/2012.05846). A short presentation of the work can be seen here.
Full-Glow extends previous Glow-based models for conditional image generation by applying conditioning to all Glow operations through appropriate conditioning networks. It was applied to the Cityscapes dataset (label → photo) to synthesize street-scene images.
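To give an idea of what "conditioning all Glow operations" means, below is a minimal, hypothetical sketch of a conditionally parameterized affine coupling layer in PyTorch. It is not the code from `models/`; the module and parameter names are illustrative only, and the actual Full-Glow blocks also condition the actnorm and invertible 1x1 convolution operations.

```python
import torch
import torch.nn as nn


class ConditionalAffineCoupling(nn.Module):
    """Toy affine coupling whose scale/shift network also sees a conditioning
    feature map (e.g. features extracted from the segmentation label)."""

    def __init__(self, in_channels, cond_channels, hidden_channels=256):
        super().__init__()
        # The coupling network takes half of the activations concatenated with
        # the condition and predicts a log-scale and a shift for the other half.
        self.net = nn.Sequential(
            nn.Conv2d(in_channels // 2 + cond_channels, hidden_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden_channels, in_channels, 3, padding=1),
        )

    def forward(self, x, cond):
        x_a, x_b = x.chunk(2, dim=1)
        log_s, t = self.net(torch.cat([x_a, cond], dim=1)).chunk(2, dim=1)
        s = torch.sigmoid(log_s + 2)                   # Glow-style stabilization of the scale
        y_b = (x_b + t) * s
        log_det = torch.log(s).flatten(1).sum(dim=1)   # contribution to the log-likelihood
        return torch.cat([x_a, y_b], dim=1), log_det
```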
Full-Glow was evaluated quantitatively against previous Glow-based models (C-Glow and DUAL-Glow) and the GAN-based model pix2pix using the PSPNet classifier. With each trained model, we ran inference on the Cityscapes validation set three times and computed the PSP scores.
Model | Conditional BPD ↓ | Mean pixel acc. ↑ | Mean class acc. ↑ | Mean class IoU ↑ |
---|---|---|---|---|
C-Glow v.1 | 2.568 | 35.02 ± 0.56 | 12.15 ± 0.05 | 7.33 ± 0.09 |
C-Glow v.2 | 2.363 | 52.33 ± 0.46 | 17.37 ± 0.21 | 12.31 ± 0.24 |
Dual-Glow | 2.585 | 71.44 ± 0.03 | 23.91 ± 0.19 | 18.96 ± 0.17 |
pix2pix | --- | 60.56 ± 0.11 | 22.64 ± 0.21 | 16.42 ± 0.06 |
Full-Glow | 2.345 | 73.50 ± 0.13 | 29.13 ± 0.39 | 23.86 ± 0.30 |
Ground-truth | --- | 95.97 | 84.31 | 77.30 |
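For reference, the three segmentation metrics in the table above follow the standard definitions and can be computed from a per-class confusion matrix as in the generic sketch below (this is not the PSPNet evaluation code in `evaluation/`):

```python
import numpy as np


def segmentation_scores(conf_mat):
    """conf_mat[i, j] = number of pixels with ground-truth class i predicted as class j."""
    conf_mat = conf_mat.astype(float)
    tp = np.diag(conf_mat)          # correctly classified pixels per class
    gt = conf_mat.sum(axis=1)       # ground-truth pixels per class
    pred = conf_mat.sum(axis=0)     # predicted pixels per class

    with np.errstate(divide="ignore", invalid="ignore"):
        pixel_acc = tp.sum() / conf_mat.sum()                  # mean pixel accuracy
        class_acc = np.nanmean(tp / gt)                        # mean class accuracy
        class_iou = np.nanmean(tp / (gt + pred - tp))          # mean class IoU
    return pixel_acc, class_acc, class_iou
```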
Condition (left) and synthesized image (right)
Images from left to right: Desired content - Desired structure - Content applied to structure - Ground-truth for structure
Samples generated on the maps dataset
Top row: condition, bottom row: synthesized
To train a model on e.g. Cityscapes, one can run:
`python3 main.py --model improved_so_large_longer --img_size 512 1024 --dataset cityscapes --direction label2photo --n_block 4 --n_flow 8 8 8 8 --do_lu --reg_factor 0.0001 --grad_checkpoint`
- `--model` indicates the name of the model (should have 'improved' in the name to enable training Full-Glow)
- `--dataset` determines which dataset to use. Data loaders for the Cityscapes, MNIST, and maps datasets are already implemented here
- `--do_lu` enables the use of LU decomposition, which has a noticeable effect on training time
- `--reg_factor` indicates the regularization factor applied to the right-hand side of the objective function
- `--grad_checkpoint` enables gradient checkpointing, which is needed here for training on larger images (see the sketch after this list)
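As a rough illustration of what `--grad_checkpoint` trades off, the snippet below shows the general PyTorch mechanism (`torch.utils.checkpoint`) for re-computing activations during the backward pass instead of storing them, which is what makes 512×1024 training fit in memory. The repo wires this up inside the model; the function and variable names here are made up for the example.

```python
import torch
from torch.utils.checkpoint import checkpoint


def run_flow_steps(flow_steps, x, cond, use_grad_checkpoint=True):
    """Apply a sequence of (hypothetical) flow-step modules, optionally
    checkpointing each step to save activation memory on large images."""
    total_log_det = torch.zeros(x.size(0), device=x.device)
    for step in flow_steps:
        if use_grad_checkpoint and torch.is_grad_enabled():
            # Activations inside `step` are re-computed in the backward pass.
            x, log_det = checkpoint(step, x, cond, use_reentrant=False)
        else:
            x, log_det = step(x, cond)
        total_log_det = total_log_det + log_det
    return x, total_log_det
```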
- `data_handler` contains implementations of data loaders for different datasets
- `evaluation` contains code for evaluating the models
- `experiments` has code for experiments such as content transfer and sampling
- `helper` contains helper functions for dealing with files, directories, saving/loading checkpoints, etc.
- `models` contains implementations of Full-Glow, DUAL-Glow, and C-Glow
- `trainer` has the implementation of the training loop and loss function
Checkpoints for all the Cityscapes models trained in this project (including C-Glow and DUAL-Glow) can be found here: https://kth.box.com/s/h3r9jt5pq8itrnkp0t2qy11pui7u6dmc
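A downloaded checkpoint can be inspected with plain `torch.load`; the path and keys in the example below are hypothetical and depend on how `helper/` saves checkpoints:

```python
import torch

# Hypothetical path; use the actual file downloaded from the Box link above.
ckpt = torch.load("checkpoints/cityscapes_label2photo.pt", map_location="cpu")
print(list(ckpt.keys()))  # e.g. model weights, optimizer state, training step
```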
- My implementation of the baseline Glow borrows heavily from Kim Seonghyeon's helpful implementation: https://github.com/rosinality/glow-pytorch
If you use our code or build on our method, please cite our paper:
@inproceedings{sorkhei2021full,
author={Sorkhei, Moein and Henter, Gustav Eje and Kjellstr{\"o}m, Hedvig},
title={Full-{G}low: {F}ully conditional {G}low for more realistic image generation},
booktitle={Proceedings of the DAGM German Conference on Pattern Recognition (GCPR)},
volume={43},
month={Oct.},
year={2021}
}