Above: A comparison of the results from consistency-based learning and learning each task individually. The yellow markers highlight the improvement in fine-grained details.
This repository contains tools for training and evaluating models using consistency:
- Pretrained models
- Demo code and an online live demo
- Uncertainty energy estimation code
- Training scripts
- Docker and installation instructions
for the following paper:
Robust Learning Through Cross-Task Consistency (CVPR 2020, Best Paper Award Nomination, Oral)
For further details, a live demo, video visualizations, and an overview talk, refer to our project website.
LIVE DEMO | VIDEO VISUALIZATION |
---|---|
Upload your own images and see the results of different consistency-based models vs. various baselines. |
Visualize models with and without consistency, evaluated on a (non-cherry picked) YouTube video. |
- Introduction
- Installation
- Quickstart (demo code)
- Energy computation
- Download all pretrained models
- Train a consistency model
- Citing
Visual perception entails solving a wide set of tasks (e.g. object detection, depth estimation, etc). The predictions made for each task out of a particular observation are not independent, and therefore, are expected to be consistent.
What is consistency? Suppose an object detector detects a ball in a particular region of an image, while a depth estimator returns a flat surface for the same region. This presents an issue -- at least one of them has to be wrong, because they are inconsistent.
Why is it important?
- Desired learning tasks are usually predictions of different aspects of a single underlying reality (the scene that underlies an image). Inconsistency among predictions implies contradiction.
- Consistency constraints are informative and can be used to better fit the data or lower the sample complexity. They may also reduce the tendency of neural networks to learn "surface statistics" (superficial cues) by enforcing constraints rooted in different physical or geometric rules. This is empirically supported by the improved generalization of models when trained with consistency constraints.
How do we enforce it? The underlying concept is that of path independence in a network of tasks. Given an endpoint Y2, the path X->Y1->Y2 should give the same results as X->Y2. This can be generalized to a larger system, with paths of arbitrary lengths. In this case, the nodes of the graph are our prediction domains (e.g. depth, normal) and the edges are neural networks mapping between these domains.
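As a minimal sketch of the path-independence idea, the snippet below compares the two-step path X->Y1->Y2 with the direct path X->Y2 using an L1 distance. The function names and the toy lambdas standing in for networks are assumptions made for illustration; they are not part of this repository's code.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized sequences."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def path_inconsistency(x, f_x_y1, f_y1_y2, f_x_y2):
    """Compare the two-step path X->Y1->Y2 with the direct path X->Y2."""
    via_y1 = f_y1_y2(f_x_y1(x))  # X -> Y1 -> Y2
    direct = f_x_y2(x)           # X -> Y2
    return l1_distance(via_y1, direct)


# Toy stand-ins for networks: each maps a list of floats to a list of floats.
x = [1.0, 2.0, 3.0]
f_x_y1 = lambda v: [2.0 * t for t in v]
f_y1_y2 = lambda v: [t + 1.0 for t in v]
consistent_f_x_y2 = lambda v: [2.0 * t + 1.0 for t in v]
inconsistent_f_x_y2 = lambda v: [2.0 * t for t in v]

print(path_inconsistency(x, f_x_y1, f_y1_y2, consistent_f_x_y2))    # 0.0
print(path_inconsistency(x, f_x_y1, f_y1_y2, inconsistent_f_x_y2))  # 1.0
```

In training, a term of this kind would be minimized so that both paths agree; the actual loss construction lives in energy.py.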
This repository includes training code for enforcing cross task consistency, demo code for visualizing the results of a consistency trained model on a given image and links to download these models. For further details, refer to our paper or website.
Consistency constraints can be used for virtually any set of domains. This repository considers transferring between image domains, and our networks were trained for transferring between the following domains from the Taskonomy dataset.
Curvature Edge-3D Reshading
Depth-ZBuffer Keypoint-2D RGB
Edge-2D Keypoint-3D Surface-Normal
The repo contains consistency-trained models for RGB -> Surface-Normals, RGB -> Depth-ZBuffer, and RGB -> Reshading. In each case the remaining 7 domains are used as consistency constraints during training.
Descriptions for each domain can be found in the supplementary file of Taskonomy.
All networks are based on the UNet architecture. They take an input of size 256x256, upsample via bilinear interpolation instead of deconvolutions, and are trained with the L1 loss. See the table below for more information.
Task Name | Output Dimension | Downsample Blocks |
---|---|---|
RGB -> Depth-ZBuffer | 256x256x1 | 6 |
RGB -> Reshading | 256x256x1 | 5 |
RGB -> Surface-Normal | 256x256x3 | 6 |
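To get a feel for what the number of downsample blocks means, the sketch below computes the spatial size at the bottleneck of each network. It assumes that every downsample block halves the spatial resolution, which is typical for UNets but is not stated here; check transfers.py for the actual layers.

```python
def bottleneck_size(input_size, downsample_blocks):
    """Spatial size after repeated halving (assumes stride-2 downsample blocks)."""
    return input_size // (2 ** downsample_blocks)


# Values copied from the table above.
MODELS = {
    "RGB -> Depth-ZBuffer": {"output_channels": 1, "downsample_blocks": 6},
    "RGB -> Reshading": {"output_channels": 1, "downsample_blocks": 5},
    "RGB -> Surface-Normal": {"output_channels": 3, "downsample_blocks": 6},
}

for name, cfg in MODELS.items():
    size = bottleneck_size(256, cfg["downsample_blocks"])
    print(f"{name}: bottleneck {size}x{size}, output 256x256x{cfg['output_channels']}")
```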
Other networks (e.g. Curvature -> Surface-Normal) also use a UNet; their architecture hyperparameters are detailed in transfers.py.
More information on the models, including download links, can be found here and in the supplementary material.
There are two convenient ways to run the code. Either using Docker (recommended) or using a Python-specific tool such as pip, conda, or virtualenv.
We provide a Docker image that contains the code and all the necessary libraries. It's simple to install and run.
- Simply run:
docker run --runtime=nvidia -ti --rm epflvilab/xtconsistency:latest
The code is now available in the container under your home directory (/app), and all the necessary libraries should already be installed.
The code can also be run using a Python environment manager such as Conda. See requirements.txt for the complete list of packages. We recommend doing a clean installation of the requirements in a fresh environment:
- Clone the repo:
git clone git@github.com:EPFL-VILAB/XTConsistency.git
cd XTConsistency
- Create a new environment and install the libraries:
conda create -n testenv -y python=3.6
source activate testenv
pip install -r requirements.txt
If you haven't yet, download the pretrained models. Models used for the demo can be downloaded with the following command:
sh ./tools/download_models.sh
This downloads the baseline and consistency-trained models for the depth, normal and reshading targets (1.3GB) to a folder called ./models/. Individual models can be downloaded here.
To run the trained model of a task on a specific image:
python demo.py --task $TASK --img_path $PATH_TO_IMAGE_OR_FOLDER --output_path $PATH_TO_SAVE_OUTPUT
The --task flag specifies the target task for the input image, which should be either normal, depth or reshading.
To run the script for a normal target on the example image:
python demo.py --task normal --img_path assets/test.png --output_path assets/
It returns the output prediction from the baseline (test_normal_baseline.png) and consistency models (test_normal_consistency.png).
Test image | Baseline | Consistency |
---|---|---|
Similarly, running for target tasks reshading and depth gives the following.
Baseline (reshading) | Consistency (reshading) | Baseline (depth) | Consistency (depth) |
---|---|---|---|
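To run the demo for all three targets in one go, a small wrapper can assemble the command shown above for each task. This is only a convenience sketch: the helper name build_demo_command is hypothetical, and the flags are the ones documented in this section.

```python
VALID_TASKS = ("normal", "depth", "reshading")


def build_demo_command(task, img_path, output_path):
    """Return the demo.py invocation for one target task as an argument list."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    return [
        "python", "demo.py",
        "--task", task,
        "--img_path", img_path,
        "--output_path", output_path,
    ]


# Print the three commands; nothing is executed here.
for task in VALID_TASKS:
    print(" ".join(build_demo_command(task, "assets/test.png", "assets/")))
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the repository root to actually run the demo.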
Training with consistency involves several paths that each predict the target domain, but using different cues to do so. The disagreement between these predictions yields an unsupervised quantity, consistency energy, that our CVPR 2020 paper found correlates with prediction error. You can view the pixel-wise consistency energy (example below) using our live demo.
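To make "disagreement between predictions" concrete, here is a deliberately simplified stand-in: the per-pixel mean absolute deviation of several path predictions from their mean. This is an illustrative assumption and not the paper's exact definition of consistency energy; the function name and toy values are made up for the example.

```python
def consistency_energy(path_predictions):
    """Per-pixel mean absolute deviation of the path predictions from their mean.

    path_predictions: list of predictions, one per path, each a flat list of floats.
    """
    n_paths = len(path_predictions)
    n_pixels = len(path_predictions[0])
    energy = []
    for i in range(n_pixels):
        values = [pred[i] for pred in path_predictions]
        mean = sum(values) / n_paths
        energy.append(sum(abs(v - mean) for v in values) / n_paths)
    return energy


# Three hypothetical paths predicting the same 3-pixel target.
preds = [
    [0.5, 0.2, 0.9],
    [0.5, 0.4, 0.1],
    [0.5, 0.3, 0.5],
]
energy = consistency_energy(preds)
print(energy)  # zero where all paths agree, larger where they disagree
```

No ground truth enters the computation, which is why the quantity can be evaluated on unlabeled query images.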
To compute energy locally, over many images, and/or to plot energy vs. error, you can use the energy_calc.py script. For example, to reproduce the following scatterplot using energy_calc.py:
Energy vs. Error |
---|
Result from running the command below. |
First download a subset of images from the Taskonomy buildings almena and albertville (512 images per domain, 388MB):
sh ./tools/download_data.sh
Second, download all the networks necessary to compute the consistency energy. The following script will download them for you (skipping previously downloaded models) (0.8GB - 4.0GB):
sh ./tools/download_energy_graph_edges.sh
Now we are ready to compute energy. The following command generates a scatter plot of consistency energy vs. prediction error:
python -m scripts.energy_calc energy_calc --batch_size 2 --subset_size=128 --save_dir=results
By default, it computes the energy and error of subset_size points on the Taskonomy buildings almena and albertville. The error is computed for the normal target. The resulting plot is saved to energy.pdf in RESULTS_DIR and the corresponding data to data.csv.
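Once energy and error values are available, their relationship can be summarized with a correlation coefficient. The sketch below implements Pearson correlation on two plain lists. The column layout of data.csv is not described here, so loading it is left out, and the toy numbers are invented purely to exercise the function; they are not results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Invented toy values (NOT measurements from the paper or the script).
toy_energy = [0.1, 0.2, 0.4, 0.8]
toy_error = [0.05, 0.10, 0.22, 0.35]
print(round(pearson(toy_energy, toy_error), 3))
```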
Consistency energy is an unsupervised quantity and as such, no ground-truth labels are necessary. To compute the energy for all query images in a directory, run:
python -m scripts.energy_calc energy_calc_nogt \
--data-dir=PATH_TO_QUERY_IMAGE --batch_size 1 --save_dir=RESULTS_DIR \
--subset_size=NUMBER_OF_IMAGES --cont=PATH_TO_TRAINED_MODEL
It will add a dashed horizontal line to the plot above at the energy of the query image(s). This plot is saved to energy.pdf in RESULTS_DIR.
We are providing all of our pretrained models for download. These models are the same ones used in the live demo and video evaluations.
All networks are based on the UNet architecture. They take an input of size 256x256, and upsampling is done via bilinear interpolation instead of deconvolutions. All models were trained with the L1 loss.
Instructions for downloading the trained consistency models can be found here
sh ./tools/download_models.sh
This downloads the baseline and consistency-trained models for the depth, normal and reshading targets (1.3GB) to a folder called ./models/. See the table below for specifics:
Task Name | Output Dimension | Downsample Blocks |
---|---|---|
RGB -> Depth-ZBuffer | 256x256x1 | 6 |
RGB -> Reshading | 256x256x1 | 5 |
RGB -> Surface-Normal | 256x256x3 | 6 |
Individual consistency models can be downloaded here.
The pretrained perceptual models can be downloaded with the following command.
sh ./tools/download_percep_models.sh
This downloads the perceptual models for the depth, normal and reshading targets (1.6GB). Each target has 7 pretrained models (from the other sources below).
Curvature Edge-3D Reshading
Depth-ZBuffer Keypoint-2D RGB
Edge-2D Keypoint-3D Surface-Normal
Perceptual model architectural hyperparameters are detailed in transfers.py, and some of the pretrained models were trained using L2 loss. To use these models with the provided training code, place the pretrained models in the file path defined by MODELS_DIR in utils.py.
Individual perceptual models can be downloaded here.
We also provide the models for other baselines used in the paper. Many of these baselines appear in the live demo. The pretrained baselines can be downloaded here. Note that we will not be providing support for them.
- A full list of baselines is in the table below:
Baseline Method | Description | Tasks (RGB -> X) |
---|---|---|
Baseline UNet [PDF] | UNets trained on the Taskonomy dataset. | Normal, Reshade, Depth |
Baseline Perceptual Loss | Trained using a randomly initialized perceptual network, similar to RND. | Normal |
Cycle Consistency [PDF] | A CycleGAN trained on the Taskonomy dataset. | Normal |
GeoNet [PDF] | Trained on the Taskonomy dataset and using L1 instead of L2 loss. | Normal, Depth |
Multi-Task [PDF] | A multi-task model we trained using UNets, using a shared encoder (similar to here) | All |
Pix2Pix [PDF] | A Pix2Pix model trained on the Taskonomy dataset. | Normal |
Taskonomy [PDF] | Pretrained models (converted to pytorch here), originally trained here. | Normal, Reshading, Depth* |
*Models for other tasks are available using the visualpriors package or in Tensorflow via the Taskonomy GitHub page.
We used the provided training code to train our consistency models on the Taskonomy dataset. We used 3 V100 (32GB) GPUs to train our models; running them for 500 epochs takes about a week.
Runnable Example: You'll find that the code in the rest of this section expects about 12TB of data (9 single-image tasks from Taskonomy). For a quick runnable example that gives the gist, try the following:
First download the data and then start a visdom (logging) server:
sh ./tools/download_data.sh # Starter data (388MB)
visdom & # To view the telemetry
Then, start the training using the following command, which cascades two models (trains a normal model using curvature consistency on a training set of 512 images).
python -m train example_cascade_two_networks --k 1 --fast
You can add more perceptual losses by changing the config in energy.py. For example, train the above model using both curvature and 2D edge consistency:
python -m train example_normal --k 2 --fast
Assuming that you want to train on the full dataset or [on your own dataset], read on.
config/ # Configuration parameters: where to save results, etc.
    split.txt # Train, val split
    jobinfo.txt # Defines job name, base_dir
modules/ # Network definitions
train.py # Training script
dataset.py # Creates dataloader
energy.py # Defines path config, computes total loss, logging
models.py # Implements forward backward pass
graph.py # Computes path defined in energy.py
task_configs.py # Defines task specific preprocessing, masks, loss fn
transfers.py # Loads models
utils.py # Defines file paths (described below)
demo.py # Demo script
The code expects folders structured as follows. These can be modified by changing values in utils.py.
base_dir/ # The following paths are defined in utils.py (BASE_DIR)
    shared/ # with the corresponding variable names in brackets
        models/ # Pretrained models (MODELS_DIR)
        results_[jobname]/ # Checkpoint of model being trained (RESULTS_DIR)
        ood_standard_set/ # OOD data for visualization (OOD_DIR)
    data_dir/ # taskonomy data (DATA_DIRS)
- Define locations for data, models, etc.: Create a jobinfo.txt file and define the name of the job and the absolute path to BASE_DIR, where data, models, and results will be stored, as shown in the folder structure above. An example config is provided in the starter code (configs/jobinfo.txt). To modify an individual file path, e.g. the models folder, change the MODELS_DIR variable in utils.py.
We won't cover downloading the Taskonomy dataset, which can be downloaded following the instructions here.
- Download perceptual networks: If you want to initialize from our pretrained models, download them with the following command (1.6GB):
sh ./tools/download_percep_models.sh
More info about the networks is available here.
- Train with consistency using the command:
python -m train multiperceptual_{depth,normal,reshading}
For example, to run the training code for the normal target, run
python -m train multiperceptual_normal
This trains the model for the normal target with 8 perceptual losses, i.e. curvature, edge2d, edge3d, keypoint2d, keypoint3d, reshading, depth and imagenet. We used 3 V100 (32GB) GPUs to train our models; running them for 500 epochs takes about a week.
Additional arguments can be specified during training; the most commonly used ones are listed below. For the full list, refer to the training script.
  - The flag --k defines the number of perceptual losses used; using fewer than all of them reduces GPU memory requirements.
  - There are several options for how this subset is chosen: 1. randomly (--random-select), 2. winrate (--winrate).
  - Data augmentation is not done by default; it can be added to the training data with the flag --dataaug. The transformations applied are 1. random crop with probability 0.5 and 2. color jitter with probability 0.5.
To train a normal target domain with 2 perceptual losses selected randomly each epoch, run the following command.
python -m train multiperceptual_normal --k 2 --random-select
- Logging: The losses and visualizations are logged in Visdom. This can be accessed via [server name]/env/[job name], e.g. localhost:8888/env/normaltarget_allperceps.
An example visualization is shown below. We plot the outputs from the paths defined in the energy configuration used. Two windows are shown: one shows the predictions before training starts, the other updates them after each epoch. The labels for each column can be found at the top of the window. The second column has the target's ground truth y^, the third its prediction n(x) from the RGB image x. Thereafter, the predictions of each pair of images with the same domain are given by the paths f(y^), f(n(x)), where f is from the target domain to another domain, e.g. curvature.
Logging conventions: For uninteresting historical reasons, the columns in the logging during training might have strange names. You can define your own names instead of using these by changing the config file in energy.py.
Here's a quick guide to the current convention. For example, when training a normal model using consistency:
  - The RGB input is denoted as x and the target domain is denoted as y. The ground truth label for a domain is marked with a ^ (e.g. y^ for the target domain).
  - The direct (RGB -> Z) and perceptual (target [Y] -> Z) transfer functions are named as follows (i.e. the function for rgb to curvature is RC; for normal to curvature it's f):
Domain (Z) | rgb -> Z (Direct) | Y -> Z (Perceptual) | Domain (Z) | rgb -> Z (Direct) | Y -> Z (Perceptual) |
---|---|---|---|---|---|
target | n | - | keypoints2d | k2 | Nk2 |
curvature | RC | f | keypoints3d | k3 | Nk3 |
sobel edges | a | s | edge occlusion | E0 | nE0 |
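The --k and --random-select flags from the training step can be pictured as drawing k of the available perceptual losses each epoch. The sketch below assumes uniform sampling without replacement; the actual selection logic (including the winrate option, which is not sketched) is in the training code, and select_losses is a hypothetical helper.

```python
import random

# The 8 perceptual losses listed for the normal target.
PERCEPTUAL_LOSSES = [
    "curvature", "edge2d", "edge3d", "keypoint2d",
    "keypoint3d", "reshading", "depth", "imagenet",
]


def select_losses(k, rng):
    """Pick k distinct perceptual losses uniformly at random."""
    return rng.sample(PERCEPTUAL_LOSSES, k)


rng = random.Random(0)  # fixed seed so runs are repeatable
for epoch in range(3):
    print(epoch, select_losses(2, rng))
```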
- A new configuration should be defined in the energy_configs dictionary in energy.py.
Description of the information needed:
  - paths: X1->X2->X3. The keys in this dictionary use function notation, e.g. f(n(x)), with the corresponding value being a list of task objects that defines the domains being transferred, e.g. [rgb, normal, curvature]. The rgb input is defined as x, n(x) returns normal predictions from rgb, and f(n(x)) returns curvature from normal. These notations do not need to be the same for all configurations. The table above lists those that have been kept constant for all targets.
  - freeze_list: the models that will not be optimized,
  - losses: loss terms to be constructed from the paths defined above,
  - plots: the paths to plot in the visdom environment.
- New models may need to be defined in the pretrained_transfers dictionary in transfers.py. For example, for a curvature target and a perceptual model from curvature to normal, the code will look for the principal_curvature2normal.pth file in MODELS_DIR if it is not defined in transfers.py.
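The two conventions above can be sketched in a few lines: reading a path key such as f(n(x)) as a chain of functions, and deriving the default checkpoint filename for a transfer. Both helpers are hypothetical. The sketch uses plain strings where the repository uses task objects, and the source2target.pth pattern is generalized from the single example given above.

```python
import posixpath


def chain_from_key(key):
    """'f(n(x))' -> ['x', 'n', 'f']: the input first, then functions in application order."""
    names = key.replace(")", "").split("(")
    return list(reversed(names))


def key_matches_domains(key, domains):
    """A key that applies N functions should come with a list of N + 1 domains."""
    return len(chain_from_key(key)) == len(domains)


def default_checkpoint(models_dir, source, target):
    """Assumed fallback filename pattern: <source>2<target>.pth inside MODELS_DIR."""
    return posixpath.join(models_dir, f"{source}2{target}.pth")


print(chain_from_key("f(n(x))"))
print(key_matches_domains("f(n(x))", ["rgb", "normal", "curvature"]))
print(default_checkpoint("models", "principal_curvature", "normal"))
```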
The expected folder structure for the data is:
DATA_DIRS/
    [building]_[domain]/
        [domain]/
            [view]_domain_[domain].png
            ...
PyTorch's dataloader __getitem__ method has been overridden to return a tuple of all tasks for a given building and viewpoint. This is done in datasets.py. Thus, for other folder structures, a function that returns the corresponding file paths for the different domains should be defined.
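For a different folder structure, the path-lookup function mentioned above might look like the following sketch, shown here for the default layout. The helper names and the example view identifier are hypothetical; only the directory pattern is taken from the structure described in this section.

```python
from pathlib import PurePosixPath


def domain_file(data_dir, building, domain, view):
    """Path of one image: DATA_DIR/[building]_[domain]/[domain]/[view]_domain_[domain].png"""
    return (
        PurePosixPath(data_dir)
        / f"{building}_{domain}"
        / domain
        / f"{view}_domain_{domain}.png"
    )


def files_for_view(data_dir, building, view, domains):
    """One path per domain for the same building and viewpoint, in the order given."""
    return tuple(domain_file(data_dir, building, d, view) for d in domains)


# "view0" is a made-up view identifier used only for illustration.
for path in files_for_view("data", "almena", "view0", ["rgb", "normal", "depth_zbuffer"]):
    print(path)
```

Swapping in another layout then only requires changing domain_file, while the tuple-per-view contract stays the same.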
Task-specific configs, like transformations and masks, are defined in task_configs.py.
If you find the code, models, or data useful, please cite this paper:
@article{zamir2020consistency,
title={Robust Learning Through Cross-Task Consistency},
author={Zamir, Amir and Sax, Alexander and Yeo, Teresa and Kar, Oğuzhan and Cheerla, Nikhil and Suri, Rohan and Cao, Zhangjie and Malik, Jitendra and Guibas, Leonidas},
journal={arXiv},
year={2020}
}