Code release for "Greedy Gradient Ensemble for Robust Visual Question Answering" (ICCV 2021, Oral). GGE can be extended to other tasks with dataset biases.
```
@inproceedings{han2021greedy,
  title={Greedy Gradient Ensemble for Robust Visual Question Answering},
  author={Han, Xinzhe and Wang, Shuhui and Su, Chi and Huang, Qingming and Tian, Qi},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2021}
}
```
We improve the generalization of GGE and present General Greedy De-bias (GGD).
```
@article{han2021general,
  title={General Greedy De-bias Learning},
  author={Han, Xinzhe and Wang, Shuhui and Su, Chi and Huang, Qingming and Tian, Qi},
  journal={arXiv preprint arXiv:2112.10572},
  year={2021}
}
```
We use Anaconda to manage our dependencies. You will need to execute the following steps to install them:
- Edit the `prefix` variable in the `requirements.yml` file, assigning it the path for your conda environment.
- Install all dependencies with `conda env create -f requirements.yml`.
- Activate the new environment: `bias`.
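Once the environment is active, a quick sanity check such as the following (a hypothetical snippet, not part of this repo) confirms that PyTorch and the GPU are visible:

```python
# Hypothetical environment check; not included in this repository.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```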
- Download UpDn features from Google Drive into the `/data/detection_features` folder.
- Download questions/answers for VQAv2 and VQA-CPv2 by executing `bash tools/download.sh`.
- Download the visual cues/hints provided in "A negative case analysis of visual grounding methods for VQA" into `data/hints`. Note that we use caption-based hints for grounding-based method reproduction, CGR, and CGW.
- Preprocess the data with `bash tools/process.sh`.
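After these steps, a small check like the one below (a hypothetical helper, not part of this repo; adjust the paths if your layout differs) can confirm that the expected folders exist before preprocessing and training:

```python
# Hypothetical data-layout check; the paths follow the download steps above
# and may need adjusting to your setup.
import os

for path in ["data/detection_features", "data/hints"]:
    print(f"{path}: {'ok' if os.path.isdir(path) else 'MISSING'}")
```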
Run

```
CUDA_VISIBLE_DEVICES=0 python main.py --dataset cpv2 --mode MODE --debias gradient --topq 1 --topv -1 --qvp 5 --output []
```

to train a model. In `main.py`, use `import base_model` for the UpDn baseline, `import base_model_ban as base_model` for the BAN baseline, and `import base_model_block as base_model` for the S-MRL baseline.
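For example, the baseline switch in `main.py` can be written as follows (a sketch using the module names listed above; keep exactly one import active):

```python
# In main.py: select ONE baseline architecture by switching the import.
import base_model                        # UpDn baseline
# import base_model_ban as base_model    # BAN baseline
# import base_model_block as base_model  # S-MRL baseline
```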
Set `MODE` to `gge_iter` or `gge_tog` for our best-performing models, `gge_d_bias` or `gge_q_bias` for the single-bias ablations, and `base` for the baseline model.
For the models in Sec. 3, use `from train_ab import train` and `import base_model_ab as base_model` in `main.py` (see the sketch after this paragraph), then run

```
CUDA_VISIBLE_DEVICES=0 python main.py --dataset cpv2 --mode MODE --debias METHODS --topq 1 --topv -1 --qvp 5 --output []
```

Set `METHODS` to `learned_mixin` for LMH, and set `MODE` to `inv_sup` for the inv_sup strategy or `v_inverse` for the inverse hint. Note that the results for HINT$_{inv}$ are obtained by running the code from "A negative case analysis of visual grounding methods for VQA".
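The corresponding edits in `main.py` for these Sec. 3 models look like this (a sketch using the names given above):

```python
# In main.py: use the ablation training routine and model for the Sec. 3 experiments.
from train_ab import train           # ablation training routine
import base_model_ab as base_model   # ablation model
```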
To test the v_only model, use `import base_model_v_only as base_model` in `main.py`.
To test RUBi and LMH+RUBi, run

```
CUDA_VISIBLE_DEVICES=0 python rubi_main.py --dataset cpv2 --mode MODE --output []
```

Set `MODE` to `updn` for RUBi and to `lmh_rubi` for LMH+RUBi.
At test time, we report the overall accuracy together with CGR, CGW, and CGD at threshold 0.2. Change `base_model` to the corresponding model in `sensitivity.py` and run

```
CUDA_VISIBLE_DEVICES=0 python sensitivity.py --dataset cpv2 --debias METHOD --load_checkpoint_path logs/your_path --output your_path
```
We provide visualization in `visualization.ipynb`. If you want to produce other visualizations yourself, download MS-COCO 2014 into `data/images`.
Correction: sorry for the wrong derivation of the negative gradient for the Sigmoid+BCE loss in the paper. The correct negative gradient is:

In theory, as long as the pseudo label has a negative correlation with the bias model's prediction, it can still mine the hard examples. The wrong gradient in the paper is actually an approximation of:
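As general background, the textbook negative gradient of the Sigmoid+BCE loss with respect to the logit is given below; it is stated only for reference and is not claimed to be either of the expressions referred to above. With prediction $\sigma(z_k)$ and target $y_k$,

$$
\mathcal{L} = -\sum_{k}\Big[y_k \log \sigma(z_k) + (1-y_k)\log\big(1-\sigma(z_k)\big)\Big],
\qquad
-\frac{\partial \mathcal{L}}{\partial z_k} = y_k - \sigma(z_k),
$$

which is positive exactly where the current prediction underestimates the target, so a pseudo label built from it naturally emphasizes the examples the biased model gets wrong.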
This repo uses features from "A negative case analysis of visual grounding methods for VQA". Some code is modified from CSS and UpDn.