https://anonymous.4open.science/r/QAA-32B8/
- Download and unpack the Visual Genome images, annotations, class info, and image metadata
- Generate the initial scene graphs with VCT
- Next, run train_graph.py to train the scene graph generation model
python train_graph.py --input_scene_dir <path/to/input/scene/dir> --output_scene_dir <path/to/output/scene/dir>
- Download GloVe pretrained word vectors
- Preprocess OK-VQA questions to obtain train_questions.pt and vocab.json
python preprocess_questions.py --glove_pt </path/to/generated/glove/pickle/file> --input_questions_json </your/path/to/v2_OpenEnded_mscoco_train2014_questions.json> --input_annotations_json </your/path/to/v2_mscoco_train2014_annotations.json> --output_pt </your/output/path/train_questions.pt> --vocab_json </your/output/path/vocab.json> --mode train
- Download the grounded features from the repo of the paper "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"
- Preprocess features
python preprocess_features.py --input_tsv_folder /your/path/to/trainval_36/ --output_h5 /your/output/path/trainval_feature.h5
- Train the pretrained model
python train.py --input_dir <path/to/preprocessed/files> --save_dir </path/for/checkpoint> --val
- Query Augmentation
python train.py --input_dir <path/to/preprocessed/files> --save_dir </path/for/checkpoint> --mode aug
- Test
python train.py --input_dir <path/to/preprocessed/files> --save_dir </path/for/checkpoint> --mode test
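
Note on the GloVe step: preprocess_questions.py takes a "generated glove pickle file" through --glove_pt, but the steps above do not show how that file is produced. The following is a minimal sketch of one way to convert the downloaded GloVe text file into a pickle. It assumes the pickle is a plain dict mapping each word to its vector; the function names (load_glove_text, save_glove_pickle), the dim default, and the dict layout are hypothetical and may not match what preprocess_questions.py actually expects (for example, it may want NumPy arrays instead of lists), so check the script before relying on it.

```python
import pickle


def load_glove_text(path, dim=300):
    """Parse a GloVe .txt file into a dict: word -> list of floats.

    Each line is a token followed by `dim` space-separated numbers.
    Splitting from the right keeps tokens that themselves contain spaces.
    """
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").rsplit(" ", dim)
            if len(parts) != dim + 1:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors


def save_glove_pickle(txt_path, pickle_path, dim=300):
    """Write the parsed vectors to a pickle; returns the number of words."""
    vectors = load_glove_text(txt_path, dim=dim)
    with open(pickle_path, "wb") as f:
        pickle.dump(vectors, f)
    return len(vectors)


# Example call (paths are placeholders, not files shipped with this repo):
# save_glove_pickle("/path/to/glove.txt", "/path/to/generated/glove/pickle/file")
```

The resulting file would then be passed to preprocess_questions.py as --glove_pt, assuming the dict layout above is what the script reads.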