conda create -n mmclvqa python=3.8
conda activate mmclvqa
git clone https://github.com/showlab/CLVQA.git
cd CLVQA
cd mmclvqa
pip install --editable .
cd ..
pip install -r extra_requirements.txt
We release the datasets and annotations in json format (link) and npy format (link). To use our code for training, please download the npy files.
- Example of data sample:
{
  'answer': 'kiosk',  # answer
  'answers': ['kiosk','kiosk',...],  # answers in VQAv2 format, repeated 10 times if there is only one answer in the annotation
  'feature_path': '440.npy',  # feature path to retrieve features
  'gqa_question':  # GQA annotations, if applicable
    { 'annotations': { 'answer': {}, 'fullAnswer': {}, 'question': {}},
      'answer': 'kiosk',
      'entailed': ['06778810', '06778808'],
      'equivalent': ['06778808', '06778809'],
      'fullAnswer': 'It is a kiosk.',
      'groups': {'global': 'place', 'local': '02q-place'},
      'imageId': '440',
      'isBalanced': True,
      'question': 'What place is this?',
      'semantic': [ { 'argument': 'scene', 'dependencies': [], 'operation': 'select'},
                    { 'argument': 'place', 'dependencies': [0], 'operation': 'query'}],
      'semanticStr': 'select: scene->query: place [0]',
      'types': { 'detailed': 'place', 'semantic': 'global', 'structural': 'query'}},
  'gt_scene_graph_mask': [1,0,0,0 ..., ],  # Ground-truth SG mask for question-answer generation, aligned with `gt_scene_graph_seq`. 1 means the SG relation is related to the question-answer generation.
  'gt_scene_graph_seq': [  # Ground-truth SG annotated for the image in this annotation datum.
    'kiosk [SEP]',
    'counter [SEP]',
    'lady [SEP]',
    'trash can [SEP]',
    ...
  ],
  'image_id': '440',  # image id
  'image_source': 'vg',  # image source
  'ocr': [],  # ocr info in the image, applicable in textvqa
  'ocr_info': [],  # ocr info in the image, applicable in textvqa
  'ocr_tokens': [],  # ocr tokens, applicable in textvqa
  'pred_scene_graph_seq': [  # predicted SG extracted by an off-the-shelf model
    'building behind man [SEP]',
    'building behind woman [SEP]',
    'man watching man [SEP]',
    'person watching man [SEP]',
    'building behind woman [SEP]',
    ...
  ],
  'program': [  # program executed to generate the question
    { 'argument': 'scene', 'dependencies': [], 'operation': 'select'},
    { 'argument': 'place', 'dependencies': [0], 'operation': 'query'}
  ],
  'question': 'What place is this?',  # question
  'question_id': 'g06778809',  # question id
  'raw_question_type': {  # raw question type, applicable in original GQA annotation
    'detailed': 'place',
    'semantic': 'global',
    'structural': 'query'
  },
  'set_name': 'train',  # set name: train/val
  'stage': 'object',  # stage name for continual learning
  'supporting_fact': []  # supporting facts, applicable in stage "knowledge"
}
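To illustrate how `gt_scene_graph_mask` pairs with `gt_scene_graph_seq`, here is a minimal Python sketch. It assumes the two lists are aligned index by index, as the field comments suggest. The helper name `relevant_relations` and the toy sample (a truncated copy of the example above) are made up for illustration and are not part of the released code.

```python
def relevant_relations(sample: dict) -> list:
    """Return the ground-truth scene-graph entries whose mask value is 1."""
    seq = sample["gt_scene_graph_seq"]
    mask = sample["gt_scene_graph_mask"]
    # Keep an entry only when its mask bit is 1; drop the trailing [SEP] marker.
    return [rel.replace("[SEP]", "").strip()
            for rel, keep in zip(seq, mask) if keep == 1]


# Hypothetical, truncated version of the data sample shown above.
toy_sample = {
    "question": "What place is this?",
    "answer": "kiosk",
    "gt_scene_graph_seq": ["kiosk [SEP]", "counter [SEP]", "lady [SEP]", "trash can [SEP]"],
    "gt_scene_graph_mask": [1, 0, 0, 0],
}

print(relevant_relations(toy_sample))  # ['kiosk']
```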
The implementation of the Symbolic Replay Model (SRM) can be found in SRM/. We provide training scripts for SRM here. Specifically:
cd SRM/
# training SRM under scene-incremental setting, with task order a->b->c->d->e->f, using distilgpt2
CUDA_VISIBLE_DEVICES=0 python train.py \
--cl_setting scene \
--task_seq abcdef \
--model_name distilgpt2 \
--model_dir_root /...path_to/exp/clvqa/QAG_seq/not_use_gt/QAG_scene_task_token \
--add_task_tokens \
--n_train_epochs 15
# training SRM under function-incremental setting, with task order o->a->r->l->k->s, using distilgpt2
CUDA_VISIBLE_DEVICES=0 python train.py \
--cl_setting functional \
--task_seq oarlks \
--model_name distilgpt2 \
--model_dir_root /...path_to/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token \
--add_task_tokens \
--n_train_epochs 15
- We release our replayed samples for 6 task orders as reported in the paper.
- For the 6 task orders, you can inspect these files: scene / function, or refer to our paper.
Refer to scripts in this folder for one-stop training-and-testing (generated by generate_run_scripts.py).
Specifically, the script below trains with replayed samples from SRM under the function setting (task order oarlks), using 1.5x SRM replayed samples (training.CL.restore_rate=1.5):
ROOT=/Users/stan
DEVICE=0
if [ ! -f "$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_attribute/unicl_final.pth" ] ; then
CUDA_VISIBLE_DEVICES=$DEVICE mmf_run config=EXP_CONFIG/functional/cl_attribute_unicl_standalone.yaml \
model=unicl \
dataset=clvqa \
training.CL.use_cl=True \
training.CL.use_callback=False \
training.CL.use_replay=True \
training.CL.replay_method=restore_with_prob \
training.CL.task_order=oarlks \
training.CL.restore_rate=1.5 \
training.CL.restore_dir=$ROOT/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token/distilgpt2_replay/distilgpt2_functional_oarlks \
training.CL.restore_paths=oarlks_REPLAY[o]_AT[a].npy \
dataset_config.clvqa.use_mask_img=True \
dataset_config.clvqa.mask_img_prob=0.15 \
run_type=train_val \
checkpoint.resume_file=$ROOT/exp/clvqa/save/stand_alone/functional/unicl_object/unicl_final.pth \
env.save_dir=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_attribute \
training.checkpoint_interval=4000 \
training.callbacks=[]
fi
if [ ! -f "$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_relation/unicl_final.pth" ] ; then
CUDA_VISIBLE_DEVICES=$DEVICE mmf_run config=EXP_CONFIG/functional/cl_relation_unicl_standalone.yaml \
model=unicl \
dataset=clvqa \
training.CL.use_cl=True \
training.CL.use_callback=False \
training.CL.use_replay=True \
training.CL.replay_method=restore_with_prob \
training.CL.task_order=oarlks \
training.CL.restore_rate=1.5 \
training.CL.restore_dir=$ROOT/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token/distilgpt2_replay/distilgpt2_functional_oarlks \
training.CL.restore_paths=oarlks_REPLAY[o]_AT[r].npy,oarlks_REPLAY[a]_AT[r].npy \
dataset_config.clvqa.use_mask_img=True \
dataset_config.clvqa.mask_img_prob=0.15 \
run_type=train_val \
checkpoint.resume_file=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_attribute/unicl_final.pth \
env.save_dir=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_relation \
training.checkpoint_interval=4000 \
training.callbacks=[]
fi
if [ ! -f "$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_logical/unicl_final.pth" ] ; then
CUDA_VISIBLE_DEVICES=$DEVICE mmf_run config=EXP_CONFIG/functional/cl_logical_unicl_standalone.yaml \
model=unicl \
dataset=clvqa \
training.CL.use_cl=True \
training.CL.use_callback=False \
training.CL.use_replay=True \
training.CL.replay_method=restore_with_prob \
training.CL.task_order=oarlks \
training.CL.restore_rate=1.5 \
training.CL.restore_dir=$ROOT/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token/distilgpt2_replay/distilgpt2_functional_oarlks \
training.CL.restore_paths=oarlks_REPLAY[o]_AT[l].npy,oarlks_REPLAY[a]_AT[l].npy,oarlks_REPLAY[r]_AT[l].npy \
dataset_config.clvqa.use_mask_img=True \
dataset_config.clvqa.mask_img_prob=0.15 \
run_type=train_val \
checkpoint.resume_file=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_relation/unicl_final.pth \
env.save_dir=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_logical \
training.checkpoint_interval=4000 \
training.callbacks=[]
fi
if [ ! -f "$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_knowledge/unicl_final.pth" ] ; then
CUDA_VISIBLE_DEVICES=$DEVICE mmf_run config=EXP_CONFIG/functional/cl_knowledge_unicl_standalone.yaml \
model=unicl \
dataset=clvqa \
training.CL.use_cl=True \
training.CL.use_callback=False \
training.CL.use_replay=True \
training.CL.replay_method=restore_with_prob \
training.CL.task_order=oarlks \
training.CL.restore_rate=1.5 \
training.CL.restore_dir=$ROOT/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token/distilgpt2_replay/distilgpt2_functional_oarlks \
training.CL.restore_paths=oarlks_REPLAY[o]_AT[k].npy,oarlks_REPLAY[a]_AT[k].npy,oarlks_REPLAY[r]_AT[k].npy,oarlks_REPLAY[l]_AT[k].npy \
dataset_config.clvqa.use_mask_img=True \
dataset_config.clvqa.mask_img_prob=0.15 \
run_type=train_val \
checkpoint.resume_file=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_logical/unicl_final.pth \
env.save_dir=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_knowledge \
training.checkpoint_interval=4000 \
training.callbacks=[]
fi
if [ ! -f "$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_scenetext/unicl_final.pth" ] ; then
CUDA_VISIBLE_DEVICES=$DEVICE mmf_run config=EXP_CONFIG/functional/cl_scenetext_unicl_standalone.yaml \
model=unicl \
dataset=clvqa \
training.CL.use_cl=True \
training.CL.use_callback=False \
training.CL.use_replay=True \
training.CL.replay_method=restore_with_prob \
training.CL.task_order=oarlks \
training.CL.restore_rate=1.5 \
training.CL.restore_dir=$ROOT/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token/distilgpt2_replay/distilgpt2_functional_oarlks \
training.CL.restore_paths=oarlks_REPLAY[o]_AT[s].npy,oarlks_REPLAY[a]_AT[s].npy,oarlks_REPLAY[r]_AT[s].npy,oarlks_REPLAY[l]_AT[s].npy,oarlks_REPLAY[k]_AT[s].npy \
dataset_config.clvqa.use_mask_img=True \
dataset_config.clvqa.mask_img_prob=0.15 \
run_type=train_val \
checkpoint.resume_file=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_knowledge/unicl_final.pth \
env.save_dir=$ROOT/exp/clvqa/save/functional/setting_1_oarlks/distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5/unicl_scenetext \
training.checkpoint_interval=4000 \
training.callbacks=[]
fi
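The `training.CL.restore_paths` values in the script above follow a regular pattern: at each stage, one replay file per previously learned task. The Python sketch below reproduces that pattern for a given task order. It is inferred only from the file names shown above; the function name `restore_paths` is hypothetical, and generate_run_scripts.py may build these strings differently.

```python
def restore_paths(task_order: str) -> dict:
    """Map each stage (except the first) to its comma-separated replay files."""
    plan = {}
    for i, current in enumerate(task_order):
        if i == 0:
            continue  # the first task has no earlier tasks to replay
        files = [
            f"{task_order}_REPLAY[{prev}]_AT[{current}].npy"
            for prev in task_order[:i]
        ]
        plan[current] = ",".join(files)
    return plan


if __name__ == "__main__":
    for stage, paths in restore_paths("oarlks").items():
        print(stage, paths)
```

For the order oarlks, stage `a` gets a single file replaying `o`, and stage `s` gets five files, matching the values passed in the script.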
- We configure different settings and generate the scripts with generate_run_scripts.py. Refer to this file for other settings you would like to explore.
- For the dataset implementation, please refer to dataset.py.
- For the UniVQA implementation, please refer to UniCL.py.
- LAST Checkpoint:Scene-SRM1.5xReplay-abcdef for scene setting, 1.5x SRM replayed samples, task order abcdef.
- LAST Checkpoint:Function-SRM1.5xReplay-oarlks for function setting, 1.5x SRM replayed samples, task order oarlks.
You can follow generate_run_scripts.py to generate scripts for one-stop training-and-testing. For testing only, please refer to eval_os.py. A testing example for the function setting, 1.5x SRM replayed samples, task order oarlks:
python
>>> from eval_os import *
>>> stage_sweep(cl_setting='functional', setting_idx=1, abbr_seq='oarlks', device=0, model_name='unicl', save_dir='/Users/stan/exp/clvqa', val_exp='distilgpt2_replay_qag_seq_not_use_gt_task_token_1.5', test_stand_alone=False, test_reg=False, print_acc=False)
{'textvqa_accuracy': {'a2a': 0.4177,
'a2k': 0.1967,
'a2l': 0.0563,
'a2o': 0.4037,
'a2r': 0.121,
'a2s': 0.1453,
'k2a': 0.3263,
'k2k': 0.6813,
'k2l': 0.6807,
'k2o': 0.295,
'k2r': 0.3167,
'k2s': 0.1501,
'l2a': 0.272,
'l2k': 0.1943,
'l2l': 0.7153,
'l2o': 0.2653,
'l2r': 0.307,
'l2s': 0.1408,
'o2a': 0.1013,
'o2k': 0.1063,
'o2l': 0.0197,
'o2o': 0.5997,
'o2r': 0.0823,
'o2s': 0.0962,
'r2a': 0.3713,
'r2k': 0.2073,
'r2l': 0.121,
'r2o': 0.4023,
'r2r': 0.3943,
'r2s': 0.1555,
's2a': 0.3083,
's2k': 0.6253,
's2l': 0.6733,
's2o': 0.2963,
's2r': 0.3037,
's2s': 0.5511}}
{'textvqa_accuracy': [('o2o', 0.5997),
('o2a', 0.1013),
('o2r', 0.0823),
('o2l', 0.0197),
('o2k', 0.1063),
('o2s', 0.0962),
('a2o', 0.4037),
('a2a', 0.4177),
('a2r', 0.121),
('a2l', 0.0563),
('a2k', 0.1967),
('a2s', 0.1453),
('r2o', 0.4023),
('r2a', 0.3713),
('r2r', 0.3943),
('r2l', 0.121),
('r2k', 0.2073),
('r2s', 0.1555),
('l2o', 0.2653),
('l2a', 0.272),
('l2r', 0.307),
('l2l', 0.7153),
('l2k', 0.1943),
('l2s', 0.1408),
('k2o', 0.295),
('k2a', 0.3263),
('k2r', 0.3167),
('k2l', 0.6807),
('k2k', 0.6813),
('k2s', 0.1501),
('s2o', 0.2963),
('s2a', 0.3083),
('s2r', 0.3037),
('s2l', 0.6733),
('s2k', 0.6253),
('s2s', 0.5511)]}
==> textvqa_accuracy | Final acc: [0.2963, 0.3083, 0.3037, 0.6733, 0.6253, 0.5511], weight avg acc: 0.45966666666666667.
==> textvqa_accuracy | Backward transfer: [-0.3034, -0.1094, -0.09059999999999996, -0.04200000000000004, -0.05600000000000005], weighted bwt: -0.12028000000000004
==> textvqa_accuracy | Forgetting: [0.41900000000000004, 0.40700000000000003, 0.4116, 0.04200000000000004, 0.09000000000000008], weighted forgetting: 0.27392.
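As a reading aid for the summary lines above, the Python sketch below recomputes the final accuracy and backward transfer from the printed numbers. It assumes that a key `x2y` means "model after training stage x, evaluated on task y" (inferred from the output ordering, not confirmed here) and that the averages are plain means. The function name `summarize` is hypothetical and is not the code in eval_os.py. Forgetting is left out because its exact definition is not stated in this section.

```python
def summarize(acc: dict, task_order: str):
    """Compute final accuracy and backward transfer from 'x2y' accuracies."""
    last = task_order[-1]
    # Final accuracy: the model after the last stage, evaluated on every task.
    final_acc = [acc[f"{last}2{t}"] for t in task_order]
    avg_acc = sum(final_acc) / len(final_acc)
    # Backward transfer: final accuracy on a task minus the accuracy
    # obtained right after that task was learned.
    bwt = [acc[f"{last}2{t}"] - acc[f"{t}2{t}"] for t in task_order[:-1]]
    avg_bwt = sum(bwt) / len(bwt)
    return final_acc, avg_acc, bwt, avg_bwt


# Diagonal and final-row entries copied from the output above.
acc = {
    "o2o": 0.5997, "a2a": 0.4177, "r2r": 0.3943, "l2l": 0.7153, "k2k": 0.6813,
    "s2o": 0.2963, "s2a": 0.3083, "s2r": 0.3037, "s2l": 0.6733, "s2k": 0.6253,
    "s2s": 0.5511,
}
final_acc, avg_acc, bwt, avg_bwt = summarize(acc, "oarlks")
print(round(avg_acc, 4), round(avg_bwt, 4))
```

Up to rounding, the two printed values agree with the reported average accuracy (0.4597) and weighted backward transfer (-0.1203).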
- For each .yaml config file under mmclvqa/EXP_CONFIG, change the path of annotations to where you put your annotation files. E.g.,
  annotations:
    train:
    - /your_path_to/fcl_mmf_attribute_train.npy
    val:
    - /your_path_to/fcl_mmf_attribute_val.npy
    test:
    - /your_path_to/fcl_mmf_attribute_val.npy
- For each .yaml config file under mmclvqa/EXP_CONFIG, change the path of vocab_file to where you put your vocab files (use the copy under files). E.g.,
  text_processor:
    type: bert_tokenizer
    params:
      max_length: 20 # change from 14 to 20
      vocab:
        type: intersected
        embedding_name: glove.6B.300d
        vocab_file: /your_path_to/vocabulary_100k.txt ###
  scene_graph_processor:
    type: scene_graph_bert_tokenizer
    params:
      max_length: 480
      vocab:
        type: intersected
        embedding_name: glove.6B.300d
        vocab_file: /your_path_to/vocabulary_100k.txt ###
  answer_processor:
    type: m4c_answer
    params:
      vocab_file: /your_path_to/clvqa_answer_6k.txt
- Modify paths in mmclvqa/mmf/common/CL_constant.py:
  DATA_DIR = dict( # modify path
      functional = "path to folder of function annotations",
      scene = "path to folder of scene annotations",
  )
  # These files are under files/
  GENERATED_SG_PTH = dict(
      functional = "/your_path_to/generated_sg_all_stages_v6.json", # modify path here
      scene = "/your_path_to/stage_sg_scene_setting_50u-50c.json", # modify path here
  )
- For each .yaml config file under mmclvqa/EXP_CONFIG, you may change the cache_dir where the program saves the automatically downloaded features. E.g.,
  env:
    cache_dir: /workspace/stan/.cache/torch/mmf
- Path for SRM replayed samples. When training SRM, you may specify --model_dir_root [model_dir_root]; the replayed samples will be saved under [model_dir_root]/[model_name]_replay/[model_name]_[setting_name]_[task_order]/ (automatically set to be used as training.CL.restore_dir for UniVQA CL training).
- You may change the training batch size for UniVQA by passing training.batch_size=xxx.
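The replay directory layout described above can be sketched in Python as follows. This is only an illustration of the stated pattern [model_dir_root]/[model_name]_replay/[model_name]_[setting_name]_[task_order]/; the helper name `replay_dir` is hypothetical and the SRM code may assemble the path differently.

```python
from pathlib import PurePosixPath


def replay_dir(model_dir_root: str, model_name: str,
               setting_name: str, task_order: str) -> str:
    """Build the folder where SRM replayed samples are expected to be saved."""
    root = PurePosixPath(model_dir_root)
    return str(root / f"{model_name}_replay"
                    / f"{model_name}_{setting_name}_{task_order}")


if __name__ == "__main__":
    print(replay_dir(
        "/Users/stan/exp/clvqa/QAG_seq/not_use_gt/QAG_functional_task_token",
        "distilgpt2", "functional", "oarlks",
    ))
```

With the values used in the training scripts above, the result is the same directory that is passed as training.CL.restore_dir.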
If you find our work helpful, please cite our paper.
@article{Lei_symbolic_2023,
title={Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task},
volume={37},
url={https://ojs.aaai.org/index.php/AAAI/article/view/25208},
DOI={10.1609/aaai.v37i1.25208},
number={1},
journal={Proceedings of the AAAI Conference on Artificial Intelligence},
author={Lei, Stan Weixian and Gao, Difei and Wu, Jay Zhangjie and Wang, Yuxuan and Liu, Wei and Zhang, Mengmi and Shou, Mike Zheng},
year={2023},
month={Jun.},
pages={1250-1259}
}
If you have any questions, feel free to open an issue or email Stan (leiwx52@gmail.com).