Create a new directory data to store all the datasets.
Download the dataset from the official COCO website.
RefCOCO/+/g use the COCO2014 train split.
Download the annotation files from GitHub.
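If you prefer to script the image download, a minimal sketch is shown below. The image URL is the standard public COCO 2014 link (check the COCO download page if it has moved); where you keep the raw annotation files before conversion is up to you.

mkdir -p data/coco
# COCO2014 train images (large download)
wget -c http://images.cocodataset.org/zips/train2014.zip -P data/coco
unzip -q data/coco/train2014.zip -d data/coco   # creates data/coco/train2014
# the annotation files fetched from GitHub are then converted by the script below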
Convert the annotation files:
python3 tools/data/convert_refexp_to_coco.py
Finally, we expect the directory structure to be the following:
ReferFormer
├── data
│ ├── coco
│ │ ├── train2014
│ │ ├── refcoco
│ │ │ ├── instances_refcoco_train.json
│ │ │ ├── instances_refcoco_val.json
│ │ ├── refcoco+
│ │ │ ├── instances_refcoco+_train.json
│ │ │ ├── instances_refcoco+_val.json
│ │ ├── refcocog
│ │ │ ├── instances_refcocog_train.json
│ │ │ ├── instances_refcocog_val.json
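To confirm the converted annotation files are in place, a quick check like the following can be used (paths assume the layout above):

for f in refcoco/instances_refcoco_train.json refcoco/instances_refcoco_val.json \
         refcoco+/instances_refcoco+_train.json refcoco+/instances_refcoco+_val.json \
         refcocog/instances_refcocog_train.json refcocog/instances_refcocog_val.json; do
    [ -f "data/coco/$f" ] || echo "missing: $f"
done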
Download the dataset from the competition's website here. Then, extract and organize the files. We expect the directory structure to be the following:
ReferFormer
├── data
│ ├── ref-youtube-vos
│ │ ├── meta_expressions
│ │ ├── train
│ │ │ ├── JPEGImages
│ │ │ ├── Annotations
│ │ │ ├── meta.json
│ │ ├── valid
│ │ │ ├── JPEGImages
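After extraction, a quick sanity check of the layout (a sketch, assuming the structure above):

for d in meta_expressions train/JPEGImages train/Annotations valid/JPEGImages; do
    [ -d "data/ref-youtube-vos/$d" ] || echo "missing: $d"
done
ls data/ref-youtube-vos/train/JPEGImages | wc -l   # number of training video folders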
Download the DAVIS2017 dataset from the website. Note that you only need to download the two zip files DAVIS-2017-Unsupervised-trainval-480p.zip and DAVIS-2017_semantics-480p.zip.
Download the text annotations from the website.
Then, put the zip files in the directory as follows.
ReferFormer
├── data
│ ├── ref-davis
│ │ ├── DAVIS-2017_semantics-480p.zip
│ │ ├── DAVIS-2017-Unsupervised-trainval-480p.zip
│ │ ├── davis_text_annotations.zip
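For example, if the three zip files were downloaded to the current directory, they can be moved into place like this (the source paths are an assumption; adjust them to wherever the files were saved):

mkdir -p data/ref-davis
mv DAVIS-2017_semantics-480p.zip \
   DAVIS-2017-Unsupervised-trainval-480p.zip \
   davis_text_annotations.zip \
   data/ref-davis/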
Unzip these zip files.
unzip -o davis_text_annotations.zip
unzip -o DAVIS-2017_semantics-480p.zip
unzip -o DAVIS-2017-Unsupervised-trainval-480p.zip
Preprocess the dataset to Ref-Youtube-VOS format. (Make sure you are in the main directory)
python tools/data/convert_davis_to_ytvos.py
Finally, unzip the file DAVIS-2017-Unsupervised-trainval-480p.zip again (since we use mv in the preprocessing step for efficiency).
unzip -o DAVIS-2017-Unsupervised-trainval-480p.zip
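Putting the Ref-DAVIS steps together, the whole preparation looks roughly like the following sketch; it assumes the unzip commands are run inside data/ref-davis and the conversion script is run from the repository root:

cd data/ref-davis
unzip -o davis_text_annotations.zip
unzip -o DAVIS-2017_semantics-480p.zip
unzip -o DAVIS-2017-Unsupervised-trainval-480p.zip
cd ../..
python tools/data/convert_davis_to_ytvos.py
cd data/ref-davis && unzip -o DAVIS-2017-Unsupervised-trainval-480p.zip
cd ../..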
Follow the instructions and download the dataset from the website here. Then, extract the files. Additionally, we use the same json annotation files generated by MTTR. Please download these files from OneDrive. We expect the directory structure to be the following:
ReferFormer
├── data
│ ├── a2d_sentences
│ │ ├── Release
│ │ ├── text_annotations
│ │ │ ├── a2d_annotation_with_instances
│ │ │ ├── a2d_annotation.txt
│ │ │ ├── a2d_missed_videos.txt
│ │ ├── a2d_sentences_single_frame_test_annotations.json
│ │ ├── a2d_sentences_single_frame_train_annotations.json
│ │ ├── a2d_sentences_test_annotations_in_coco_format.json
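A quick existence check for the expected A2D-Sentences files (a sketch based on the layout above):

ls data/a2d_sentences
# expected: Release  text_annotations  a2d_sentences_single_frame_test_annotations.json
#           a2d_sentences_single_frame_train_annotations.json
#           a2d_sentences_test_annotations_in_coco_format.json
ls data/a2d_sentences/text_annotations
# expected: a2d_annotation_with_instances  a2d_annotation.txt  a2d_missed_videos.txt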
Follow the instructions and download the dataset from the website here. Then, extract the files. Additionally, we use the same json annotation files generated by MTTR. Please download these files from OneDrive. We expect the directory structure to be the following:
ReferFormer
├── data
│ ├── jhmdb_sentences
│ │ ├── Rename_Images
│ │ ├── puppet_mask
│ │ ├── jhmdb_annotation.txt
│ │ ├── jhmdb_sentences_samples_metadata.json
│ │ ├── jhmdb_sentences_gt_annotations_in_coco_format.json
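Likewise, a quick listing to verify the JHMDB-Sentences layout (based on the tree above):

ls data/jhmdb_sentences
# expected: Rename_Images  puppet_mask  jhmdb_annotation.txt
#           jhmdb_sentences_samples_metadata.json
#           jhmdb_sentences_gt_annotations_in_coco_format.json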