The official code for [ACM MM 2022] 'In-N-Out Generative Learning for Dense Unsupervised Video Segmentation'. [arXiv]
We achieve new state-of-the-art performance among unsupervised learning methods on the VOS task, building on ViT and the idea of generative learning.
We test with:
- python==3.7
- pytorch==1.7.1
- CUDA==10.2
We train on Charades with 4x16GB V100 GPUs and on Kinetics-400 with 8x16GB V100 GPUs. Training takes around 12 hours and 1 week, respectively. The codebase builds on DINO, DUL, and VRW.
We use charades_480p and Kinetics-400 for training.
After downloading the datasets, run:
git clone git@github.com:pansanity666/INO_VOS.git
cd INO_VOS
mkdir ./data
ln -s /your/path/Charades_v1_480 ./data
ln -s /your/path/Kinetics_400 ./data
We benchmark on DAVIS-2017 val and YouTube-VOS 2018 val.
Download DAVIS-2017 from here.
Download YouTube-VOS 2018 (valid_all_frames.zip and valid.zip) from here.
Link them to ./data (as with the training datasets).
The final structure of the data folder should be:
- data
  - Charades_v1_480
    - xxxx.mp4
    - ...
  - Kinetics_400
    - xxxx.mp4
    - ...
  - DAVIS
    - Annotations
    - JPEGImages
    - ...
  - YouTube_VOS
    - valid_all_frames
    - valid
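Before starting a long training or inference run, it can help to confirm that the links resolve. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` is hypothetical, and the list of expected entries is simply copied from the folder structure described here.

```python
from pathlib import Path

# Expected entries, copied from the folder structure in this README.
EXPECTED = [
    "Charades_v1_480",
    "Kinetics_400",
    "DAVIS/Annotations",
    "DAVIS/JPEGImages",
    "YouTube_VOS/valid_all_frames",
    "YouTube_VOS/valid",
]


def missing_entries(data_root="./data", expected=EXPECTED):
    """Return the expected sub-paths that do not exist under data_root.

    Hypothetical helper for illustration; it is not shipped with INO_VOS.
    """
    root = Path(data_root)
    # Path.exists() follows symlinks, so a dangling link counts as missing.
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing under ./data:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

Run it from the INO_VOS directory. If only one training set is used, the corresponding entries can be dropped from `EXPECTED`.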
Set ckpt_output_path in train_charades.sh as needed, then run:
# under INO_VOS dir
sh train_charades.sh
The dataset metadata will be cached under ./cached/charades on the first run (this may take a few minutes).
Training on Kinetics-400 works the same way.
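For readers unfamiliar with this kind of first-run caching, the general pattern can be sketched as follows. This is a generic illustration and not the repository's implementation: the function name `load_or_build`, the use of pickle, and the cache file name are all assumptions.

```python
import pickle
from pathlib import Path


def load_or_build(cache_file, build_fn):
    """Load a pickled object from cache_file, or build it once and cache it.

    Generic cache-on-first-use sketch; the actual cache format used by
    INO_VOS is not documented here and may differ.
    """
    cache_file = Path(cache_file)
    if cache_file.exists():
        # Fast path: later runs read the stored result.
        with cache_file.open("rb") as f:
            return pickle.load(f)
    # Slow path: the first run does the expensive work (e.g. scanning videos).
    result = build_fn()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with cache_file.open("wb") as f:
        pickle.dump(result, f)
    return result
```

Under this pattern, deleting the cache file forces a rebuild on the next run, which is the usual remedy when the underlying dataset has changed.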
Our checkpoint used in the paper can be downloaded from here.
For efficiency, we first pre-generate the neighbor masks used during label propagation and cache them on disk:
python ./scripts/pre_calc_maskNeighborhood.py [davis|ytvos]
This may take a few minutes; the neighbor masks will be cached under ./cached/masks by default.
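To illustrate why such masks can be computed ahead of time, here is a minimal sketch of a spatial neighborhood mask. It assumes a square window of a given radius on the feature grid; the window shape, the radius, and the function name `neighborhood_mask` are illustrative assumptions, and the script's actual mask format may differ. Pure Python is used for clarity; a practical version would be vectorized.

```python
def neighborhood_mask(height, width, radius):
    """Return an (H*W) x (H*W) boolean mask as nested lists.

    mask[q][k] is True when key location k lies inside the square window of
    half-width `radius` centred on query location q (row-major indexing).
    Illustrative sketch only; not the repository's implementation.
    """
    coords = [(y, x) for y in range(height) for x in range(width)]
    return [
        [abs(qy - ky) <= radius and abs(qx - kx) <= radius for (ky, kx) in coords]
        for (qy, qx) in coords
    ]


if __name__ == "__main__":
    mask = neighborhood_mask(3, 3, 1)
    # Number of allowed neighbours for the centre and for a corner of a 3x3 grid.
    print(sum(mask[4]), sum(mask[0]))
```

In this sketch the mask depends only on the grid size and the radius, not on the video content, so it can be built once and reused for every frame of that resolution.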
Then, run label propagation via:
sh infer_vos.sh [davis|ytvos] $CKPT_PATH
Two folders will be created under ./results: vos contains the segmentation masks, and vis contains the blended visualization results.
For DAVIS-2017, install the official evaluation code and evaluate the inference results:
# under INO_VOS dir
git clone https://github.com/davisvideochallenge/davis2017-evaluation ./davis2017-evaluation
python ./davis2017-evaluation/evaluation_method.py --task semi-supervised --results_path $OUTPUT_VOS --davis_path ./data/DAVIS/
For YouTube-VOS, please use the official CodaLab evaluation server.
To create the submission, rename the vos directory to Annotations and compress it to Annotations.zip for uploading.
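The packaging step can also be scripted. The sketch below is one possible approach, not part of the repository: the function name `make_submission` and the example paths are hypothetical. It copies the vos directory instead of renaming it, so the original results stay in place, and it assumes the archive should contain a top-level Annotations folder; please check that layout against the server's instructions.

```python
import shutil
from pathlib import Path


def make_submission(vos_dir, out_dir):
    """Copy vos_dir to out_dir/Annotations and zip it as out_dir/Annotations.zip.

    Hypothetical helper for illustration; it fails if out_dir/Annotations
    already exists, to avoid silently mixing old and new results.
    """
    out_dir = Path(out_dir)
    annotations = out_dir / "Annotations"
    # Copy rather than rename so the original inference results stay intact.
    shutil.copytree(str(vos_dir), str(annotations))
    # base_dir keeps "Annotations/" as the top-level folder inside the archive.
    archive = shutil.make_archive(
        str(out_dir / "Annotations"),
        "zip",
        root_dir=str(out_dir),
        base_dir="Annotations",
    )
    return Path(archive)


if __name__ == "__main__":
    # Example paths are assumptions; adjust them to the actual output location.
    print(make_submission("./results/vos", "./submission"))
```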
If you find our work useful, please consider citing:
@inproceedings{pan2022n,
title={In-n-out generative learning for dense unsupervised video segmentation},
author={Pan, Xiao and Li, Peike and Yang, Zongxin and Zhou, Huiling and Zhou, Chang and Yang, Hongxia and Zhou, Jingren and Yang, Yi},
booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
pages={1819--1827},
year={2022}
}