Perception-Oriented Video Frame Interpolation via Asymmetric Blending
Guangyang Wu, Xin Tao, Changlin Li, Wenyi Wang, Xiaohong Liu, Qingqing Zheng
In CVPR 2024
This repository is the official implementation of the paper "Perception-Oriented Video Frame Interpolation via Asymmetric Blending", also referred to as "PerVFI".
We present PerVFI, a novel paradigm for perception-oriented video frame interpolation.
- Asymmetric synergistic blending scheme: reduces the blur and ghosting caused by unavoidable motion errors.
- Generative model as decoder: reconstructs results sampled from a distribution to resolve temporal supervision misalignment during training.
- Future: the network structure can be further optimized to improve efficiency and performance.
2024-9-7: VFIBenchmark is released! Feel free to reproduce the metrics listed in the paper.
2024-6-13: Paper accepted! Released the inference code (this repository).
2024-6-1: Added arXiv version.
- 🔜 Inference code for customized flow estimator.
- 🔜 Google Colab demo.
- 🔜 Online interactive demo.
- Hugging Face Space (optional).
- Add GIFs in page for better visualization.
We offer several ways to interact with PerVFI:
- Run the demo locally (requires a GPU and Anaconda, see Installation Guide). Local development instructions with this codebase are given below.
- Extended demo on Google Colab (coming soon).
- Online interactive demo (coming soon).
The inference code was tested on:
- Ubuntu 22.04 LTS, Python 3.10.12, CUDA 11.7, GeForce RTX 4090
- macOS 14.2, Python 3.10.12, Apple M1 (16 GB)
On Windows, we recommend running the code in WSL2:
- Install WSL following the installation guide.
- Install CUDA support for WSL following the installation guide.
- Find your drives in /mnt/<drive letter>/; check the WSL FAQ for more details. Navigate to the working directory of choice.
Clone the repository (requires git):
git clone https://github.com/mulns/PerVFI.git
cd PerVFI
We provide several ways to install the dependencies.
- Using Conda.
Windows users: install the Linux version of Conda inside WSL.
After the installation, create the environment and install the dependencies into it:
conda env create -n pervfi
conda activate pervfi
pip install -r requirements.txt
- Using pip: Alternatively, create a native Python virtual environment and install the dependencies into it:
python -m venv venv/pervfi
source venv/pervfi/bin/activate
pip install -r requirements.txt
Keep the environment activated before running the inference script. Activate the environment again after restarting the terminal session.
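As a quick sanity check after installation, the following sketch reports the running Python version and whether given modules can be imported in the active environment. The module name `torch` is an assumption used only for illustration, since the contents of requirements.txt are not listed here; substitute whatever that file actually installs.

```python
import importlib.util
import sys


def check_environment(modules):
    """Return {module_name: bool} saying whether each top-level module is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # "torch" is an assumed dependency, named here for illustration only.
    print("Python", sys.version.split()[0])
    for name, found in check_environment(["torch"]).items():
        print(f"{name}: {'found' if found else 'missing'}")
```

Running it inside the activated environment and again in a fresh terminal makes it easy to see whether the environment is still active.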
Place your video frames (images) in a directory, for example under input/in-the-wild_example, and run the inference command given below.
Download the pre-trained models and place them in the checkpoints folder. These include checkpoints for several optical flow estimators; you can download one for simple use or all of them for comparison.
The default checkpoint is trained using only the Vimeo90K dataset.
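Before running inference, it can help to confirm that the two folders mentioned above are populated. The sketch below is a hypothetical helper, not part of the repository: the function name `preflight`, the accepted image extensions, and the idea of passing both paths explicitly are all assumptions.

```python
from pathlib import Path

# Assumed image extensions; the formats actually accepted are not stated here.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def preflight(input_dir, checkpoint_dir):
    """Return a list of problems; an empty list means both folders look usable."""
    problems = []
    input_path = Path(input_dir)
    ckpt_path = Path(checkpoint_dir)

    frames = []
    if input_path.is_dir():
        frames = sorted(
            p for p in input_path.rglob("*") if p.suffix.lower() in IMAGE_SUFFIXES
        )
    # Interpolation needs at least two frames to produce an in-between frame.
    if len(frames) < 2:
        problems.append(f"{input_dir}: found {len(frames)} image(s), need at least 2")

    # Any regular file counts as a checkpoint; file names are not assumed.
    if not ckpt_path.is_dir() or not any(p.is_file() for p in ckpt_path.rglob("*")):
        problems.append(f"{checkpoint_dir}: no checkpoint files found")
    return problems
```

For example, `preflight("input/in-the-wild_example", "checkpoints")` returns an empty list when both folders are ready, assuming those paths are relative to where you run it.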
cd src/test
python infer_video.py -m [OFE]+pervfi -data input -fps [OUT_FPS]
NOTE:
- OFE is a placeholder for the optical flow estimator name. This repo supports RAFT, GMA, and GMFlow. Support for your own preferred flow estimator is a planned feature.
- OUT_FPS is a placeholder for the frame rate of the output video (defaults to 10). The output may also be saved as images.
The Vb checkpoint (faster) replaces the normalizing-flow generator with a multi-scale decoder for faster inference, at some cost in perceptual quality:
cd src/test
python infer_video.py -m [OFE]+pervfi-vb -data input -fps [OUT_FPS]
You can find all results in output. Enjoy!
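The two inference commands above differ only in the model string, so filling in the placeholders can be sketched as a small helper. This is an illustrative sketch, not part of the repository: `build_infer_command` is a hypothetical name, and the lowercase estimator spellings are an assumption that mirrors the `-m raft` and `-m gmflow` arguments used in the training commands.

```python
# Assumed command-line spellings of the supported estimators (RAFT, GMA, GMFlow);
# the repository may expect different spellings.
SUPPORTED_OFE = ("raft", "gma", "gmflow")


def build_infer_command(ofe, out_fps=10, fast=False, data_dir="input"):
    """Build the argv list for infer_video.py from the OFE and OUT_FPS placeholders."""
    if ofe not in SUPPORTED_OFE:
        raise ValueError(f"unsupported flow estimator: {ofe!r}")
    if out_fps <= 0:
        raise ValueError("out_fps must be positive")
    # fast=True selects the Vb variant, otherwise the default checkpoint is used.
    model = f"{ofe}+pervfi-vb" if fast else f"{ofe}+pervfi"
    return ["python", "infer_video.py", "-m", model, "-data", data_dir, "-fps", str(out_fps)]
```

The resulting list can be passed to `subprocess.run(..., cwd="src/test", check=True)`, which matches the `cd src/test` step without changing the caller's working directory.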
Will be included in VFI-Benchmark.
Download the training data from vimeo_90k.
Activate the previously created environment.
Set the data directory parameters in src/train/configs/pvfi-***.yml.
Download the pretrained Softmax-Splatting, RAFT-sintel, and GMFlow-sintel checkpoints into src/train/checkpoints/.
Run the data preprocessing script to pre-estimate optical flows for each sample in the dataset:
cd src/train
python prepare_optical_flow.py -m raft -r path_to_data_dir -b 96
python prepare_optical_flow.py -m gmflow -r path_to_data_dir -b 96
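Since both preprocessing runs share the same data directory and `-b` value, the pair of commands can be generated from one place. The helper below is a hypothetical sketch: `build_flow_commands` is not part of the repository, and it only reproduces the flags shown above without assuming what `-b` controls.

```python
def build_flow_commands(data_dir, estimators=("raft", "gmflow"), b=96):
    """Return one prepare_optical_flow.py argv list per estimator, sharing data_dir and -b."""
    return [
        ["python", "prepare_optical_flow.py", "-m", name, "-r", str(data_dir), "-b", str(b)]
        for name in estimators
    ]
```

Each returned list can be run in turn with `subprocess.run(cmd, cwd="src/train", check=True)`, so a failure in the first estimator stops the second from starting.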
Run the training script:
cd src/train
sh train.sh
Please refer to this instruction.
Please cite our paper:
@InProceedings{Wu_2024_CVPR,
author = {Wu, Guangyang and Tao, Xin and Li, Changlin and Wang, Wenyi and Liu, Xiaohong and Zheng, Qingqing},
title = {Perception-Oriented Video Frame Interpolation via Asymmetric Blending},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {2753-2762}
}
This work is licensed under the Apache License, Version 2.0 (as defined in the LICENSE).
By downloading and using the code and model you agree to the terms in the LICENSE.