Official PyTorch implementation of MonSter
MonSter: Marry Monodepth to Stereo Unleashes Power
Junda Cheng, Longliang Liu, Gangwei Xu, Xianqi Wang, Zhaoxing Zhang, Yong Deng, Jinliang Zang, Yurui Chen, Zhipeng Cai, Xin Yang
MonSter harnesses the complementary strengths of monocular depth estimation and stereo matching to unlock the full potential of stereo vision. It substantially improves depth perception in challenging regions such as ill-posed areas and fine structures. MonSter ranks first on five of the most widely used leaderboards: SceneFlow, KITTI 2012, KITTI 2015, Middlebury, and ETH3D. It also consistently and significantly outperforms state-of-the-art methods in zero-shot generalization, making it the current best model in both accuracy and generalization.
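The core idea is easiest to see in a toy form: a monocular inverse-depth map is only defined up to an unknown scale and shift, but once aligned to the stereo disparity it can support ill-posed regions. The sketch below shows the classical least-squares alignment step under that assumption; note that MonSter's actual fusion is a learned, iterative mutual-refinement module, not this closed-form fit.

```python
import numpy as np

def align_mono_to_disparity(mono_inv_depth, stereo_disp, valid):
    """Fit disp ~ scale * mono_inv_depth + shift on confident pixels,
    then apply the fit everywhere. Conceptual sketch only -- MonSter
    learns this mono-stereo fusion rather than solving it in closed form."""
    x = mono_inv_depth[valid].ravel()
    y = stereo_disp[valid].ravel()
    A = np.stack([x, np.ones_like(x)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, y, rcond=None)
    return scale * mono_inv_depth + shift
```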
Zero-shot generalization performance on the KITTI benchmark.
Zero-shot generalization performance on our captured stereo images.
Comparisons with state-of-the-art stereo methods across five of the most widely used benchmarks.
- NVIDIA RTX 3090
- Python 3.8
conda create -n monster python=3.8
conda activate monster
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
pip install tqdm
pip install scipy
pip install opencv-python
pip install scikit-image
pip install tensorboard
pip install matplotlib
pip install timm==0.6.13
pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu118/torch2.0/index.html  # wheel index must match the torch 2.0.x / cu118 install above
pip install accelerate==1.0.1
pip install gradio_imageslider
pip install gradio==4.29.0
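Optionally, verify that the environment resolved correctly (the version strings below are what the commands above should install; adjust if you changed them):

```python
import torch, torchvision, timm, mmcv
print(torch.__version__, torchvision.__version__)  # expect 2.0.1+cu118 / 0.15.2+cu118
print(timm.__version__, mmcv.__version__)          # expect 0.6.13 / 2.1.0
print(torch.cuda.is_available())                   # should be True on a CUDA machine
```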
- SceneFlow
- KITTI
- ETH3D
- Middlebury
- TartanAir
- CREStereo Dataset
- FallingThings
- InStereo2K
- Sintel Stereo
- HR-VS
| Model | Link |
|---|---|
| KITTI (one model for both 2012 and 2015) | Download 🤗 |
| Middlebury | Download 🤗 |
| ETH3D | Download 🤗 |
| SceneFlow | Download 🤗 |
| mix_all (mix of all datasets) | Download 🤗 |
The mix_all model is trained on all of the datasets listed above and achieves the best zero-shot generalization.
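A quick way to inspect a downloaded checkpoint before running anything. How the weights are wrapped depends on the training script, so treat this as a sketch; `evaluate_stereo.py --restore_ckpt` is the supported loading path.

```python
import torch

ckpt = torch.load("./pretrained/mix_all.pth", map_location="cpu")
# Some checkpoints store the weights under a "state_dict"-style key;
# plain state dicts map parameter names straight to tensors.
state = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"{len(state)} parameter tensors")
print(next(iter(state)))  # name of the first parameter
```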
To evaluate the zero-shot performance of MonSter on KITTI, ETH3D, Scene Flow, Virtual KITTI 2, or DrivingStereo, run
python evaluate_stereo.py --restore_ckpt ./pretrained/sceneflow.pth --dataset {eth3d,kitti,sceneflow,vkitti,driving}
or use the model trained on all datasets, which generalizes better in the zero-shot setting:
python evaluate_stereo.py --restore_ckpt ./pretrained/mix_all.pth --dataset {eth3d,kitti,sceneflow,vkitti,driving}
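For example, zero-shot evaluation on KITTI with the strongest generalization checkpoint:
python evaluate_stereo.py --restore_ckpt ./pretrained/mix_all.pth --dataset kitti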
To generate a MonSter submission for the KITTI benchmark, run
python save_disp.py
To generate a MonSter submission for the Middlebury benchmark, run
python save_pfm.py
To generate a MonSter submission for the ETH3D benchmark, run
python save_pfm_eth.py
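Both the Middlebury and ETH3D submissions are disparity maps saved in the PFM format. A minimal reader to sanity-check the saved files, assuming grayscale `Pf` output (the exact filenames come from the scripts above):

```python
import numpy as np

def read_pfm(path):
    """Read a grayscale PFM file into a float32 array (top row first)."""
    with open(path, "rb") as f:
        if f.readline().strip() != b"Pf":
            raise ValueError("not a grayscale PFM file")
        width, height = map(int, f.readline().split())
        scale = float(f.readline())  # negative scale means little-endian data
        data = np.fromfile(f, "<f4" if scale < 0 else ">f4", width * height)
    return data.reshape(height, width)[::-1]  # PFM stores rows bottom-up
```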
To train MonSter on Scene Flow, KITTI, ETH3D, or Middlebury, run the corresponding script:
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch train_kitti.py (for KITTI)
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch train_eth3d.py (for ETH3D)
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch train_sceneflow.py (for Scene Flow)
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch train_middlebury.py (for Middlebury)
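The launch commands above use Accelerate's defaults. Standard `accelerate launch` flags can pin the process count and precision; these are generic Accelerate options, not MonSter-specific, and whether a given precision works depends on your GPUs and the training script's own configuration:
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --num_processes 4 --mixed_precision bf16 train_sceneflow.py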
If you find our work useful in your research, please consider citing our paper:
@article{cheng2025monster,
title={MonSter: Marry Monodepth to Stereo Unleashes Power},
author={Cheng, Junda and Liu, Longliang and Xu, Gangwei and Wang, Xianqi and Zhang, Zhaoxing and Deng, Yong and Zang, Jinliang and Chen, Yurui and Cai, Zhipeng and Yang, Xin},
journal={arXiv preprint arXiv:2501.08643},
year={2025}
}
This project is based on RAFT-Stereo, GMStereo, and IGEV. We thank the original authors for their excellent work.