The official implementation for the ACCV 2024 paper [Depth Attention for Robust RGB Tracking]
Yu Liu, Arif Mahmood, Muhammad Haris Khan
RGB video object tracking is a fundamental task in computer vision. Its effectiveness can be improved using depth information, particularly for handling motion-blurred targets. However, depth information is often missing in commonly used tracking benchmarks. In this work, we propose a new framework that leverages monocular depth estimation to counter the challenges of tracking targets that are out of view or affected by motion blur in RGB video sequences.
🔥🔥🔥
2024-10 🎉 Our new challenging dataset NT-VOT211 is now available! Click the link on the right ⏪ to access the full tutorial for benchmarking on this dataset.
🔥🔥🔥
To reproduce the results reported in the paper, match the software environment as closely as possible. Please configure your system with the following versions:
- Python: Version 3.8.10
- PyTorch: Version 1.11.0, built with CUDA 11.3 support
- CUDA: Version 11.3
- NumPy: Version 1.22.3
- OpenCV: Version 4.8.0
By adhering to these versions, you will be able to achieve consistency with the experimental setup described in the publication.
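As a quick sanity check before running experiments, the installed versions can be compared against the list above. The snippet below is a minimal sketch and not part of the official code: the helper names `installed_versions` and `find_mismatches` are hypothetical, and it assumes the packages expose their versions through `__version__` and `torch.version.cuda`.

```python
import importlib
import platform

# Versions listed in this README. Matching is done on leading components,
# so "1.11.0+cu113" still counts as PyTorch 1.11.0.
EXPECTED = {
    "python": "3.8.10",
    "torch": "1.11.0",
    "cuda": "11.3",
    "numpy": "1.22.3",
    "cv2": "4.8.0",
}


def installed_versions():
    """Collect versions from the current environment (None if unavailable)."""
    found = {"python": platform.python_version(), "cuda": None}
    for module_name in ("torch", "numpy", "cv2"):
        try:
            module = importlib.import_module(module_name)
        except (ImportError, OSError):
            found[module_name] = None
            continue
        found[module_name] = getattr(module, "__version__", None)
        if module_name == "torch":
            # CUDA version PyTorch was built with; None for CPU-only builds.
            found["cuda"] = getattr(getattr(module, "version", None), "cuda", None)
    return found


def find_mismatches(found, expected=EXPECTED):
    """Return {name: (expected, found)} for every missing or differing entry."""
    mismatches = {}
    for name, wanted in expected.items():
        have = found.get(name)
        if have is None:
            mismatches[name] = (wanted, None)
            continue
        # Drop local build tags such as "+cu113" before comparing.
        have_parts = str(have).split("+")[0].split(".")
        wanted_parts = wanted.split(".")
        if have_parts[: len(wanted_parts)] != wanted_parts:
            mismatches[name] = (wanted, str(have))
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(installed_versions())
    for name, (wanted, have) in problems.items():
        print(f"{name}: expected {wanted}, found {have}")
    if not problems:
        print("Environment matches the listed versions.")
```

Running the script prints one line per mismatch, which makes it easy to spot a stray NumPy or OpenCV upgrade before comparing numbers with the paper.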
To ensure the accuracy and consistency of the results as reported in our paper, it is essential to use the pretrained depth estimators that have been tested and validated with our algorithm. We have found the following models to be compatible and effective:
- Lite-Mono: This is the primary pretrained model we have used in our research. You can download it from the Lite-Mono GitHub repository.
- FastDepth: In addition to Lite-Mono, we have also confirmed that the FastDepth model can be used with our algorithm.
- Monodepth2: Another option that has been tested is the Monodepth2 model.
We recommend starting with the Lite-Mono model, as it has been extensively used in our experiments.
Our implementation covers a diverse set of baseline trackers, each with its own strengths:
- RTS: Robust Visual Tracking by Segmentation, a tracker that predicts target segmentation masks rather than relying on bounding boxes alone.
- AiATrack: A transformer tracker that adds an attention-in-attention module to refine attention correlations.
- ARTrack: An autoregressive tracker that treats tracking as coordinate sequence prediction conditioned on the previous trajectory.
- KeepTrack: A tracker that learns target candidate association so it can keep track of distractors and stay on the correct target.
- MixFormer: A transformer tracker whose mixed attention modules perform feature extraction and target information integration jointly.
- Neighbor: NeighborTrack, a post-processing method that improves single object tracking through bipartite matching with neighbor tracklets.
- ODTrack: A video-level tracker that uses online dense temporal token learning to propagate target information across frames.
- STMTrack: A template-free tracker built on space-time memory networks.
Together, these trackers form a powerful toolkit, adept at handling a wide range of tracking tasks across diverse settings and scenarios.
To set up these trackers, please refer to the comprehensive tutorial.
If you find our work valuable, we kindly ask you to consider citing our paper and starring ⭐ our repository. Our implementation includes multiple trackers, and we hope it makes life easier for the VOT research community and the depth estimation community.
@inproceedings{liu2024depth,
title={Depth Attention for Robust RGB Tracking},
author={Yu Liu and Arif Mahmood and Muhammad Haris Khan},
booktitle={Proceedings of the Asian Conference on Computer Vision (ACCV)},
pages={to be announced},
year={2024},
organization={Springer}
}
If you need help, please open a GitHub issue. For questions about technical details, feel free to contact us.