Official implementation of our work Swiss DINO, published at IROS 2024. In this work, we present a one-shot personal object search method based on the recent DINOv2 transformer model. Swiss DINO handles the challenging requirements of on-device personalized scene understanding and does not require any adaptation training.
Install the conda environment with:
$ conda env create -f swiss_dino_env.yml
We evaluate the method on two datasets: PerSeg (https://github.com/ZrrSkywalker/Personalize-SAM#preparation) and iCubWorld (https://robotology.github.io/iCubWorld/#icubworld-transformations-modal).
Download and extract the chosen dataset, and set $DATA_DIR to the root path of the dataset.
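For example, in a bash-compatible shell the variable can be set and sanity-checked as below. This is a minimal sketch: the path is a hypothetical placeholder, not a location the repository expects, and the check only warns rather than aborting so the snippet is safe to paste into an interactive session.

```shell
# Hypothetical path: replace with the directory where you extracted the dataset.
export DATA_DIR="$HOME/datasets/PerSeg"

# Warn early if the path is wrong, before launching a long evaluation run.
if [ ! -d "$DATA_DIR" ]; then
  echo "Warning: DATA_DIR does not point to an existing directory: $DATA_DIR" >&2
fi
```

Quoting "$DATA_DIR" when passing it to the evaluation script also avoids problems if the path contains spaces.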
To run evaluation on the PerSeg dataset:
python swiss_dino_evaluation.py --dataset_name perseg --data_dir $DATA_DIR --fe_model_type vit_s --verbose
To run evaluation on the iCubWorld dataset:
python swiss_dino_evaluation.py --dataset_name icubworld --data_dir $DATA_DIR --fe_model_type vit_s --verbose
The evaluation script generates a log.txt file with per-class metrics.
Inference is not supported yet.
If you use this repository, please cite our work:
@article{paramonov2024swiss,
title={Swiss DINO: Efficient and Versatile Vision Framework for On-device Personal Object Search},
author={Paramonov, Kirill and Zhong, Jia-Xing and Michieli, Umberto and Moon, Jijoong and Ozay, Mete},
journal={IROS},
year={2024}
}