This paper won the Best Student Paper Award at IEVC 2021.
Abstract: As indoor facilities such as shopping malls and train stations grow more complex, there is a need for technology that can determine a user's current location with a smartphone or similar device, even in indoor areas where GPS signals cannot be received. Indoor localization methods based on image recognition have been proposed as a solution. While many image-based localization methods have been proposed for outdoor use, indoor localization has difficulty achieving high accuracy from a single image taken by the user (query image), because indoor scenes contain many similar objects (walls, desks, etc.) and offer only a few cues for localization. In this paper, we propose a novel indoor localization method that uses multi-view images. The basic idea is to improve localization quality by retrieving the pre-captured image with location information (reference image) that best matches the multi-view query image, which is taken in multiple directions around the user. To this end, we introduce a simple metric for evaluating the distance between multi-view images.
Keywords: indoor localization, multi-view image, image recognition, similar image search, GeM pooling[1]
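The retrieval idea in the abstract can be illustrated with a minimal sketch. The README does not spell out the metric itself, so this example assumes the multi-view distance is the sum of Euclidean distances between descriptors of corresponding directions, with views aligned by index. The names `l2`, `multi_view_distance`, `localize`, and the toy 2-D descriptors are hypothetical and are not taken from `multi_library.py`.

```python
import math

def l2(u, v):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def multi_view_distance(query_views, ref_views):
    """Assumed metric: sum of per-direction distances, views aligned by index."""
    return sum(l2(q, r) for q, r in zip(query_views, ref_views))

def localize(query_views, references):
    """Return the location id whose reference views are closest to the query."""
    return min(
        references,
        key=lambda loc: multi_view_distance(query_views, references[loc]),
    )

# Toy 2-D descriptors for 4 directions at two hypothetical locations.
references = {
    "A": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]],
    "B": [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0], [1.0, 1.0]],
}
query = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.8], [0.1, 0.0]]
best = localize(query, references)
```

Summing over all four directions lets an ambiguous view (for example, a plain wall) be outvoted by the other views, which is the intuition behind using multi-view queries instead of a single image.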
- TUS Library Dataset:
TUS Library Dataset is our proprietary dataset: a set of images taken at the Tokyo University of Science (TUS) Katsushika Campus Library (floor area: 3,358 m²). Reference images were captured with an iPhone SE at 159 locations × 4 directions (636 images in total), at intervals of about 1[m]. Query images were captured with an iPhone 8 Plus at 42 random locations × 4 directions (168 images in total). All images are 480×640[px].
You can download it from here. Put the image data under dataset/library/.
- Python 3.8.5
- PyTorch 1.8.0+cu111
- ResNet152 (trained on Google-Landmarks-2018, with whitening included)
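For readers unfamiliar with GeM pooling [1], which appears in the keywords, the following is a minimal pure-Python sketch of the operation on a toy feature map. It assumes the standard definition from [1], where each channel is pooled as the p-th root of the mean of its activations raised to the power p. The function name `gem_pool`, the list-based input layout, and the default `p=3.0` are illustrative assumptions, not code from this repository, which runs on PyTorch.

```python
def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized-mean pooling over each channel.

    feature_map: list of channels, each a flat list of activations.
    p: pooling exponent (p=1 is average pooling; large p approaches max pooling).
    eps: lower clamp so that powers of non-positive activations stay defined.
    """
    pooled = []
    for channel in feature_map:
        clamped = [max(x, eps) for x in channel]
        mean_pow = sum(x ** p for x in clamped) / len(clamped)
        pooled.append(mean_pow ** (1.0 / p))
    return pooled

# One toy channel with two activations.
avg_like = gem_pool([[1.0, 3.0]], p=1.0)   # behaves like average pooling
max_like = gem_pool([[1.0, 3.0]], p=50.0)  # close to max pooling
```

The exponent interpolates between average and max pooling, which is why GeM descriptors tend to emphasize the most discriminative activations without discarding the rest.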
python multi_library.py
Evaluation Metrics:
We report One-Meter-Level Accuracy: the percentage of query images whose estimated location is within 1[m] of the ground-truth location.
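As a rough illustration of this metric, the sketch below computes it from 2-D coordinates in meters. It assumes locations are given as (x, y) pairs and that "within 1[m]" includes a distance of exactly 1 m. The function name `one_meter_level_accuracy` and the sample coordinates are hypothetical.

```python
import math

def one_meter_level_accuracy(estimated, ground_truth, threshold_m=1.0):
    """Percentage of queries whose estimate lies within threshold_m of the truth."""
    hits = sum(
        1
        for (ex, ey), (gx, gy) in zip(estimated, ground_truth)
        if math.hypot(ex - gx, ey - gy) <= threshold_m  # assumed inclusive bound
    )
    return 100.0 * hits / len(ground_truth)

# Four hypothetical queries: errors of 0.5 m, 2 m, 0.5 m, and 2 m.
estimated = [(0.0, 0.0), (2.0, 2.0), (5.0, 5.5), (1.0, 1.0)]
ground_truth = [(0.0, 0.5), (2.0, 4.0), (5.0, 5.0), (3.0, 1.0)]
accuracy = one_meter_level_accuracy(estimated, ground_truth)
```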
[1] Filip Radenović, et al. Fine-Tuning CNN Image Retrieval with No Human Annotation. TPAMI, Vol. 41, No. 7, pp. 1655–1668, 2019.
- Xinyun Li (Tokyo University of Science), Ryosuke Furuta (The University of Tokyo), Go Irie (NTT Communication Science Laboratories), and Yukinobu Taniguchi (Tokyo University of Science), "Accurate Indoor Localization Using Multi-View Images and Generalized Mean Pooling", IIEEJ, pp. 107-111, 2020.
- Xinyun Li, Ryosuke Furuta, Go Irie, and Yukinobu Taniguchi, "Accurate Indoor Localization Using Multi-View Image Distance", IEVC, 3A-2, 2021. (Best Student Paper Award)