Self-localization from Images with Small Overlap

Abstract-With the recent success of visual features from deep convolutional neural networks (DCNNs) in visual robot self-localization, it has become important and practical to address more general self-localization scenarios. In this paper, we address the scenario of self-localization from images with small overlap. We explicitly introduce a localization difficulty index as a decreasing function of view overlap between query and relevant database images and investigate performance versus difficulty for challenging cross-view self-localization tasks. We then reformulate self-localization as a scalable bag-of-visual-features (BoVF) scene retrieval and present an efficient solution called PCA-NBNN, aiming to facilitate fast and yet discriminative correspondence between partially overlapping images. The proposed approach adopts recent findings in discriminativity-preserving encoding of DCNN features using principal component analysis (PCA) and cross-domain scene matching using the naive Bayes nearest neighbor (NBNN) distance metric. We experimentally demonstrate that the proposed PCA-NBNN framework frequently achieves results comparable to previous DCNN features and that the BoVF model is significantly more efficient. We further address an important alternative scenario of "self-localization from images with NO overlap" and report the result.

Fig. 1. Self-localization with different levels of localization difficulty index (LDI). The LDI of a self-localization task is a decreasing function of view overlap between the query and relevant database image pair. In experiments, we employ SIFT matching with VFC verification (colored line segments) to evaluate the amount of view overlap. All the pairs in the dataset are evaluated and sorted in ascending order of LDI. The rank in the sorted list (normalized by the list's length) [%] can be viewed as a prediction of the relative difficulty of the corresponding self-localization task. Displayed in the figures are samples from self-localization tasks at four different levels of rank [%].
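The ranking step described in the caption can be sketched as follows. This is a minimal illustration with made-up match counts; the function name and the use of raw VFC match counts as the overlap score are assumptions, not the paper's exact procedure.

```python
# Minimal sketch (hypothetical helper name): ranking image pairs by a
# localization difficulty index (LDI). Per the caption, LDI is a decreasing
# function of view overlap, measured here by the number of VFC-verified
# SIFT matches for each query/database pair.

def ldi_ranks(match_counts):
    """match_counts: VFC-verified match count per image pair.
    Returns normalized ranks in [0, 100): low rank = many matches = easy."""
    n = len(match_counts)
    # Sort pair indices by descending overlap, i.e., ascending LDI.
    order = sorted(range(n), key=lambda i: -match_counts[i])
    ranks = [0.0] * n
    for position, pair_index in enumerate(order):
        ranks[pair_index] = 100.0 * position / n
    return ranks

counts = [120, 35, 80, 5]      # verified matches for four toy pairs
print(ldi_ranks(counts))       # -> [0.0, 50.0, 25.0, 75.0]
```

The pair with the most verified matches receives rank 0.0 (easiest), and the pair with the fewest receives the highest rank (hardest), matching the easiest-to-hardest ordering used in Fig. 1.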

Fig. 2. Experimental environments. Red, yellow, and green lines: viewpoint paths on which datasets #1, #2, and #3 were collected.

Fig. 3. Sample configurations of viewpoints for different levels of localization difficulties.

Fig. 4. Compact binary landmarks. a, b, c, and d: 4 different examples of a query image (top) being explained by one image-level feature and 20 part-level features (bottom). Each scene part is further encoded to a 128-bit binary code, which is visualized by a barcode.
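The caption does not state how the 128-bit part codes are compared; a common choice for compact binary descriptors is Hamming distance, sketched here as an assumption rather than as the paper's method.

```python
# Minimal sketch (assumption): comparing compact binary landmark codes by
# Hamming distance, i.e., the number of differing bits between two codes.
# Codes are stored as Python ints (here 8-bit for readability; the same
# function works unchanged for the 128-bit codes of Fig. 4).

def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary codes stored as ints."""
    return bin(a ^ b).count("1")

code_a = 0b10110001
code_b = 0b10010101
print(hamming(code_a, code_b))  # -> 2 (two bits differ)
```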

Fig. 5. Effect of asymmetric distance computation (ADC). The figures compare two different encoding schemes, BoW (top) and ADC (bottom), using a toy example of a 2D feature space x-y, in the case of the fine library. In the figures, query/database images are located at z = 2 / z = -2, local features extracted from query/database images are located at z = 1 / z = -1, and library features (green dots), including NN library features (colored small boxes), are located at z = 0. Previous BoW systems (top), which encode both query and database features, frequently fail to identify common library features between query and database images in the case of our fine library. Conversely, ADC, which encodes only database features, not query features, reliably identifies the NN library features of individual database features via an online search over the space of library features (i.e., z = 0).
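The asymmetry described in the caption can be sketched as follows. This is a toy 2-D illustration with hypothetical variable names, not the paper's implementation: database features are quantized to their nearest library (codebook) feature offline, while query features stay raw and are compared against the quantized representatives online.

```python
# Minimal sketch (toy data, hypothetical names) of asymmetric distance
# computation (ADC): only database features are encoded to library indices;
# query features remain unquantized.

def nearest(library, v):
    """Index of the library feature closest to v (squared L2, 2-D)."""
    return min(range(len(library)),
               key=lambda i: (library[i][0] - v[0]) ** 2
                           + (library[i][1] - v[1]) ** 2)

library = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # codebook (z = 0 in Fig. 5)

# Offline: encode each database feature as a library index.
db_features = [(0.9, 0.1), (0.1, 0.8)]
db_codes = [nearest(library, f) for f in db_features]  # -> [1, 2]

# Online: asymmetric distance between a raw query feature and a quantized
# database feature, via the database feature's library representative.
def adc_dist(library, query, code):
    rep = library[code]
    return (rep[0] - query[0]) ** 2 + (rep[1] - query[1]) ** 2

query = (0.95, 0.05)
dists = [adc_dist(library, query, c) for c in db_codes]
best = dists.index(min(dists))
print(best)  # -> 0: the first database feature best matches the query
```

Because only one side of the comparison is quantized, small quantization errors on the query side cannot compound with those on the database side, which is the stability advantage the caption attributes to ADC over BoW.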

Fig. 6. Performance vs. difficulty. Vertical axis: ratio of self-localization tasks in which the ground-truth image pair is ranked in the top X, for ten different ranges of rank X (0.0-0.1, 0.1-0.2, ..., 0.9-1.0). Horizontal axis: view overlap in terms of the number of VFC matches, which is a decreasing function of the localization difficulty index.

Fig. 7. Samples of self-localization tasks. Displayed in the figures are samples of self-localization tasks (using the "bodw20" algorithm), sampled uniformly from the experiments. For each sample, the query image (left) and the relevant database image (right) are displayed with the view overlap score ("overlap") as well as the localization performance ("rank"). Here, "rank" is the rank assigned to the ground-truth relevant database image within the ranked list output by the recognition algorithm. From top to bottom and left to right, the samples are displayed in descending order of view overlap (i.e., from easiest to hardest).

Fig. 8. Localization performance on relatively easy localization scenarios.

Fig. 9. Localization performance on relatively hard localization scenarios.

Members: Tanaka Kanji, Murase Tomoya, Yanagihara Kentaro

Cross View Localization dataset

The cross-view localization dataset consists of around 15,000 images taken around a university campus using a Bumblebee stereo camera. The viewpoint trajectory was estimated by stereo visual odometry and saved in the file "vo.txt". The estimated trajectory was further corrected by a graph SLAM algorithm and saved in "is.txt".

Description of files:

cross_view_localization/
  1/
    img_dir/*.pgm
    is.txt
    vo.txt
  2/
    img_dir/*.pgm
    is.txt
    vo.txt
  3/
    img_dir/*.pgm
    is.txt
    vo.txt

Each line of "is.txt" and "vo.txt" consists of [x] [y] [theta].
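A pose file in this format can be read as follows. This is a minimal sketch: the function name is hypothetical, and it assumes whitespace-separated fields as described above.

```python
# Minimal sketch (hypothetical function name): parsing an "is.txt" or
# "vo.txt" pose file, where each line holds a 2-D pose [x] [y] [theta]
# as whitespace-separated numbers.

def load_poses(path):
    """Return a list of (x, y, theta) tuples read from a pose file."""
    poses = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 3:
                x, y, theta = map(float, fields[:3])
                poses.append((x, y, theta))
    return poses

# Example usage (assumed path layout from the listing above):
# poses = load_poses("cross_view_localization/1/vo.txt")
```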

Relevant Publication:

Kanji Tanaka
Self-localization from images with small overlap
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2016)
Bibtex source, Abstract, Document PDF

Acknowledgements:

This work is supported in part by JSPS KAKENHI Grant-in-Aid for Young Scientists (B) 23700229, and for Scientific Research (C) 26330297.