Self-localization
from Images with Small Overlap
With the recent success of visual features from deep convolutional neural
networks (DCNN) in visual robot self-localization, it has become important and practical
to address more general self-localization scenarios. In this paper, we address
the scenario of self-localization from images with small overlap. We explicitly
introduce a localization difficulty index as a decreasing function of view
overlap between query and relevant database images and investigate performance
versus difficulty for challenging cross-view self-localization tasks. We
then reformulate self-localization as a scalable
bag-of-visual-features (BoVF) scene retrieval problem and present an efficient
solution called PCA-NBNN, which aims to achieve fast yet discriminative
correspondence between partially overlapping images. The proposed approach
adopts recent findings in discriminativity preserving encoding of DCNN features
using principal component analysis (PCA) and cross-domain scene matching using
naive Bayes nearest neighbor distance metric (NBNN). We experimentally
demonstrate that the proposed PCA-NBNN framework frequently achieves comparable
results to previous DCNN features and that the BoVF model is significantly more
efficient. We further address an important alternative scenario of
gself-localization from images with NO overlaph and report the result.
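A minimal sketch of the PCA-NBNN idea described above, assuming each database place is represented by a set of DCNN part-level descriptors: descriptors are compressed by a PCA projection and places are ranked by the naive Bayes nearest neighbor (NBNN) image-to-place distance. This is an illustrative Python sketch, not the authors' implementation; the projection dimensionality and all function names are assumptions.

import numpy as np

def fit_pca(descriptors, n_components=128):
    # Learn a PCA projection intended to preserve the discriminativity
    # of the DCNN features (illustrative dimensionality).
    mean = descriptors.mean(axis=0)
    _, _, vt = np.linalg.svd(descriptors - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(descriptors, mean, components):
    # Project descriptors onto the learned principal directions.
    return (descriptors - mean) @ components.T

def nbnn_distance(query_feats, place_feats):
    # NBNN image-to-place distance: for each query feature, add the squared
    # distance to its nearest feature belonging to the candidate place.
    total = 0.0
    for q in query_feats:
        total += np.sum((place_feats - q) ** 2, axis=1).min()
    return total

def localize(query_feats, database):
    # Rank database places (place_id -> array of projected features)
    # in ascending order of NBNN distance.
    scores = {pid: nbnn_distance(query_feats, feats) for pid, feats in database.items()}
    return sorted(scores, key=scores.get)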
Members: Kanji Tanaka, Tomoya Murase
Relevant Publication:
Self-localization from images with small overlap
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2016)
Kanji Tanaka
Bibtex source, Document PDF
Acknowledgements: This work is supported in part by JSPS KAKENHI
Grant-in-Aid for Young Scientists (B) 23700229, and for Scientific Research (C)
26330297.
Fig. 1. Self-localization with different levels of localization difficulty index (LDI). The LDI of a self-localization task is a decreasing function of view overlap between the query and relevant database image pair. In the experiments, we employ SIFT matching with VFC verification (colored line segments) to evaluate the amount of view overlap. All the pairs in the dataset are evaluated and sorted in ascending order of LDI. The rank in the sorted list (normalized by the list's length) [%] can be viewed as a prediction of the relative difficulty of the corresponding self-localization task. The figures show samples of self-localization tasks at four different levels of rank [%].
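The view-overlap score and the LDI-based difficulty ranking of Fig. 1 could be tabulated roughly as follows. Note that the paper uses VFC verification of SIFT matches; the Python sketch below substitutes RANSAC homography verification, which is readily available in OpenCV, and all thresholds are assumptions.

import cv2
import numpy as np

def view_overlap(img_query, img_db):
    # Count geometrically verified SIFT matches between a query/database pair
    # (RANSAC verification stands in here for the VFC verification used in the paper).
    sift = cv2.SIFT_create()
    kq, dq = sift.detectAndCompute(img_query, None)
    kd, dd = sift.detectAndCompute(img_db, None)
    if dq is None or dd is None:
        return 0
    knn = cv2.BFMatcher(cv2.NORM_L2).knnMatch(dq, dd, k=2)
    matches = [m for m, n in (p for p in knn if len(p) == 2) if m.distance < 0.75 * n.distance]
    if len(matches) < 4:
        return len(matches)
    src = np.float32([kq[m.queryIdx].pt for m in matches])
    dst = np.float32([kd[m.trainIdx].pt for m in matches])
    _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return int(mask.sum()) if mask is not None else 0

def difficulty_ranks(overlaps):
    # Sort pairs in ascending order of LDI (i.e., descending overlap) and return
    # each pair's rank normalized by the list's length, in percent.
    overlaps = np.asarray(overlaps)
    order = np.argsort(-overlaps)
    ranks = np.empty(len(overlaps))
    ranks[order] = 100.0 * np.arange(len(overlaps)) / len(overlaps)
    return ranks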
Fig. 2. Experimental environments. Red, yellow, and green lines: viewpoint paths on which datasets #1, #2, and #3 were collected.
Fig. 3. Sample configurations of viewpoints for different levels of localization difficulty.
Fig. 4. Compact binary landmarks. a, b, c, and d: four different examples of a query image (top) being explained by one image-level feature and 20 part-level features (bottom). Each scene part is further encoded into a 128-bit binary code, which is visualized as a barcode.
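One plausible way to obtain 128-bit codes like those in Fig. 4 is to threshold a 128-dimensional (e.g., PCA-projected) part descriptor at zero and pack the resulting bits. The sketch below illustrates this; the exact encoding used in the paper may differ.

import numpy as np

def to_binary_code(descriptor_128d):
    # Map a 128-dim real-valued part descriptor to a packed 128-bit code (16 bytes).
    bits = (np.asarray(descriptor_128d) > 0).astype(np.uint8)
    return np.packbits(bits)

def hamming(code_a, code_b):
    # Hamming distance between two packed binary codes.
    return int(np.unpackbits(np.bitwise_xor(code_a, code_b)).sum())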
Fig. 5. Effect of asymmetric distance computation (ADC). The figures compare two encoding schemes, BoW (top) and ADC (bottom), using a toy example of a 2D feature space x-y in the case of the fine library. In the figures, query/database images are located at z = 2 / z = 2, local features extracted from query/database images are located at z = 1 / z = 1, and library features (green dots), including the NN library features (colored small boxes), are located at z = 0. Previous BoW systems (top), which encode both query and database features, frequently fail to identify common library features between query and database images when such a fine library is used. Conversely, ADC (bottom), which encodes only database features, not query features, can stably identify the NN library features of individual database features by an online search over the space of library features (i.e., z = 0).
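The asymmetric scheme in Fig. 5 can be sketched as follows: only database features are quantized to their NN library features, while query features stay raw and are compared against the quantized database features at search time. This is a simplified illustration under assumed data layouts, not the paper's implementation.

import numpy as np

def encode_database(db_feats, library):
    # Quantize each database feature to the index of its NN library feature (z = 0).
    d2 = ((db_feats[:, None, :] - library[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

def adc_distance(query_feat, db_codes, library):
    # Asymmetric distance: a raw (unencoded) query feature is compared with the
    # library features that the database features were quantized to; the smallest
    # squared distance is returned.
    reconstructed = library[db_codes]
    return ((reconstructed - query_feat) ** 2).sum(-1).min()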
Fig. 6. Performance vs. difficulty. Vertical axis: ratio of self-localization tasks where the ground-truth image pair is top-X ranked, for ten different ranges of rank X (X: 0.0-0.1, 0.1-0.2, ..., 0.9-1.0). Horizontal axis: view overlap in terms of the number of VFC matches, which is a decreasing function of the localization difficulty index.
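The performance-vs-difficulty statistics of Fig. 6 could be tabulated along the following lines: tasks are grouped by view overlap (number of verified matches), and within each group the normalized rank of the ground-truth image is histogrammed into the ten ranges 0.0-0.1, ..., 0.9-1.0. The bin edges and argument names below are assumptions.

import numpy as np

def performance_vs_difficulty(overlaps, norm_ranks, overlap_bins):
    # overlaps: verified-match count per task; norm_ranks: normalized rank of the
    # ground-truth database image in [0, 1]; overlap_bins: bin edges over overlap.
    overlaps = np.asarray(overlaps)
    norm_ranks = np.asarray(norm_ranks)
    rank_edges = np.linspace(0.0, 1.0, 11)
    table = []
    for lo, hi in zip(overlap_bins[:-1], overlap_bins[1:]):
        sel = (overlaps >= lo) & (overlaps < hi)
        hist, _ = np.histogram(norm_ranks[sel], bins=rank_edges)
        table.append(hist / max(int(sel.sum()), 1))
    return np.array(table)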
Fig. 7. Samples of self-localization tasks. The figures show samples of self-localization tasks (using the "bodw20" algorithm), sampled uniformly from the experiments. For each sample, the query image (left) and the relevant database image (right) are displayed with the view overlap score ("overlap") as well as the localization performance ("rank"). Here, "rank" is the rank assigned to the ground-truth relevant database image within the ranked list output by the recognition algorithm. From top to bottom and left to right, the samples are displayed in descending order of view overlap (i.e., from easiest to hardest).
Fig. 8. Localization performance on relatively easy localization scenarios.
Fig. 9. Localization performance on relatively hard localization scenarios.
Members: Tanaka Kanji, Murase Tomoya, Yanagihara Kentaro
Cross View Localization dataset
The cross season dataset consists of around 15,000 images taken around a
university campus using a Bumblebee stereo camera. The viewpoint trajectory
was estimated by stereo visual odometry and saved in the file "vo.txt".
The estimated trajectory was further corrected by a graph SLAM algorithm and saved in "is.txt".
DOWNLOAD: cross_view_localization_dataset.zip
Description of files:
cross_view_localization/
1/
img_dir/*pgm
is.txt
vo.txt
2/
img_dir/*pgm
is.txt
vo.txt
3/
img_dir/*pgm
is.txt
vo.txt
Each line of "is.txt" and "vo.txt" consists of [x] [y] [theta].
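A minimal loader for the trajectory files, assuming whitespace-separated values per line:

import numpy as np

def load_trajectory(path):
    # Return an (N, 3) array of (x, y, theta) poses from is.txt or vo.txt.
    return np.loadtxt(path)

poses = load_trajectory("cross_view_localization/1/is.txt")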