AsymLoc: Towards Asymmetric Feature Matching for Efficient Visual Localization


Published: arXiv: 2604.09445v1

Authors

Mohammad Omama, Gabriele Berton, Eric Foxlin, Yelin Kim

Abstract

Precise and real-time visual localization is critical for applications like AR/VR and robotics, especially on resource-constrained edge devices such as smart glasses, where battery life and heat dissipation can be primary concerns. While many efficient models exist, further reducing compute without sacrificing accuracy is essential for practical deployment. To address this, we propose asymmetric visual localization: a large Teacher model processes pre-mapped database images offline, while a lightweight Student model processes the query image online. This creates a challenge in matching features from two different models without resorting to heavy, learned matchers. We introduce AsymLoc, a novel distillation framework that aligns a Student to its Teacher through a combination of a geometry-driven matching objective and a joint detector-descriptor distillation objective, enabling fast, parameter-less nearest-neighbor matching. Extensive experiments on HPatches, ScanNet, IMC2022, and Aachen show that AsymLoc achieves up to 95% of the teacher's localization accuracy using an order of magnitude smaller models, significantly outperforming existing baselines and establishing a new state-of-the-art efficiency-accuracy trade-off.
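
The parameter-less nearest-neighbor matching mentioned in the abstract can be illustrated with a minimal sketch. It assumes that both models output one descriptor vector per keypoint and that matches are kept only when two descriptors are each other's best match under cosine similarity (a common mutual nearest-neighbor check). The function names and the toy descriptors are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a descriptor to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity_matrix(query_desc, db_desc):
    """Return sim[i][j], the cosine similarity of query descriptor i and database descriptor j."""
    q = [l2_normalize(d) for d in query_desc]
    r = [l2_normalize(d) for d in db_desc]
    return [[sum(a * b for a, b in zip(qi, rj)) for rj in r] for qi in q]


def mutual_nearest_neighbors(query_desc, db_desc):
    """Keep pairs (i, j) that are each other's best match; nothing here is learned."""
    sim = cosine_similarity_matrix(query_desc, db_desc)
    if not sim or not sim[0]:
        return []
    # Best database descriptor for every query descriptor.
    best_db_for_query = [max(range(len(row)), key=row.__getitem__) for row in sim]
    # Best query descriptor for every database descriptor.
    best_query_for_db = [
        max(range(len(sim)), key=lambda i: sim[i][j]) for j in range(len(sim[0]))
    ]
    return [(i, j) for i, j in enumerate(best_db_for_query) if best_query_for_db[j] == i]


# Hypothetical toy data: three database descriptors (as if computed offline by the
# teacher) and two query descriptors (as if computed online by the student).
teacher_db = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
student_query = [[0.1, 0.9, 0.0], [0.8, 0.1, 0.1]]

matches = mutual_nearest_neighbors(student_query, teacher_db)
print(matches)  # [(0, 1), (1, 0)]
```

A lookup of this kind only works if the student's descriptors land close to the teacher's descriptors for the same scene point, which is the alignment the distillation framework is meant to provide.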

Paper Summary

Problem

Visual localization is a critical task in applications like augmented and virtual reality (AR/VR) and robotics. It involves estimating a precise 6-DoF camera pose from a pre-mapped image database using only visual input. However, this task is challenging, especially on resource-constrained edge devices such as smart glasses, where battery life and heat dissipation can be primary concerns.

Key Innovation

The researchers propose a novel approach called AsymLoc, which involves using two separate models: a larger teacher model that processes the database images offline and a smaller student model that runs online and produces outputs consistent with the teacher. The key innovation lies in the distillation framework, which aligns the student to the teacher through a combination of geometric and probabilistic supervision.
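
To make the idea of joint detector-descriptor distillation more concrete, here is a generic sketch of such a loss. It is an assumption for illustration only and not the paper's actual objective: the descriptor term is the mean cosine distance between student and teacher descriptors at corresponding keypoints, the detector term is the KL divergence from the teacher's keypoint-score distribution to the student's, and `detector_weight` is a hypothetical balancing factor. The geometry-driven matching objective is not modeled here.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def softmax(logits):
    """Turn raw detector scores into a probability distribution over locations."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(student_desc, teacher_desc, student_logits, teacher_logits,
                      detector_weight=1.0):
    """Generic joint loss (illustrative assumption, not the paper's formulation)."""
    # Descriptor term: pull each student descriptor toward the teacher's descriptor.
    desc_term = sum(
        cosine_distance(s, t) for s, t in zip(student_desc, teacher_desc)
    ) / len(student_desc)
    # Detector term: make the student's keypoint distribution follow the teacher's.
    det_term = kl_divergence(softmax(teacher_logits), softmax(student_logits))
    return desc_term + detector_weight * det_term


# Hypothetical toy values: two keypoints with 2-D descriptors, three detector scores.
teacher_desc = [[1.0, 0.0], [0.0, 1.0]]
student_desc = [[0.9, 0.1], [0.2, 0.8]]
teacher_logits = [2.0, 0.5, -1.0]
student_logits = [1.5, 0.7, -0.5]

loss = distillation_loss(student_desc, teacher_desc, student_logits, teacher_logits)
perfect = distillation_loss(teacher_desc, teacher_desc, teacher_logits, teacher_logits)
print(loss > perfect)  # True
```

In this sketch the loss is zero (up to floating-point error) only when the student reproduces the teacher's descriptors and detector distribution exactly, and it grows as the two models drift apart.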

Practical Impact

AsymLoc could make visual localization practical on edge devices. Because only the small student model runs on the device, inference costs can be reduced by up to 25 times, which makes it possible to deploy visual localization on hardware with limited compute, battery, and thermal budgets. This lets AR/VR and robotics applications localize in real time on the device itself.

Analogy / Intuitive Explanation

Imagine you're trying to find your way around a large building. Ahead of time, an expert surveyor walks through every room and writes detailed notes on its distinctive landmarks; this is slow, but it only has to happen once. Later, a quick assistant glances at a new photo and jots down a few notes of their own. Because the assistant was trained to describe landmarks in the same way the surveyor does, the two sets of notes can be compared directly with a simple lookup, without needing a second expert to translate between them. In AsymLoc, the surveyor is the large teacher model that processes the database images offline, the assistant is the lightweight student model that processes the query image on the device, and the lookup is nearest-neighbor matching.

Paper Information

Categories:

cs.CV

arXiv ID:

2604.09445v1
