Learning Priors for Augmented Reality Tracking and Scene Understanding

Jason Raphael Rambach
Thesis, Technische Universität Kaiserslautern. ISBN 978-3-8439-4555-4. Dr. Hut, München, 9/2020.

Abstract:
The great potential of Augmented Reality (AR) has begun to be realized in recent years. This surge has brought new challenges that, once resolved, will allow AR to reach technological maturity and become an essential part of everyday life. Although pose tracking is a solved problem in controlled environments, challenges remain in geometric and semantic scene understanding. Dense mapping and semantic labeling of the environment are required to create meaningful virtual content that can fully interact with the real world. At the system level, scalability, in the sense of device accessibility and content generation, is currently the main challenge for AR. In this thesis we provide novel solutions to several current challenges of AR concerning tracking, mapping, and applications. We base these solutions on generating prior knowledge using machine learning techniques. Deep Learning has already superseded traditional computer vision in areas such as classification but is challenged in 3D estimation problems. In this work we advocate combining initial hypotheses or prior estimates provided by Deep Learning with traditional computer vision to achieve refined and robust results. Accordingly, we propose visual-inertial tracking using a learned model of the inertial sensor fused with a visual feature tracker. We address the need for efficient dense mapping with a piece-wise planar SLAM system that uses Deep Learning to generate planar area hypotheses, which are fused incrementally with a point cloud reconstructed using multiple view geometry. Furthermore, we show that initial object poses can be efficiently estimated through learning and can subsequently be refined and tracked using feature-based methods.
The use of synthetic images as a reference or as machine learning training data offers significant advantages but also raises the Synthetic-to-Real representation gap problem, for which we introduce novel solutions. Finally, we investigate the factors that hinder AR applications from achieving mass adoption. We consider these to be the lack of universality and semantic relevance. Therefore, we propose three contributions: a concept of objects that store and share their own AR experiences and tracking information, in order to decouple tracking applications from target objects and content generation; a web-based AR tracking framework that is independent of device type and operating system; and an edge computing architecture for remote processing that enables devices with limited computational capabilities to support AR applications.