Real-time Modeling and Tracking Manual Workflows from First-Person Vision

Nils Petersen, Alain Pagani, Didier Stricker
Proceedings of the IEEE International Symposium on Mixed and Augmented Reality (ISMAR 2013), October 1-4, Adelaide, South Australia, Australia

Abstract:
Recognizing previously observed actions in video sequences can lead to Augmented Reality manuals that (1) automatically follow the progress of the user and (2) can be created from video examples of the workflow. Modeling is challenging, as the environment can change drastically due to user interaction, and camera motion may not provide sufficient translation to robustly estimate geometry. We propose a piecewise homographic transform that projects the given video material onto a series of distinct planar subsets of the scene. These subsets are selected by segmenting the largest image region that is consistent with a homographic model and contains a given region of interest. We are then able to model the state of the environment and user actions using simple 2D region descriptors. The model elegantly handles estimation errors due to incomplete observation and is robust to occlusions, e.g., by the user's hands. We demonstrate the effectiveness of our approach quantitatively and compare it to the current state of the art. Further, we show how the approach is applied to visualize automatically assessed correctness criteria at run time.
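
The abstract describes selecting planar subsets by finding the image region around a region of interest that is consistent with a single homographic model. As a rough illustration only, not the authors' implementation, the sketch below estimates such a homography from feature correspondences restricted to a region of interest and warps the current frame onto that planar subset. It assumes OpenCV and NumPy; the function name, ROI format, and thresholds are illustrative assumptions.

    # Minimal sketch (not the paper's implementation): estimate a homography
    # from correspondences anchored in a region of interest, then warp the
    # current frame onto that planar subset so simple 2D region descriptors
    # can be compared across frames.
    # Assumptions: OpenCV + NumPy; roi given as (x, y, w, h) in the reference frame.
    import cv2
    import numpy as np

    def homography_for_roi(frame_ref, frame_cur, roi, ratio=0.75, ransac_thresh=3.0):
        """Return (H, inlier_mask) mapping frame_cur into frame_ref coordinates,
        estimated only from matches whose reference keypoint lies inside roi."""
        orb = cv2.ORB_create(2000)
        k1, d1 = orb.detectAndCompute(frame_ref, None)
        k2, d2 = orb.detectAndCompute(frame_cur, None)
        if d1 is None or d2 is None:
            return None, None

        matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
        pairs = matcher.knnMatch(d1, d2, k=2)
        good = [m[0] for m in pairs
                if len(m) == 2 and m[0].distance < ratio * m[1].distance]

        # Keep only matches anchored inside the ROI so the estimated homography
        # describes the planar subset of the scene around that region.
        x, y, w, h = roi
        pts_ref, pts_cur = [], []
        for m in good:
            px, py = k1[m.queryIdx].pt
            if x <= px <= x + w and y <= py <= y + h:
                pts_ref.append(k1[m.queryIdx].pt)
                pts_cur.append(k2[m.trainIdx].pt)
        if len(pts_ref) < 4:  # a homography needs at least 4 point pairs
            return None, None

        H, inliers = cv2.findHomography(np.float32(pts_cur), np.float32(pts_ref),
                                        cv2.RANSAC, ransac_thresh)
        return H, inliers

    # Usage: stabilize the current frame onto the reference view of the plane.
    # H, _ = homography_for_roi(ref_gray, cur_gray, roi=(100, 80, 200, 150))
    # if H is not None:
    #     h_ref, w_ref = ref_gray.shape[:2]
    #     stabilized = cv2.warpPerspective(cur_gray, H, (w_ref, h_ref))

In this sketch the ROI plays the role of the "given region of interest" mentioned in the abstract; the RANSAC inlier mask is a crude stand-in for segmenting the largest region consistent with the homographic model.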
