Learning Task Structure from Video Examples for Workflow Tracking and Authoring

Nils Petersen, Didier Stricker
11th IEEE International Symposium on Mixed and Augmented Reality (ISMAR 2012), November 5-8, 2012, Atlanta, Georgia, USA

Abstract:
We present a robust, real-time-capable, and simple framework for segmenting video sequences and live streams of manual workflows into their constituent single tasks. Using classifiers trained on these segments, we can follow a user performing the workflow in real time as well as learn task variants from additional video examples. Our proposed method requires neither object detection nor high-level features. Instead, we propose a novel measure derived from image distance that evaluates image properties jointly, without prior segmentation. Our method copes with repetitive and free-hand activities, and its results are in many cases comparable or equal to manual task segmentation. One important application of our method is the automatic creation of step-by-step task documentation from a video demonstration. We explain the entire process of automatically creating a fully functional augmented reality manual in detail and present results.
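To make the idea of image-distance-based segmentation concrete, here is a deliberately naive sketch: splitting a frame sequence wherever the mean per-pixel distance between consecutive frames exceeds a threshold. This is only an illustration under assumed names and parameters, not the paper's actual measure, which evaluates image properties jointly and is designed to be far more robust.

```python
import numpy as np

def segment_by_image_distance(frames, threshold):
    """Split a frame sequence into segments wherever the mean absolute
    pixel difference between consecutive frames exceeds `threshold`.

    Naive illustration of distance-based temporal segmentation; the
    paper's proposed measure is different and more robust.
    """
    boundaries = [0]
    for i in range(1, len(frames)):
        dist = np.mean(np.abs(frames[i].astype(float) - frames[i - 1].astype(float)))
        if dist > threshold:
            boundaries.append(i)
    # Turn boundary indices into (start, end) segments covering all frames.
    boundaries.append(len(frames))
    return [(boundaries[j], boundaries[j + 1]) for j in range(len(boundaries) - 1)]

# Synthetic demo: an abrupt appearance change at frame 5 splits the
# sequence into two "tasks".
frames = [np.zeros((8, 8), np.uint8)] * 5 + [np.full((8, 8), 200, np.uint8)] * 5
print(segment_by_image_distance(frames, threshold=50.0))  # [(0, 5), (5, 10)]
```

A plain frame-difference threshold like this fires on any large appearance change, which is why the abstract stresses a measure that handles repetitive and free-hand activity without prior segmentation.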
