Fully Automatic Multi-person Human Motion Capture for VR Applications

Ahmed Elhayek, Onorina Kovalenko, Pramod Murthy, Muhammad Jameel Nawaz Malik, Didier Stricker
EuroVR (EuroVR-2018), October 22-23, London, United Kingdom

Abstract:
Fully automatic, real-time tracking of articulated motion with a monocular RGB camera is a challenging problem that is essential for many virtual reality (VR) applications. In this paper, we propose a novel, temporally stable solution to this problem that can be directly employed in practical VR applications. Our algorithm automatically estimates the number of persons in the scene, generates their corresponding person-specific 3D skeletons, and estimates their initial 3D locations. For every frame, it fits each 3D skeleton to the corresponding 2D body-part locations, which are estimated with one of the existing CNN-based 2D pose estimation methods. The 3D pose of every person is estimated by maximizing an objective function that combines a skeleton fitting term with motion and pose priors. Our algorithm detects persons who enter or leave the scene, and dynamically generates or deletes their 3D skeletons. This makes our algorithm the first monocular RGB method usable in real-time applications such as dynamically including multiple persons in a virtual environment using the camera of the VR headset. We show that our algorithm is applicable to tracking multiple persons in outdoor scenes, community videos, and low-quality videos captured with mobile-phone cameras.
Keywords:
Human motion capture, convolutional neural network, anthropometric data
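
Illustrative sketch:
The abstract describes the per-frame pose estimate as the maximizer of an objective that combines a skeleton fitting term with motion and pose priors. The Python sketch below shows one way such an objective could be assembled for a single person. It is not the authors' implementation: it scores 3D joint positions directly rather than the skeleton parameters used in the paper, and the pinhole camera model, the quadratic form of each term, the weights `w_motion` and `w_pose`, and all function and variable names are assumptions made for illustration.

```python
import numpy as np


def project(joints_3d, focal, center):
    """Pinhole projection of (J, 3) camera-space joints to (J, 2) pixels."""
    xy = joints_3d[:, :2] / joints_3d[:, 2:3]
    return focal * xy + center


def objective(joints_3d, joints_2d, conf, prev_joints_3d, rest_joints_3d,
              focal=1000.0, center=(640.0, 360.0),
              w_motion=0.1, w_pose=0.01):
    """Score a candidate 3D pose; higher is better (hypothetical form)."""
    center = np.asarray(center, dtype=float)

    # Fitting term: confidence-weighted squared reprojection error
    # against the 2D detections, negated so that a better fit scores higher.
    reproj = project(joints_3d, focal, center) - joints_2d
    fit = -np.sum(conf * np.sum(reproj ** 2, axis=1))

    # Motion prior (assumed): penalize large changes from the previous frame.
    motion = -np.sum((joints_3d - prev_joints_3d) ** 2)

    # Pose prior (assumed): penalize root-relative deviation from a rest pose.
    rel = joints_3d - joints_3d[0]
    rest_rel = rest_joints_3d - rest_joints_3d[0]
    pose = -np.sum((rel - rest_rel) ** 2)

    return fit + w_motion * motion + w_pose * pose


# Toy three-joint skeleton placed 3 m in front of the camera.
truth = np.array([[0.0, 0.0, 3.0],
                  [0.0, -0.5, 3.0],
                  [0.2, -0.9, 3.1]])
observed = project(truth, 1000.0, np.array([640.0, 360.0]))
conf = np.ones(len(truth))

score_truth = objective(truth, observed, conf, truth, truth)
score_shifted = objective(truth + 0.05, observed, conf, truth, truth)
```

In this toy setup the candidate that matches the detections scores higher than the shifted one. A tracker built on such an objective would hand it to a numerical optimizer once per frame and per person, starting from the previous frame's pose.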