Learning 6DoF Object Poses from Synthetic Single Channel Images

Jason Raphael Rambach, Chengbiao Deng, Alain Pagani, Didier Stricker
17th IEEE International Symposium on Mixed and Augmented Reality (ISMAR-2018), October 16-20, München, Germany

Estimating 6DoF object poses from single images is a problem of great interest in augmented reality and robotics research, since it enables interaction with the object or initialization of pose tracking. Approaches based on deep neural networks have shown good performance; however, the majority of them rely on training with real images of the objects, which can be challenging in terms of ground-truth pose acquisition, scalability, and full coverage of possible poses. In this paper, we disregard all depth and color information and train a CNN to directly regress 6DoF object poses using only synthetic, single-channel, edge-enhanced images. We evaluate our approach against state-of-the-art methods that use synthetic training images and show a significant improvement on the commonly used LINEMOD benchmark dataset.
Computer Vision, Object Pose Estimation, Augmented Reality, Deep Learning