Abstract: The problem of semantic segmentation from depth images can be addressed by segmenting directly in the image domain or at the 3D point cloud level. In this paper, we attempt for the first time to provide a study and experimental comparison of the two approaches. Through experiments on three datasets, namely SUN RGB-D, NYUdV2 and TICaM, we extensively compare various semantic segmentation algorithms, the input to which includes images and point clouds derived from them. Based on this, we offer an analysis of the performance and computational cost of these algorithms that can provide guidelines on when each method should be preferred.
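The abstract above compares segmentation on depth images with segmentation on point clouds derived from them. As a rough illustration of how such a point cloud can be obtained, the following minimal sketch back-projects a depth image under an ideal pinhole camera model. The function name and the intrinsics fx, fy, cx, cy are hypothetical and not taken from the paper; a wide-angle sensor would additionally need a distortion model.

```python
def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (rows of metric depths) into 3D points.

    Assumes an ideal pinhole camera. fx, fy, cx, cy are hypothetical
    intrinsics; pixels with depth <= 0 are treated as invalid.
    """
    points = []
    for v, row in enumerate(depth):        # v: pixel row
        for u, z in enumerate(row):        # u: pixel column, z: depth
            if z <= 0:
                continue                   # skip missing measurements
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Tiny 2x2 depth image with one invalid pixel.
depth = [
    [0.0, 2.0],
    [1.0, 4.0],
]
cloud = depth_to_point_cloud(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
print(cloud)
```

Each valid pixel yields exactly one 3D point, so the image and the point cloud carry the same measurements in two different layouts.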
We are proud that our paper “RPSRNet: End-to-End Trainable Rigid Point Set Registration Network using Barnes-Hut 2^D-Tree Representation” has been accepted for publication at the Conference on Computer Vision and Pattern Recognition (CVPR) 2021, which will take place online from June 19th to 25th. CVPR is the premier annual computer vision conference. Our paper was among the 23.4% of the ~12000 submissions that were accepted.
Abstract: We propose RPSRNet, a novel end-to-end trainable deep neural network for rigid point set registration. For this task, we use a novel 2^D-tree representation for the input point sets and a hierarchical deep feature embedding in the neural network. An iterative transformation refinement module of our network boosts the feature matching accuracy in the intermediate stages. We achieve an inference speed of ~12-15 ms to register a pair of input point clouds as large as ~250K points. Extensive evaluations on (i) KITTI LiDAR-odometry and (ii) ModelNet40 datasets show that our method outperforms prior state-of-the-art methods: on the KITTI dataset, it outperforms DCP-v2 by factors of 1.3 and 1.5, and PointNetLK by factors of 1.8 and 1.9, in rotational and translational accuracy, respectively. Evaluation on ModelNet40 shows that RPSRNet is more robust than other benchmark methods when the samples contain a significant amount of noise and disturbance. RPSRNet accurately registers point clouds with non-uniform sampling densities, e.g., LiDAR data, which cannot be processed by many existing deep-learning-based registration methods.
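The abstract reports rotational and translational accuracy but does not spell out the metric. A common choice, sketched below purely as an assumption, is the angle of the relative rotation between the estimated and ground-truth rotation matrices, together with the Euclidean distance between the translation vectors. The function names are hypothetical.

```python
import math


def rotation_error_deg(R_est, R_gt):
    """Angle in degrees of the relative rotation R_est^T * R_gt."""
    # trace(R_est^T R_gt) equals the sum of element-wise products.
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against rounding before acos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def translation_error(t_est, t_gt):
    """Euclidean distance between two translation vectors."""
    return math.dist(t_est, t_gt)


# Ground truth: identity. Estimate: 90 degree rotation about the z axis.
R_gt = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
R_est = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
rot_err = rotation_error_deg(R_est, R_gt)
trans_err = translation_error((1.0, 2.0, 2.0), (0.0, 0.0, 0.0))
print(rot_err, trans_err)
```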
Abstract: This article introduces a new physics-based method for rigid point set alignment called Fast Gravitational Approach (FGA). In FGA, the source and target point sets are interpreted as rigid particle swarms with masses interacting in a globally multiply-linked manner while moving in a simulated gravitational force field. The optimal alignment is obtained by explicit modeling of forces acting on the particles as well as their velocities and displacements with second-order ordinary differential equations of n-body motion. Additional alignment cues can be integrated into FGA through particle masses. We propose a smooth-particle mass function for point mass initialization, which improves robustness to noise and structural discontinuities. To avoid the quadratic complexity of all-to-all point interactions, we adapt a Barnes-Hut tree for accelerated force computation and achieve quasilinear complexity. We show that the new method class has characteristics not found in previous alignment methods, such as efficient handling of partial overlaps and inhomogeneous sampling densities, and the ability to cope with large point clouds at reduced runtime compared to the state of the art. Experiments show that our method performs on par with or outperforms all compared deep-learning-based and general-purpose techniques (which do not require training data) in resolving transformations for LiDAR data, and achieves state-of-the-art accuracy and speed when coping with different data.
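To illustrate how a Barnes-Hut tree avoids all-to-all force computation, here is a minimal 2D sketch of the textbook scheme: distant cells are replaced by their centre of mass whenever cell width divided by distance falls below an opening threshold theta. This is not the FGA implementation. The class Node, the parameters theta, g and eps, and the 2D setting are assumptions made for brevity, and the sketch assumes distinct points that all lie inside the root cell.

```python
import math


class Node:
    """Square cell of a 2D Barnes-Hut quadtree (illustrative helper)."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half  # cell centre, half-width
        self.mass = 0.0           # total mass inside the cell
        self.mx = self.my = 0.0   # mass-weighted coordinate sums
        self.body = None          # (x, y, m) when the cell holds one particle
        self.children = None      # four sub-cells once the cell is split

    def insert(self, x, y, m):
        if self.children is None and self.body is None:
            self.body = (x, y, m)             # empty leaf: store the particle
        else:
            if self.children is None:
                self._split()                 # occupied leaf: subdivide first
            self._child_for(x, y).insert(x, y, m)
        self.mass += m
        self.mx += m * x
        self.my += m * y

    def _split(self):
        h = self.half / 2.0
        self.children = [Node(self.cx + sx * h, self.cy + sy * h, h)
                         for sx in (-1, 1) for sy in (-1, 1)]
        bx, by, bm = self.body
        self.body = None
        self._child_for(bx, by).insert(bx, by, bm)

    def _child_for(self, x, y):
        index = (2 if x >= self.cx else 0) + (1 if y >= self.cy else 0)
        return self.children[index]

    def force_on(self, x, y, m, theta=0.5, g=1.0, eps=1e-12):
        """Approximate gravitational force on the particle (x, y, m)."""
        if self.mass == 0.0:
            return 0.0, 0.0                   # empty cell
        dx = self.mx / self.mass - x          # vector to the centre of mass
        dy = self.my / self.mass - y
        dist = math.hypot(dx, dy)
        # Use the cell as a single pseudo-particle if it is a leaf or if it
        # is small relative to its distance (opening criterion).
        if self.children is None or (2.0 * self.half) / (dist + eps) < theta:
            if dist < eps:
                return 0.0, 0.0               # the query particle itself
            f = g * m * self.mass / dist ** 3
            return f * dx, f * dy
        fx = fy = 0.0
        for child in self.children:
            cfx, cfy = child.force_on(x, y, m, theta, g, eps)
            fx += cfx
            fy += cfy
        return fx, fy


def direct_force(bodies, i, g=1.0):
    """Reference all-to-all summation for particle i (quadratic overall)."""
    x, y, m = bodies[i]
    fx = fy = 0.0
    for j, (xj, yj, mj) in enumerate(bodies):
        if j != i:
            dx, dy = xj - x, yj - y
            f = g * m * mj / math.hypot(dx, dy) ** 3
            fx += f * dx
            fy += f * dy
    return fx, fy


bodies = [(0.1, 0.1, 1.0), (0.9, 0.8, 1.0), (0.85, 0.9, 1.0), (0.2, 0.7, 2.0)]
root = Node(0.5, 0.5, 0.5)                    # unit square around the points
for body in bodies:
    root.insert(*body)

direct = direct_force(bodies, 0)
exact = root.force_on(*bodies[0], theta=0.0)   # theta = 0 visits every leaf
approx = root.force_on(*bodies[0], theta=0.5)  # far cells are lumped together
print(direct, exact, approx)
```

With theta set to 0 the traversal visits every leaf and reproduces the direct sum. Larger values trade accuracy for fewer visited cells, which is where the quasilinear behaviour comes from.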
Within the framework of a research cooperation, DFKI’s Augmented Vision Department and BMW are working jointly on Augmented Reality for in-car applications. Ahmet Firintepe, a PhD researcher at BMW under the supervision of Dr. Alain Pagani and Prof. Didier Stricker, has recently published two papers on outside-in head and glasses pose estimation:
In this paper, we present a study on single and multi-view image-based AR glasses pose estimation with two novel methods. The first approach is named GlassPose and is a VGG-based network. The second approach, GlassPoseRN, is based on ResNet18. We train and evaluate the two custom-developed glasses pose estimation networks with one, two and three input images on the HMDPose dataset. We achieve errors as low as 0.10 degrees and 0.90 mm on average on all axes for orientation and translation. For both networks, we observe minimal improvements in position estimation with more input views.
In this paper, we propose two novel AR glasses pose estimation algorithms from single infrared images by using 3D point clouds as an intermediate representation. Our first approach “PointsToRotation” is based on a Deep Neural Network alone, whereas our second approach “PointsToPose” is a hybrid model combining Deep Learning and a voting-based mechanism. Our methods utilize a point cloud estimator, which we trained on multi-view infrared images in a semi-supervised manner and which generates point clouds from one image only. We generate a point cloud dataset with our point cloud estimator using the HMDPose dataset, consisting of multi-view infrared images of various AR glasses with the corresponding 6-DoF poses. In comparison to another point cloud-based 6-DoF pose estimation method named CloudPose, we achieve an error reduction of around 50%. Compared to a state-of-the-art image-based method, we reduce the pose estimation error by around 96%.
Abstract: Virtual Reality (VR) technology offers users the possibility to immerse themselves in and freely navigate through virtual worlds. An important component for achieving a high degree of immersion in VR is locomotion. Although often discussed in the literature, a natural and effective way of controlling locomotion remains an open problem. Recently, VR headset manufacturers have been integrating more sensors, allowing hand or eye tracking without any additional equipment. This enables a wide range of application scenarios with natural freehand interaction techniques. This paper focuses on techniques to control teleportation-based locomotion with hand gestures, where users are able to move around in VR using their hands only. With the help of a comprehensive study involving 21 participants, four different techniques are evaluated. The effectiveness and efficiency as well as user preferences of the presented techniques are determined. Two two-handed and two one-handed techniques are evaluated, revealing that it is possible to move comfortably and effectively through virtual worlds with a single hand only.
As part of the research activities of DFKI Augmented Vision in the VIZTA project (https://www.vizta-ecsel.eu/), we have published an open-source dataset for automotive in-cabin monitoring with a wide-angle time-of-flight depth sensor. The TICaM dataset represents a variety of in-car person behavior scenarios and is annotated with 2D/3D bounding boxes, segmentation masks and person activity labels. The dataset is available at https://vizta-tof.kl.dfki.de/. The publication describing the dataset in detail is available as a preprint at https://arxiv.org/pdf/2103.11719.pdf
Abstract: Instance segmentation of planar regions in indoor scenes benefits visual SLAM and other applications such as augmented reality (AR) where scene understanding is required. Existing methods built upon two-stage frameworks show satisfactory accuracy but are limited by low frame rates. In this work, we propose a real-time deep neural architecture that estimates piece-wise planar regions from a single RGB image. Our model employs a variant of a fast single-stage CNN architecture to segment plane instances. Considering the particular nature of the detected targets, we propose Fast Feature Non-maximum Suppression (FF-NMS) to reduce the suppression errors resulting from overlapping bounding boxes of planes. We also utilize a Residual Feature Augmentation module in the Feature Pyramid Network (FPN). Our method achieves significantly higher frame rates than two-stage methods at comparable segmentation accuracy. We automatically label over 70,000 images as ground truth from the Stanford 2D-3D-Semantics dataset. Moreover, we integrate our method into a state-of-the-art planar SLAM system and validate its benefits.
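For context on the suppression errors mentioned above, the sketch below shows standard greedy box-based non-maximum suppression, the kind of baseline that FF-NMS is described as improving upon. It is not FF-NMS itself, which the abstract does not specify in detail. The function names and the 0.5 threshold are assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)
```

Because the decision depends only on bounding-box overlap, two distinct planes whose boxes overlap heavily (for example a floor and an adjacent wall) can suppress each other, which is the kind of error the abstract refers to.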
We are happy to announce that two of our papers have been accepted and published in the IEEE Access journal. IEEE Access is an award-winning, multidisciplinary, all-electronic archival journal, continuously presenting the results of original research or development across all of IEEE’s fields of interest. The articles are published with open access to all readers. The research is part of the BIONIC project and was funded by the European Commission under the Horizon 2020 Programme Grant Agreement n. 826304.
“Simultaneous End User Calibration of Multiple Magnetic Inertial Measurement Units With Associated Uncertainty” Published in: IEEE Access (Volume: 9) Page(s): 26468 – 26483 Date of Publication: 05 February 2021 Electronic ISSN: 2169-3536 DOI: 10.1109/ACCESS.2021.3057579
“Magnetometer Robust Deep Human Pose Regression With Uncertainty Prediction Using Sparse Body Worn Magnetic Inertial Measurement Units” Published in: IEEE Access (Volume: 9) Page(s): 36657 – 36673 Date of Publication: 26 February 2021 Electronic ISSN: 2169-3536 DOI: 10.1109/ACCESS.2021.3062545
On March 4th, 2021, Dr. Jason Rambach gave a talk on Machine Learning and Computer Vision at the GIZ (Deutsche Gesellschaft für Internationale Zusammenarbeit) workshop on Machine Learning and Computer Vision for Earth Observation organized by the DFKI MLT department. The talk presented the foundations of Computer Vision, Machine Learning and Deep Learning as well as current research and implementation challenges.
DFKI participates in the VIZTA project, coordinated by STMicroelectronics, which aims to develop innovative technologies in the field of optical sensors and laser sources for short- to long-range 3D imaging and to demonstrate their value in several key applications, including automotive, security, smart buildings, mobile robotics for smart cities, and Industry 4.0. The 18-month public summary of the project has been released, including updates from DFKI Augmented Vision on time-of-flight camera dataset recording and deep learning algorithm development for car in-cabin monitoring and for smart building person counting and anomaly detection applications.