DFKI participates in the VIZTA project, coordinated by STMicroelectronics, which aims to develop innovative technologies in the field of optical sensors and laser sources for short- to long-range 3D imaging and to demonstrate their value in several key applications, including automotive, security, smart buildings, mobile robotics for smart cities, and Industry 4.0. The 18-month public summary of the project has been released, including updates from DFKI Augmented Vision on time-of-flight camera dataset recording and deep learning algorithm development for car in-cabin monitoring and for smart-building person counting and anomaly detection applications.
Please click here to check out the complete summary.
We are excited to announce that the Augmented Vision group will present three papers at the upcoming VISAPP 2021 conference, February 8th-10th, 2021.
The International Conference on Computer Vision Theory and Applications (VISAPP) is part of
VISIGRAPP, the 16th International Joint Conference on Computer Vision, Imaging
and Computer Graphics Theory and Applications. VISAPP aims at becoming a major
point of contact between researchers, engineers and practitioners in the area
of computer vision application systems. Homepage: http://www.visapp.visigrapp.org/
We are happy to announce that our paper “SynPo-Net–Accurate and Fast CNN-Based 6DoF Object Pose Estimation Using Synthetic Training” has been accepted for publication in the MDPI Sensors journal, Special Issue “Object Tracking and Motion Analysis”. Sensors (ISSN 1424-8220; CODEN: SENSC9) is the leading international peer-reviewed open access journal on the science and technology of sensors.
Abstract: Estimation and
tracking of 6DoF poses of objects in images is a challenging problem of great
importance for robotic interaction and augmented reality. Recent approaches
applying deep neural networks for pose estimation have shown encouraging
results. However, most of them rely on training with real images of objects
with severe limitations concerning ground truth pose acquisition, full coverage
of possible poses, and training dataset scaling and generalization capability.
This paper presents a novel approach using a Convolutional Neural Network (CNN)
trained exclusively on single-channel Synthetic images of objects to regress
6DoF object Poses directly (SynPo-Net). The proposed SynPo-Net is a network
architecture specifically designed for pose regression and a proposed domain
adaptation scheme transforming real and synthetic images into an intermediate
domain that is better fit for establishing correspondences. The extensive
evaluation shows that our approach significantly outperforms the
state-of-the-art using synthetic training in terms of both accuracy and speed.
Our system can be used to estimate the 6DoF pose from a single frame, or be
integrated into a tracking system to provide the initial pose.
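To illustrate what a 6DoF pose encodes, here is a minimal sketch in Python, not taken from the paper. A pose is a rotation (three degrees of freedom) plus a translation (three more), and it maps points from object coordinates into camera coordinates. The helper names `rotation_z` and `apply_pose` are hypothetical, and the rotation is limited to the z-axis to keep the example short.

```python
import math

def rotation_z(yaw):
    """3x3 rotation matrix for a rotation of `yaw` radians about the z-axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def apply_pose(rotation, translation, point):
    """Map a point from object to camera coordinates: p_cam = R * p_obj + t."""
    return [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]

# Hypothetical pose: object rotated by 90 degrees about z and placed 2 units in front of the camera.
R = rotation_z(math.pi / 2)
t = [0.0, 0.0, 2.0]

# The object-space point (1, 0, 0) ends up at approximately (0, 1, 2) in camera space.
p_cam = apply_pose(R, t, [1.0, 0.0, 0.0])
```

A pose regression network would output the parameters of `R` and `t` for an input image. How the rotation is parameterised (for example as a quaternion or as Euler angles) is a design choice not specified in the abstract above.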
After two years of collaborative work, the ARinfuse project invites you to its final workshop on January 28th.
ARinfuse is an Erasmus+ project that aims to infuse skills in Augmented Reality for geospatial information management in the context of underground utility infrastructures, such as water, sewage, electricity, gas and fiber optics. In this field, there is a real need for accurate positioning of the underground utilities to avoid damage to the existing infrastructure. Information and communication technologies (ICT), combined with global navigation satellite systems (GNSS), GIS and geodatabases, and augmented/virtual reality (AR/VR), can turn the geospatial information of the underground utilities into a powerful tool for field workers, engineers and managers. ARinfuse is mainly addressed to current and future technical professionals in the utility sector who use, or are planning to use, AR technology in practical applications for the routine management and maintenance of utility networks.
The workshop, entitled “Exploiting the potential of Augmented Reality & Geospatial Technologies within the utilities sector”, is addressed to engineering students and professionals who are interested in the function, application and benefits of AR and geospatial technologies in the utilities sector.
The workshop will also introduce the ARinfuse catalogue of training modules on Augmented Reality and Geoinformatics applied within the utility infrastructure sector.
We are proud to announce that the Augmented Vision group will present three papers at the upcoming ICPR 2020 conference, which will take place from January 10th to 15th, 2021. The International Conference on Pattern Recognition (ICPR) is the premier world conference in Pattern Recognition. It covers both theoretical issues and applications of the discipline. The 25th event in this series is organized as an online virtual conference with more than 1800 participants expected.
The Winter Conference on Applications of Computer Vision (WACV 2021) is IEEE’s and the PAMI-TC’s premier meeting on applications of computer vision. With its high quality and low cost, it provides exceptional value for students, academics and industry researchers. In 2021, the conference is organized as a virtual online event from January 5th to 9th, 2021.
Abstract: This paper demonstrates a system capable of combining a sparse, indirect, monocular visual SLAM, with both offline and real-time Multi-View Stereo (MVS) reconstruction algorithms. This combination overcomes many obstacles encountered by autonomous vehicles or robots employed in agricultural environments, such as overly repetitive patterns, need for very detailed reconstructions, and abrupt movements caused by uneven roads. Furthermore, the use of a monocular SLAM makes our system much easier to integrate with an existing device, as we do not rely on a LiDAR (which is expensive and power consuming), or stereo camera (whose calibration is sensitive to external perturbation e.g. camera being displaced). To the best of our knowledge, this paper presents the first evaluation results for monocular SLAM, and our work further explores unsupervised depth estimation on this specific application scenario by simulating RGB-D SLAM to tackle the scale ambiguity, and shows our approach produces reconstructions that are helpful to various agricultural tasks. Moreover, we highlight that our experiments provide meaningful insight to improve monocular SLAM systems under agricultural settings.
Abstract: Images recorded during the lifetime of computer vision based systems undergo a wide range of illumination and environmental conditions affecting the reliability of previously trained machine learning models. Image normalization is hence a valuable preprocessing component to enhance the models’ robustness. To this end, we introduce a new strategy for the cost function formulation of encoder-decoder networks to average out all the unimportant information in the input images (e.g. environmental features and illumination changes) to focus on the reconstruction of the salient features (e.g. class instances). Our method exploits the availability of identical sceneries under different illumination and environmental conditions for which we formulate a partially impossible reconstruction target: the input image will not convey enough information to reconstruct the target in its entirety. Its applicability is assessed on three publicly available datasets. We combine the triplet loss as a regularizer in the latent space representation and a nearest neighbour search to improve the generalization to unseen illuminations and class instances. The importance of the aforementioned post-processing is highlighted on an automotive application. To this end, we release a synthetic dataset of sceneries from three different passenger compartments where each scenery is rendered under ten different illumination and environmental conditions: https://sviro.kl.dfki.de
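The abstract above mentions a triplet loss used as a regularizer in the latent space, followed by a nearest neighbour search. The following is a minimal sketch of these two building blocks in plain Python. It is not the authors' implementation: the Euclidean distance, the margin of 1.0, the function names and the toy latent vectors are all assumptions chosen for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equally sized latent vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: zero once the negative is at least
    `margin` farther from the anchor than the positive is."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def nearest_neighbour(query, gallery):
    """Index of the gallery vector closest to the query."""
    return min(range(len(gallery)), key=lambda i: euclidean(query, gallery[i]))

# Toy latent codes (hypothetical values): the same scenery under two
# illuminations (anchor, positive) and a different scenery (negative).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
negative = [3.0, 4.0]

# The negative is already far away, so the margin is satisfied and the loss is zero.
loss_easy = triplet_loss(anchor, positive, negative)

# With the roles swapped, the margin is violated and the loss is positive.
loss_hard = triplet_loss(anchor, negative, positive)

# Look up the stored latent code closest to an unseen query code.
closest = nearest_neighbour([0.1, 0.9], [anchor, positive, negative])
```

In a training setup, a term like `triplet_loss` would be added to the reconstruction loss so that latent codes of the same scenery cluster together, which in turn makes a nearest neighbour lookup in the latent space more meaningful.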
Jameel Malik successfully defended his PhD thesis entitled “Deep Learning-based 3D Hand Pose and Shape Estimation from a Single Depth Image: Methods, Datasets and Application” in the presence of the PhD committee made up of Prof. Dr. Didier Stricker (Technische Universität Kaiserslautern), Prof. Dr. Karsten Berns (Technische Universität Kaiserslautern), Prof. Dr. Antonis Argyros (University of Crete) and Prof. Dr. Sebastian Michel (Technische Universität Kaiserslautern) on Wednesday, November 11th, 2020.
In his thesis, Jameel Malik addressed the unique challenges of 3D hand pose and shape estimation, and proposed several deep learning based methods that achieve state-of-the-art accuracy on public benchmarks. His work focuses on developing an effective interlink between the hand pose and shape using deep neural networks. This interlink makes it possible to improve the accuracy of both estimates. His recent paper on a 3D convolution based hand pose and shape estimation network was accepted at the premier conference IEEE/CVF CVPR 2020.
Jameel Malik received his bachelor's and master's degrees in electrical engineering from the University of Engineering and Technology (UET) and the National University of Sciences and Technology (NUST), Pakistan, respectively. Since 2017, he has been working at the Augmented Vision (AV) group at DFKI as a researcher. His research interests include computer vision and deep learning.
A week later, on Thursday, November 19th, 2020, Mr. Markus Miezal also successfully defended his PhD thesis entitled “Models, methods and error source investigation for real-time Kalman filter based inertial human body tracking” in front of the PhD committee consisting of Prof. Dr. Didier Stricker (TU Kaiserslautern and DFKI), Prof. Dr. Björn Eskofier (FAU Erlangen) and Prof. Dr. Karsten Berns (TU Kaiserslautern).
The goal of the thesis is to work towards a robust human body tracking system based on inertial sensors. In particular, the identification of different error sources and their impact on tracking quality are investigated. Finally, the thesis proposes a real-time, magnetometer-free approach for tracking the lower body with ground contact and translation information. His first-author publications on these contributions include a journal article in MDPI Sensors and a conference paper at ICRA 2017.
In 2010, Markus Miezal received his diploma in computer science from the University of Bremen, Germany, and started working at the Augmented Vision group at DFKI on visual-inertial sensor fusion and body tracking. In 2015, he followed Dr. Gabriele Bleser into the newly founded interdisciplinary research group wearHEALTH at the TU Kaiserslautern, where the research on body tracking continued, focussing on health-related applications such as gait analysis. While finishing his PhD thesis, he co-founded the company sci-track GmbH as a spin-off from TU KL and DFKI GmbH, which aims to transfer robust inertial human body tracking algorithms as middleware to industry partners. In the future, Markus will continue his research at the university and support the company.
The ENNOS project integrates color and depth cameras with the capabilities of deep neural networks on a compact FPGA-based platform to create a flexible and powerful optical system with a wide range of applications in production contexts. While FPGAs offer the flexibility to adapt the system to different tasks, they also constrain the size and complexity of the neural networks. The challenge is to transform the large and complex structure of modern neural networks into a small and compact FPGA architecture. To showcase the capabilities of the ENNOS concept, three scenarios have been selected. The first scenario covers the automatic anonymization of people during remote diagnosis, the second addresses semantic 3D scene segmentation for robotic applications, and the third features an assistance system for model identification and stocktaking in large facilities.
During the milestone review, a prototype of the ENNOS camera was presented. It integrates color and depth cameras as well as an FPGA for the execution of neural networks in the device. Furthermore, solutions for the three scenarios were demonstrated successfully, with one prototype already running entirely on the ENNOS platform. This demonstrates that the project is on track to achieve its goals and validates the fundamental approach and concept of the project.
Project Partners: Robert Bosch GmbH; Deutsches Forschungszentrum für Künstliche Intelligenz GmbH (DFKI); KSB SE & Co. KGaA; ioxp GmbH; ifm electronic GmbH*; PMD Technologies AG*
We are happy to announce that our paper “TGA: Two-level Group Attention for Assembly State Detection” has been accepted for publication at the IEEE International Symposium on Mixed and Augmented Reality (ISMAR), which will take place online from November 9th to 13th. The IEEE ISMAR is the leading international academic conference in the fields of Augmented Reality and Mixed Reality. The symposium is organized and supported by the IEEE Computer Society, IEEE VGTC and ACM SIGGRAPH.
Abstract: Assembly state detection, i.e., object state detection, has a critical meaning in computer vision tasks, especially in AR assisted assembly. Unlike other object detection problems, the visual difference between different object states can be subtle. For the better learning of such subtle appearance difference, we proposed a two-level group attention module (TGA), which consists of inter-group attention and intra-group attention. The relationship between feature groups as well as the representation within a feature group is simultaneously enhanced. We embedded the proposed TGA module in a popular object detector and evaluated it on two new datasets related to object state estimation. The result shows that our proposed attention module outperforms the baseline attention module.
PTC has acquired ioxp GmbH, a German industrial start-up for cognitive AR and AI software. ioxp is a spin-off from the Augmented Vision Department of the German Research Center for Artificial Intelligence GmbH (DFKI). For more information, click here or here (both articles in German only).