Article in IEEE Robotics and Automation Letters (RA-L) journal

We are happy to announce that our article “OPA-3D: Occlusion-Aware Pixel-Wise Aggregation for Monocular 3D Object Detection” was published in the prestigious IEEE Robotics and Automation Letters (RA-L) journal. The work is a collaboration of DFKI with TU Munich and Google. The article is openly accessible at:

Abstract: Monocular 3D object detection has recently made a significant leap forward thanks to the use of pre-trained depth estimators for pseudo-LiDAR recovery. Yet, such two-stage methods typically suffer from overfitting and are incapable of explicitly encapsulating the geometric relation between depth and object bounding box. To overcome this limitation, we instead propose to jointly estimate dense scene depth with depth-bounding box residuals and object bounding boxes, allowing a two-stream detection of 3D objects that harnesses both geometry and context information. Thereby, the geometry stream combines visible depth and depth-bounding box residuals to recover the object bounding box via explicit occlusion-aware optimization. In addition, a bounding box based geometry projection scheme is employed in an effort to enhance distance perception. The second stream, named as the Context Stream, directly regresses 3D object location and size. This novel two-stream representation enables us to enforce cross-stream consistency terms, which aligns the outputs of both streams, and further improves the overall performance. Extensive experiments on the public benchmark demonstrate that OPA-3D outperforms state-of-the-art methods on the main Car category, whilst keeping a real-time inference speed.

Yongzhi Su, Yan Di, Guangyao Zhai, Fabian Manhardt, Jason Rambach, Benjamin Busam, Didier Stricker, and Federico Tombari. “OPA-3D: Occlusion-Aware Pixel-Wise Aggregation for Monocular 3D Object Detection.” IEEE Robotics and Automation Letters (2023).

Contacts: Yongzhi Su, Dr. Jason Rambach

Radar Driving Activity Dataset (RaDA) Released

DFKI Augmented Vision recently released the first publicly available UWB Radar Driving Activity Dataset (RaDA), consisting of over 10k data samples from 10 different participants annotated with 6 driving activities. The dataset was recorded in the DFKI driving simulator environment. For more information and to download the dataset please check the project website:

The dataset release is accompanied by an article in the journal Sensors:

Brishtel, Iuliia, Stephan Krauss, Mahdi Chamseddine, Jason Raphael Rambach, and Didier Stricker. “Driving Activity Recognition Using UWB Radar and Deep Neural Networks.” Sensors 23, no. 2 (2023): 818.

Contacts: Dr. Jason Rambach, Iuliia Brishtel

Two new PhDs

On Thursday, October 27th, 2022, Mohamed Selim successfully defended his PhD thesis entitled “Deep Learning-based Head Orientation and Gender Estimation from Face Image” in front of the PhD committee consisting of Prof. Dr. Didier Stricker (TU Kaiserslautern), Prof. Dr. Karsten Berns (TU Kaiserslautern), and Prof. Dr. Stefan Deßloch (TU Kaiserslautern).

In the thesis, Mohamed Selim studied the problem of gender and head orientation estimation from face images. Machine-based perception can be of great benefit in extracting this underlying information from face images if the problem is properly modeled. The thesis provides novel solutions to the problems of head orientation estimation and gender prediction. Moreover, it investigates how changes in facial appearance caused by head orientation variation affect gender prediction accuracy. A novel orientation-guided feature map recalibration method is presented that significantly increases the accuracy of gender prediction.

Mohamed Selim received his bachelor’s and master’s degrees in Computer Science and Engineering from the German University in Cairo, Egypt. He joined the Augmented Vision department in October 2012 as a PhD candidate, and later, in March 2018, as a researcher working on industrial and EU research projects. His research interests include computer vision, 3D reconstruction, and deep learning.

Mr. Selim after his successful PhD defense

A week later, on Friday, November 4th, 2022, MSc. Ing. Hammad Tanveer Butt also successfully defended his PhD thesis entitled “Improved Sensor Fusion and Deep Learning of 3D Human Pose From Sparse Magnetic Inertial Measurement Units” in front of the PhD committee consisting of Prof. Dr. Didier Stricker (TU Kaiserslautern and DFKI), Prof. Dr. Imran Shafi (National University of Sciences and Technology, Pakistan) and Prof. Dr. Jörg Dörr (TU Kaiserslautern and IESE Fraunhofer).

The goal of the thesis was to obtain a magnetometer-robust 3D human body pose from sparse magnetic inertial motion sensors, with uncertainty prediction based on Bayesian deep learning. To this end, a systematic approach was adopted to address all the challenges of inertial motion capture in an end-to-end manner. First, simultaneous calibration of multiple magnetic inertial sensors was achieved with error mitigation and residual uncertainty learning. Then, a magnetometer-robust sensor fusion algorithm for 3D orientation was proposed. Adaptive anatomical error correction was used to reduce long-term drift in the joint angles.

In addition, joint angle constraints were learned using a data-driven approach that employs a swing-twist formulation for 3D joint rotations. Finally, the thesis showed that a Bayesian deep learning framework can be used to learn 3D human pose from sparse magnetic inertial sensors while also predicting the uncertainty of the pose estimate. This uncertainty is well correlated with the actual error and with lack of information, particularly when the yaw angle derived from the magnetometer is not used. The thesis led to two peer-reviewed contributions in the IEEE Access journal, as well as a best scientific paper award at the IntelliSys-2019 conference held in the UK. The conference paper on swing-twist learning of joint constraints, presented at Machine Vision Applications (MVA)-2019 in Tokyo, Japan, was later selected by the reviewing committee among the top candidates and invited for publication as an extended journal paper. A conference paper and a poster by the author were also accepted at the FUSION-2019 conference held in Ottawa, Canada.

MSc. Ing. Hammad Tanveer Butt received his bachelor’s degree in Avionics (1999) and his master’s degree in Electrical Engineering (2013) from the National University of Sciences and Technology (NUST), Pakistan. From 2016 to 2021, he worked as a researcher in the Augmented Vision (AV) group at DFKI while pursuing his PhD. His research interests include nano-electronics, MEMS sensors, deep learning/AI and quantum machine learning.

Start of the CORTEX² project

The kick-off meeting of the CORTEX² project was held at DFKI in Kaiserslautern on September 20th, 2022.

Participants at the kick-off meeting in Kaiserslautern

The mission of CORTEX² (“COoperative Real-Time EXperiences with EXtended reality”) is to democratize access to the remote collaboration offered by next-generation XR experiences across a wide range of industries and SMEs.

CORTEX² will provide:

  • Full support for AR experience as an extension of video conferencing systems when using heterogeneous service end devices through a novel Mediation Gateway platform.
  • Resource-efficient teleconferencing tools through innovative transmission methods and automatic summarization of shared long documents.
  • Easy-to-use and powerful XR experiences with instant 3D reconstruction of environments and objects, and simplified use of natural gestures in collaborative meetings.
  • Fusion of vision and audio for multichannel semantic interpretation and enhanced tools such as virtual conversational agents and automatic meeting summarization.
  • Full integration of internet of things (IoT) devices into XR experiences to optimize interaction with running systems and processes.
  • Optimal extension possibilities and broad adoption by delivering the core system with open APIs and launching open calls to enable further technical extensions, more comprehensive use cases, and deeper evaluation and assessment.

Partners of the project are:

  • DFKI – Deutsches Forschungszentrum für Künstliche Intelligenz GmbH Germany
  • LINAGORA – France
  • ALE – Alcatel-Lucent Entreprise International France
  • ICOM – Intracom SA Telecom Solutions Greece
  • AUS – AUSTRALO Alpha Lab MTÜ Estonia
  • F6S – F6S Network Limited Ireland
  • KUL – Katholieke Universiteit Leuven Belgium
  • CEA – Commissariat à l’énergie atomique et aux énergies alternatives France
  • ACT – Actimage GmbH Germany
  • UJI – Universitat Jaume I De Castellon Spain

In addition to the project activities, CORTEX² will invest a total of 4 million euros in two open calls. These calls aim to recruit tech startups/SMEs to co-develop CORTEX², to engage new use cases from different domains that demonstrate CORTEX² replication through specific integration paths, and to assess and validate the social impact associated with XR technology adoption in internal and external use cases.

Contact: Dr. Alain Pagani (Coordinator)

HAIKU project takes off!

The European HAIKU project is taking off! The kick-off meeting took place in Lisbon on September 7th, 2022.

The goal of HAIKU is to develop a human-centric AI by exploring interactive AI prototypes in a variety of aviation contexts. A key challenge HAIKU faces is to develop human-centric digital assistants that will fit the way humans work.

It is essential, both for safe operations and for society in general, that the people who currently keep aviation so safe can work with, train and supervise these AI systems, and that future autonomous AI systems make judgements and decisions that would be acceptable to humans. HAIKU will pave the way for human-centric AI by developing new AI-based ‘Digital Assistants’ and associated Human-AI Teaming practices, guidance and assurance processes, via the exploration of interactive AI prototypes in a wide range of aviation contexts.

Therefore, HAIKU will:

  • Design and develop a set of AI assistants, demonstrated in the different use cases.
  • Develop a comprehensive Human Factors design guidance and methods capability (‘HF4AI’) on how to develop safe, effective and trustworthy Digital Assistants for Aviation, integrating and expanding on existing state-of-the-art guidance.
  • Conduct controlled experiments with high operational relevance – illustrating the tasks, roles, autonomy and team performance of the Digital Assistant in a range of normal and emergency scenarios.
  • Develop new safety and validation assurance methods for Digital Assistants, to facilitate early integration into aviation systems by aviation stakeholders and regulatory authorities.
  • Deliver guidance on socially acceptable AI in safety critical operations, and for maintaining aviation’s strong safety record.

DFKI participates with two departments: Augmented Vision and Cognitive Assistants.

Contact: Dr. Alain Pagani, Narek Minaskan

VIZTA Project successfully concluded after 42 months

The Augmented Vision department of DFKI participated in the VIZTA project, coordinated by STMicroelectronics. The project aimed to develop innovative technologies in the field of optical sensors and laser sources for short- to long-range 3D imaging, and to demonstrate their value in several key applications including automotive, security, smart buildings, mobile robotics for smart cities, and Industry 4.0.

The final project review was successfully completed in Grenoble, France, on November 17th-18th, 2022. The schedule included presentations on the achievements of all partners as well as live demonstrators of the developed technologies. DFKI presented its smart building person detection demonstrator, which is based on a top-down view from a time-of-flight (ToF) camera and was developed in cooperation with the project partner IEE. A second demonstrator, an in-cabin monitoring system with a wide field of view that is installed in DFKI’s lab, was presented in a video.

During VIZTA, DFKI obtained several key results on the topics of in-car and smart building monitoring.

Figure 1: In-car person and object detection (left), and top-down person detection and tracking for smart building applications (right).

Contact: Dr. Jason Rambach, Dr. Bruno Mirbach

DFKI Augmented Vision Researchers win two awards in Object Pose Estimation challenge (BOP Challenge, ECCV 2022)

DFKI Augmented Vision researchers Yongzhi Su, Praveen Nathan and Jason Rambach won first place in the prestigious BOP Challenge 2022 in two categories: Overall Best Segmentation Method and The Best BlenderProc-Trained Segmentation Method.

The BOP benchmark and challenge address the problem of 6-degree-of-freedom object pose estimation, which is of great importance for many applications such as robot grasping or augmented reality. This year, the BOP challenge was held within the “Recovering 6D Object Pose” Workshop at the European Conference on Computer Vision (ECCV) in Tel Aviv, Israel. Prize money totalling $4000, donated by Meta Reality Labs and Niantic, was distributed among the winning teams of the BOP challenge.

Dr. Jason Rambach received the awards on behalf of the DFKI team and then gave a short presentation of the method. The winning method was based on the CVPR 2022 paper “ZebraPose”:

ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation
Yongzhi Su, Mahdi Saleh, Torben Fetzer, Jason Raphael Rambach, Nassir Navab, Benjamin Busam, Didier Stricker, Federico Tombari

The winning approach was developed by a team led by DFKI AV, with contributing researchers from TU Munich and Zhejiang University.

Contact: Yongzhi Su, Dr. Jason Rambach

Dr. Jason Rambach with the award

Kick-Off for EU Project “HumanTech”

Our Augmented Vision department is the coordinator of the new large European project “HumanTech”. The kick-off meeting was held on July 20th, 2022, at DFKI in Kaiserslautern. Please read the whole article here: Artificial intelligence for a safe and sustainable construction industry

ICPR 2022: Best Paper Award

We are proud to have received a Best Paper Award at this year’s ICPR. Please take a look at this article for more information: Machine Learning with Synthetic Data – Research Unit Augmented Vision receives Best Paper Award at ICPR 2022

HCII 2022: Two papers accepted

We are pleased to announce that the Augmented Vision group presented two papers at the HCI International 2022 Conference from June 28th to July 1st, 2022.

The two accepted papers are:

Title: Learning Effect of Lay people in Gesture-Based Locomotion in Virtual Reality

Authors: Alexander Schäfer, Gerd Reis, Didier Stricker

Abstract: Locomotion in Virtual Reality (VR) is an important part of VR applications. Many scientists are enriching the community with different variations that enable locomotion in VR. Some of the most promising methods are gesture-based and do not require additional handheld hardware. Recent work focused mostly on user preference and performance of the different locomotion techniques. This ignores the learning effect that users go through while new methods are being explored. In this work, it is investigated whether and how quickly users can adapt to a hand gesture-based locomotion system in VR. Four different locomotion techniques are implemented and tested by participants. The goal of this paper is twofold: First, it aims to encourage researchers to consider the learning effect in their studies. Second, this study aims to provide insight into the learning effect of users in gesture-based systems.

Title: Human intelligent machine teaming in single pilot operation: A case study

Authors: Nareg Minaskan Karabid, Charles-Alban Dormoy, Alain Pagani, Jean-Marc Andre, Didier Stricker

Abstract: With recent advances in artificial intelligence (AI) and learning based systems, industries have started to integrate AI components into their products and workflows. In areas where frequent testing and development is possible these systems have proved to be quite useful such as in automotive industry where vehicle are now equipped with advanced driver-assistant systems (ADAS) capable of self-driving, route planning, and maintaining safe distances from lanes and other vehicles. However, as the safety-critical aspect of task increases, more difficult and expensive it is to develop and test AI-based solutions. Such is the case in aviation and therefore, development must happen over longer periods of time and in a step-by-step manner. This paper focuses on creating an interface between the human pilot and a potential assistant system that helps the pilot navigate through a complex flight scenario. Verbal communication and augmented reality (AR) were chosen as means of communication and the verbal communication was carried out in a wizard-of-Oz (WoOz) fashion. The interface was tested in a flight simulator and its usefulness was evaluated by NASA-TLX and SART questionnaires for workload and situation awareness.