Please check out the article “Artificial intelligence for a safe and sustainable construction industry” about the new EU project HumanTech, which is coordinated by Dr. Jason Rambach, head of the Spatial Sensing and Machine Perception team (Augmented Reality/Augmented Vision department, Prof. Didier Stricker) at the German Research Center for Artificial Intelligence (DFKI) in Kaiserslautern.

Augmented Vision @CVPR 2022

DFKI Augmented Vision had a strong presence at the recent CVPR 2022 conference, held on June 19th-23rd, 2022, in New Orleans, USA. The IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR) is the premier annual international computer vision event. Homepage:

Overall, three publications were presented:

1. ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation
Yongzhi Su, Mahdi Saleh, Torben Fetzer, Jason Raphael Rambach, Nassir Navab, Benjamin Busam, Didier Stricker, Federico Tombari

2. SOMSI: Spherical Novel View Synthesis with Soft Occlusion Multi-Sphere Images
Tewodros A Habtegebrial, Christiano Gava, Marcel Rogge, Didier Stricker, Varun Jampani

3. Unsupervised Anomaly Detection from Time-of-Flight Depth Images
Pascal Schneider, Jason Rambach, Bruno Mirbach, Didier Stricker

Keynote Presentation by Dr. Jason Rambach in the Computer Vision session of the Franco-German Research and Innovation Network event

On June 14th, 2022, Dr. Jason Rambach gave a keynote talk in the Computer Vision session of the Franco-German Research and Innovation Network event held at the Inria headquarters in Versailles, Paris, France. In the talk, he presented an overview of the current activities of the Spatial Sensing and Machine Perception team at DFKI Augmented Vision.

René Schuster successfully finishes his PhD
René Schuster and Prof. Dr. Didier Stricker moments after the oral defense.

On March 18th, 2022, René Schuster successfully defended his dissertation entitled “Data-driven and Sparse-to-Dense Concepts in Scene Flow Estimation for Automotive Applications”. The reviewers were Prof. Dr. Didier Stricker (Technical University of Kaiserslautern) and Prof. Dr. Andrés Bruhn (University of Stuttgart). Mr. Schuster received his doctorate from the Department of Computer Science at the Technical University of Kaiserslautern.

In his thesis, Mr. Schuster worked on three-dimensional motion estimation of the dynamic environment of vehicles. The focus was on machine learning methods and on the interpolation of individual estimates into a dense motion field. A particular challenge was the scarcity of annotated data for this problem and use case.

René Schuster received an M.Sc. in computational engineering from Darmstadt University of Technology in 2017. He then moved to DFKI to join the augmented reality group of Prof. Stricker. Much of his research was carried out in collaborative projects with BMW.

René Schuster at the celebration of his newly earned title.
CVPR 2022: Two papers accepted

We are happy to announce that the Augmented Vision group will present two papers at the upcoming CVPR 2022 conference, June 19th-23rd, in New Orleans, USA. The IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR) is the premier annual international computer vision event. Homepage:

The two accepted papers are:

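1. ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation
Authors: Yongzhi Su, Mahdi Saleh, Torben Fetzer, Jason Raphael Rambach, Nassir Navab, Benjamin Busam, Didier Stricker, Federico Tombari

Summary: ZebraPose sets a new paradigm for model-based 6DoF object pose estimation: it uses a binary object surface encoding to train a neural network that predicts the locations of model vertices in a coarse-to-fine manner. ZebraPose shows a major improvement over the state of the art on several datasets of the BOP Object Pose Estimation benchmark.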
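To make the idea of a hierarchical binary surface encoding more concrete, here is a minimal Python sketch. It is not the ZebraPose implementation: it assumes a simple grouping rule (split every vertex group in half along its widest coordinate axis) where the paper may use a different clustering, and the function names `hierarchical_binary_codes` and `vertices_with_prefix` are hypothetical. Each level adds one bit per vertex, so the first bits identify large surface regions and later bits narrow them down, which is what allows a coarse-to-fine prediction.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def hierarchical_binary_codes(vertices: Sequence[Point], levels: int) -> List[str]:
    """Assign each vertex a `levels`-bit code by recursively halving vertex groups.

    Assumed grouping rule (illustrative only): at every level, each group is sorted
    along its widest coordinate axis and split into two halves of (nearly) equal
    size; the first half gets bit '0', the second half bit '1'.
    """
    codes = [""] * len(vertices)
    groups = [list(range(len(vertices)))]
    for _ in range(levels):
        next_groups = []
        for group in groups:
            if not group:
                next_groups.extend([[], []])  # keep the binary tree structure
                continue
            # Axis along which this group of vertices is most spread out.
            axis = max(range(3), key=lambda a: max(vertices[i][a] for i in group)
                       - min(vertices[i][a] for i in group))
            ordered = sorted(group, key=lambda i: vertices[i][axis])
            half = len(ordered) // 2
            low, high = ordered[:half], ordered[half:]
            for i in low:
                codes[i] += "0"
            for i in high:
                codes[i] += "1"
            next_groups.extend([low, high])
        groups = next_groups
    return codes


def vertices_with_prefix(codes: Sequence[str], prefix: str) -> List[int]:
    """Coarse-to-fine lookup: indices of all vertices whose code starts with `prefix`."""
    return [i for i, code in enumerate(codes) if code.startswith(prefix)]


if __name__ == "__main__":
    cube = [(x, y, z) for z in (0.0, 1.0) for y in (0.0, 1.0) for x in (0.0, 1.0)]
    codes = hierarchical_binary_codes(cube, levels=3)
    print(codes)
    print(vertices_with_prefix(codes, "0"))  # coarse region: one half of the cube
```

With `levels` bits, up to 2**levels surface regions can be told apart. A predictor of such codes can already be right about the coarse bits (which half, which quarter of the surface) while the fine bits are still uncertain.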


Contact: Yongzhi Su, Dr. Jason Rambach

2. SOMSI: Spherical Novel View Synthesis with Soft Occlusion Multi-Sphere Images
Authors: Tewodros A Habtegebrial, Christiano Gava, Marcel Rogge, Didier Stricker, Varun Jampani

Summary: We propose a novel Multi-Sphere Image representation called Soft Occlusion MSI (SOMSI), together with an efficient rendering technique that produces accurate spherical novel views from a sparse spherical light field. SOMSI models appearance features in a small set (e.g., 3) of occlusion levels instead of a large number (e.g., 64) of MSI spheres. Experiments on both synthetic and real-world spherical light fields demonstrate that SOMSI provides a good balance between accuracy and runtime. SOMSI's view synthesis quality is on par with state-of-the-art models such as NeRF, while being two orders of magnitude faster.
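Layered representations such as multi-sphere images typically produce a pixel by alpha-compositing the layers that a viewing ray passes through. The minimal sketch below shows only this standard front-to-back "over" compositing in plain Python; it does not model SOMSI's soft occlusion levels, and the function name is hypothetical. Its loop runs once per layer, which gives one intuition for why compositing a handful of layers (e.g., 3) per ray is cheaper than compositing 64.

```python
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def composite_front_to_back(layers: Sequence[Tuple[RGB, float]]) -> RGB:
    """Alpha-composite (color, alpha) samples ordered from nearest to farthest layer."""
    out = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet blocked by nearer layers
    for color, alpha in layers:
        weight = transmittance * alpha  # contribution of this layer to the pixel
        for k in range(3):
            out[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return (out[0], out[1], out[2])


if __name__ == "__main__":
    # A half-transparent red layer in front of an opaque blue layer.
    print(composite_front_to_back([((1.0, 0.0, 0.0), 0.5), ((0.0, 0.0, 1.0), 1.0)]))
```

Because each ray's cost grows linearly with the number of layers it composites, reducing the layer count is a direct lever on rendering time; the accuracy side of the trade-off is what the SOMSI representation itself addresses.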

For more information, please visit the project page at

Contact: Tewodros A Habtegebrial

One of our projects has been awarded the Nvidia Academic Hardware Grant

We are happy to announce that our project DECODE has been accepted for the Nvidia Academic Hardware Grant. Nvidia will support our research in the field of human motion estimation and semantic reconstruction by donating an Nvidia A100 data center GPU. We will use the new hardware to accelerate our continual learning experiments.

2 Papers accepted at BMVC 2021 Conference

We are happy to announce that the Augmented Vision group will present 2 papers at the upcoming BMVC 2021 conference, 22-25 November 2021:

The British Machine Vision Conference (BMVC) is the British Machine Vision Association (BMVA) annual conference on machine vision, image processing, and pattern recognition. It is one of the major international conferences on computer vision and related areas held in the UK. With increasing popularity and quality, it has established itself as a prestigious event on the vision calendar. Homepage:  

The 2 accepted papers are:

1.  TICaM: A Time-of-flight In-car Cabin Monitoring Dataset
Authors: Jigyasa Singh Katrolia, Ahmed Elsherif, Hartmut Feld, Bruno Mirbach, Jason Raphael Rambach, Didier Stricker

Summary: TICaM is a Time-of-flight In-car Cabin Monitoring dataset for vehicle interior monitoring using a single wide-angle depth camera. The dataset goes beyond currently available in-car cabin datasets in the range of labeled classes, recorded scenarios, and provided annotations, all at the same time. The dataset is available here:



Contact: Jason Rambach

2. PlaneRecNet: Multi-Task Learning with Cross-Task Consistency for Piece-Wise Plane Detection and Reconstruction from a Single RGB Image
Authors: Yaxu Xie, Fangwen Shu, Jason Raphael Rambach, Alain Pagani, Didier Stricker

Summary: Piece-wise 3D planar reconstruction provides holistic scene understanding of man-made environments, especially in indoor scenarios. Unlike existing approaches, we start by enforcing cross-task consistency in our multi-task convolutional neural network, PlaneRecNet, which integrates a single-stage instance segmentation network for piece-wise planar segmentation and a depth decoder to reconstruct the scene from a single RGB image.


Contact: Alain Pagani

Paper accepted at the CSCS 2021!

We are happy to announce that our paper “Multi-scale Iterative Residuals for Fast and Scalable Stereo Matching” has been accepted to the CSCS 2021!

The Computer Science in Cars Symposium (CSCS) is ACM’s flagship event in the field of Car IT. The goal is to bring together scientists, engineers, business representatives, and anyone who shares a passion for solving the myriad of complex problems in vehicle technology and their application to automation, driver and vehicle safety, and driving system safety.

In our work, we place stereo matching in a coarse-to-fine estimation framework to reduce runtime and memory requirements while maintaining accuracy. We test this multi-scale framework with two state-of-the-art stereo networks, and it yields significant improvements in runtime, computational complexity, and memory requirements.
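The coarse-to-fine principle can be illustrated without any neural network. The sketch below is a classical block-matching analogue on a single image scanline, not the stereo networks evaluated in the paper: an exhaustive disparity search runs only at the coarsest pyramid level, and each finer level doubles the upsampled estimate and searches only a small residual around it. The function names, the SAD matching cost, the window radius, and the out-of-image penalty are all assumptions made for illustration.

```python
from typing import List, Sequence


def downsample(signal: Sequence[float]) -> List[float]:
    """Halve the resolution by averaging neighbouring pixel pairs."""
    return [(signal[2 * i] + signal[2 * i + 1]) / 2.0 for i in range(len(signal) // 2)]


def sad_cost(left: Sequence[float], right: Sequence[float], x: int, d: int, radius: int) -> float:
    """Sum of absolute differences between the window around left[x] and right[x - d]."""
    cost = 0.0
    for o in range(-radius, radius + 1):
        xl, xr = x + o, x + o - d
        if 0 <= xl < len(left) and 0 <= xr < len(right):
            cost += abs(left[xl] - right[xr])
        else:
            cost += 1e3  # hypothetical penalty for samples that fall outside the image
    return cost


def best_disparity(left: Sequence[float], right: Sequence[float], x: int,
                   candidates: Sequence[int], radius: int) -> int:
    """Return the candidate disparity with the lowest matching cost."""
    return min(candidates, key=lambda d: sad_cost(left, right, x, d, radius))


def coarse_to_fine_disparity(left: Sequence[float], right: Sequence[float], max_disp: int,
                             levels: int, residual: int = 1, radius: int = 2) -> List[int]:
    # Image pyramids: index 0 is full resolution, index levels-1 the coarsest.
    pyr_l, pyr_r = [list(left)], [list(right)]
    for _ in range(levels - 1):
        pyr_l.append(downsample(pyr_l[-1]))
        pyr_r.append(downsample(pyr_r[-1]))

    # Exhaustive search only at the coarsest level, where the disparity range is smallest.
    coarse_max = max_disp // 2 ** (levels - 1)
    coarse_l, coarse_r = pyr_l[-1], pyr_r[-1]
    disp = [best_disparity(coarse_l, coarse_r, x, range(coarse_max + 1), radius)
            for x in range(len(coarse_l))]

    # Finer levels: upsample, double the estimate, and search only a small residual.
    for level in range(levels - 2, -1, -1):
        lvl_l, lvl_r = pyr_l[level], pyr_r[level]
        level_max = max_disp // 2 ** level
        refined = []
        for x in range(len(lvl_l)):
            guess = min(2 * disp[min(x // 2, len(disp) - 1)], level_max)
            candidates = [d for d in range(guess - residual, guess + residual + 1)
                          if 0 <= d <= level_max]
            refined.append(best_disparity(lvl_l, lvl_r, x, candidates, radius))
        disp = refined
    return disp
```

With `max_disp=16` and `levels=3`, each pixel at the coarsest level tests 5 candidates and each pixel at the finer levels tests at most 3, instead of 17 candidates per pixel at full resolution. The trade-off is that an error made at a coarse level cannot be corrected by more than the residual range at the next level.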

Link to preprint:

Title: Multi-scale Iterative Residuals for Fast and Scalable Stereo Matching

Authors: Kumail Raza, René Schuster, Didier Stricker

Launch of the project “DECODE”

AI for recognizing human motion and the environment

Adaptive methods that learn continuously (lifelong learning) are a central challenge in the development of robust, real-world AI applications. Beyond the rich history of general continual learning, continual learning for computer vision under real-world conditions has recently attracted growing interest.

The goal of the DECODE project is to research continually adaptable models for reconstructing and understanding human motion and the environment in application-oriented settings. To this end, mobile visual and inertial sensors (accelerometers and gyroscopes) will be used. For these different types of sensors and data, various continual learning approaches will be investigated and developed to ensure a seamless transfer from laboratory conditions to everyday, realistic scenarios. The work focuses on improving the semantic segmentation of images and videos, the estimation of human body kinematics and pose, and the representation of motion and its context. The potential applications of the methods developed in DECODE are wide-ranging and include detailed ergonomic analysis of human-machine interaction, for example at the workplace, in factories, or in vehicles.

Further information:

Contact: René Schuster

DFKI AV – Stellantis Collaboration on Radar-Camera Fusion – 2 publications

DFKI Augmented Vision has been working with Stellantis on radar-camera fusion for automotive object detection using deep learning since 2020. The collaboration has already led to two publications, at ICCV 2021 (International Conference on Computer Vision, ERCVAD Workshop on “Embedded and Real-World Computer Vision in Autonomous Driving”) and WACV 2022 (Winter Conference on Applications of Computer Vision).

The 2 publications are:

1. Deployment of Deep Neural Networks for Object Detection on Edge AI Devices with Runtime Optimization
Proceedings of the IEEE International Conference on Computer Vision Workshops – ERCVAD Workshop on Embedded and Real-World Computer Vision in Autonomous Driving

Lukas Stefan Stäcker, Juncong Fei, Philipp Heidenreich, Frank Bonarens, Jason Rambach, Didier Stricker, Christoph Stiller

This paper discusses the optimization of neural-network-based algorithms for object detection from camera, radar, or lidar data in order to deploy them on an embedded system in a vehicle.

2. Fusion Point Pruning for Optimized 2D Object Detection with Radar-Camera Fusion
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2022

Lukas Stefan Stäcker, Juncong Fei, Philipp Heidenreich, Frank Bonarens, Jason Rambach, Didier Stricker, Christoph Stiller

This paper introduces fusion point pruning, a new method to optimize the selection of fusion points within the deep learning network architecture for radar-camera fusion.

Please view the abstract here: Fusion Point Pruning for Optimized 2D Object Detection with Radar-Camera Fusion

Contact: Dr. Jason Rambach