Search
Publication Authors

Prof. Dr. Didier Stricker

Dr. Alain Pagani

Dr. Gerd Reis

Eric Thil

Keonna Cunningham

Dr. Oliver Wasenmüller

Dr. Muhammad Zeshan Afzal

Dr. Gabriele Bleser

Dr. Muhammad Jameel Nawaz Malik
Dr. Bruno Mirbach

Dr. Jason Raphael Rambach

Dr. Nadia Robertini

Dr. René Schuster

Dr. Bertram Taetz

Ahmed Aboukhadra

Sk Aziz Ali

Mhd Rashed Al Koutayni

Yuriy Anisimov

Jilliam Maria Diaz Barros

Ramy Battrawy
Hammad Butt

Mahdi Chamseddine

Steve Dias da Cruz
Fangwen Shu

Torben Fetzer

Ahmet Firintepe
Sophie Folawiyo

David Michael Fürst
Anshu Garg

Christiano Couto Gava
Suresh Guttikonda

Tewodros Amberbir Habtegebrial
Simon Häring

Khurram Azeem Hashmi
Henri Hoyez
Alireza Javanmardi

Jigyasa Singh Katrolia

Andreas Kölsch
Ganesh Shrinivas Koparde
Onorina Kovalenko

Stephan Krauß
Paul Lesur

Michael Lorenz

Dr. Markus Miezal

Mina Ameli

Nareg Minaskan Karabid

Mohammad Minouei

Pramod Murthy

Mathias Musahl

Peter Neigel

Manthan Pancholi
Mariia Podguzova

Praveen Nathan
Qinzhuan Qian
Rishav
Marcel Rogge
María Alejandra Sánchez Marín
Dr. Kripasindhu Sarkar

Alexander Schäfer

Pascal Schneider

Mohamed Selim

Tahira Shehzadi
Lukas Stefan Staecker

Yongzhi Su

Xiaoying Tan
Christian Witte

Yaxu Xie

Vemburaj Yadav

Dr. Vladislav Golyanik

Dr. Aditya Tewari

André Luiz Brandão
Publication Archive
New title
- ActivityPlus
- AlterEgo
- AR-Handbook
- ARVIDA
- Auroras
- AVILUSplus
- Be-greifen
- Body Analyzer
- CAPTURE
- COGNITO
- DAKARA
- Density
- DYNAMICS
- EASY-IMP
- Eyes Of Things
- iACT
- IMCVO
- IVMT
- LARA
- LiSA
- Marmorbild
- Micro-Dress
- Odysseus Studio
- On Eye
- OrcaM
- PAMAP
- PROWILAN
- ServiceFactory
- STREET3D
- SUDPLAN
- SwarmTrack
- TuBUs-Pro
- VIDETE
- VIDP
- VisIMon
- VISTRA
- You in 3D
EmmDocClassifier: Efficient Multimodal Document Image Classifier for Scarce Data
EmmDocClassifier: Efficient Multimodal Document Image Classifier for Scarce Data
Shrinidhi Kanchi, Alain Pagani, Hamam Mokayed, Marcus Liwicki, Didier Stricker, Muhammad Zeshan Afzal
Applied Sciences (MDPI) 12 3 Seiten 1-18 MDPI 1/2022 .
- Abstract:
- Document classification is one of the most critical steps in the document analysis pipeline. There are two types of approaches for document classification, known as image-based and multimodal approaches. Image-based document classification approaches are solely based on the inherent visual cues of the document images. In contrast, the multimodal approach co-learns the visual and textual features, and it has proved to be more effective. Nonetheless, these approaches require a huge amount of data. This paper presents a novel approach for document classification that works with a small amount of data and outperforms other approaches. The proposed approach incorporates a hierarchical attention network (HAN) for the textual stream and the EfficientNet-B0 for the image stream. The hierarchical attention network in the textual stream uses dynamic word embedding through fine-tuned BERT. HAN incorporates both the word level and sentence level features. While earlier approaches rely on training on a large corpus (RVL-CDIP), we show that our approach works with a small amount of data (Tobacco-3482). To this end, we trained the neural network at Tobacco3482 from scratch. Therefore, we outperform the state-of-the-art by obtaining an accuracy of 90.3%. This results in a relative error reduction rate of 7.9%