Sub-word Image Clustering in Farsi Printed Books
Mohammad Reza Soheili, Ehsanollah Kabir, Didier Stricker
Proceedings of the 7th International Conference on Machine Vision (ICMV 2014), November 19-21, 2014, Milan, Italy
- Abstract:
- Most OCR systems are designed to recognize a single page. With unfamiliar font faces, low-quality paper, and degraded prints, the performance of these systems drops sharply. However, an OCR system can exploit the redundancy of word occurrences in large documents to improve recognition results. In this paper, we propose a sub-word image clustering method for applications that deal with large printed documents. We assume that the whole document is printed in a single, unknown font with low print quality. Our method finds clusters of equivalent sub-word images with an incremental algorithm. Because of the low print quality, we propose an image matching algorithm that measures the distance between two sub-word images based on the Hamming distance and the ratio of area to perimeter of their connected components. To evaluate our method, we built a ground-truth dataset of more than 111,000 sub-word images, all extracted from an old Farsi book. We cluster all of these sub-words, including isolated letters and even punctuation marks, and then label the centers of the resulting clusters manually. We show that all sub-words of the book can be recognized with more than 99.7% accuracy by assigning the label of each cluster center to all of its members.
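The abstract names three ingredients: a distance built from the Hamming distance and the area-to-perimeter ratio of connected components, an incremental clustering pass, and label propagation from manually labeled cluster centers. The sketch below illustrates one plausible way these pieces fit together. It is not the authors' implementation. The following are all hypothetical choices: the way the two distance terms are combined (`weight`), the acceptance `threshold`, top-left zero padding for images of different sizes, 4-connectivity, and using the first member of each cluster as its center.

```python
from collections import deque

Image = list[list[int]]  # binary sub-word image: 1 = ink, 0 = background

def pad(img: Image, h: int, w: int) -> Image:
    """Zero-pad a binary image to h x w, keeping it top-left aligned (assumption)."""
    out = [row + [0] * (w - len(row)) for row in img]
    out += [[0] * w for _ in range(h - len(img))]
    return out

def hamming(a: Image, b: Image) -> float:
    """Fraction of differing pixels after padding both images to a common size."""
    h = max(len(a), len(b))
    w = max(len(a[0]), len(b[0]))
    pa, pb = pad(a, h, w), pad(b, h, w)
    diff = sum(pa[y][x] != pb[y][x] for y in range(h) for x in range(w))
    return diff / (h * w)

def area_perimeter_ratio(img: Image) -> float:
    """Mean area/perimeter ratio over 4-connected ink components.
    Perimeter = ink pixels with at least one background or out-of-bounds neighbor."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    ratios = []
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] != 1 or seen[sy][sx]:
                continue
            area = perim = 0
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                area += 1
                boundary = False
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == 1:
                        if not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                    else:
                        boundary = True
                perim += boundary
            ratios.append(area / perim)  # perim >= 1 for any finite component
    return sum(ratios) / len(ratios) if ratios else 0.0

def distance(a: Image, b: Image, weight: float = 0.5) -> float:
    """Hypothetical combination: Hamming term plus weighted relative ratio difference."""
    ra, rb = area_perimeter_ratio(a), area_perimeter_ratio(b)
    ratio_term = abs(ra - rb) / max(ra, rb) if max(ra, rb) > 0 else 0.0
    return hamming(a, b) + weight * ratio_term

def incremental_clusters(images: list[Image], threshold: float = 0.15):
    """Single pass: join the nearest existing cluster if close enough, else open a new one."""
    centers: list[int] = []        # image index acting as each cluster's center
    members: list[list[int]] = []  # image indices belonging to each cluster
    for i, img in enumerate(images):
        best, best_d = -1, float("inf")
        for c, ci in enumerate(centers):
            d = distance(img, images[ci])
            if d < best_d:
                best, best_d = c, d
        if best >= 0 and best_d <= threshold:
            members[best].append(i)
        else:
            centers.append(i)
            members.append([i])
    return centers, members

def propagate_labels(members: list[list[int]], center_labels: list[str]) -> dict[int, str]:
    """Give every member the label that was assigned manually to its cluster's center."""
    return {i: center_labels[c] for c, group in enumerate(members) for i in group}

bar = [[1, 1, 1], [0, 0, 0]]
dot = [[0, 0, 0], [0, 1, 0]]
centers, members = incremental_clusters([bar, [row[:] for row in bar], dot])
labels = propagate_labels(members, ["A", "B"])
```

With these toy inputs, the two identical bars fall into one cluster and the dot opens a second one. Labeling only the two centers is enough to label all three images, which mirrors the labeling-effort argument in the abstract. A single incremental pass avoids an all-pairs distance matrix, which matters at the scale of 111,000 images, but the result depends on the order in which images are visited.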