Search
Publication Authors

Prof. Dr. Didier Stricker

Dr. Alain Pagani

Dr. Gerd Reis

Eric Thil

Keonna Cunningham

Dr. Oliver Wasenmüller

Dr. Muhammad Zeshan Afzal

Dr. Gabriele Bleser

Dr. Muhammad Jameel Nawaz Malik
Dr. Bruno Mirbach

Dr. Jason Raphael Rambach

Dr. Nadia Robertini

Dr. René Schuster

Dr. Bertram Taetz

Ahmed Aboukhadra

Sk Aziz Ali

Mhd Rashed Al Koutayni

Yuriy Anisimov

Jilliam Maria Diaz Barros

Ramy Battrawy
Hammad Butt

Mahdi Chamseddine

Steve Dias da Cruz
Fangwen Shu

Torben Fetzer

Ahmet Firintepe
Sophie Folawiyo

David Michael Fürst

Christiano Couto Gava

Tewodros Amberbir Habtegebrial
Simon Häring

Khurram Azeem Hashmi
Henri Hoyez

Jigyasa Singh Katrolia

Andreas Kölsch
Onorina Kovalenko

Stephan Krauß
Paul Lesur

Michael Lorenz

Dr. Markus Miezal

Mina Ameli

Nareg Minaskan Karabid

Mohammad Minouei

Pramod Murthy

Mathias Musahl

Peter Neigel

Manthan Pancholi
Mariia Podguzova

Praveen Nathan
Qinzhuan Qian
Rishav
Marcel Rogge
María Alejandra Sánchez Marín
Dr. Kripasindhu Sarkar

Alexander Schäfer

Pascal Schneider

Mohamed Selim

Tahira Shehzadi
Lukas Stefan Staecker

Yongzhi Su

Xiaoying Tan
Christian Witte

Yaxu Xie

Vemburaj Yadav

Dr. Vladislav Golyanik

Dr. Aditya Tewari

André Luiz Brandão
Publication Archive
New title
- ActivityPlus
- AlterEgo
- AR-Handbook
- ARVIDA
- Auroras
- AVILUSplus
- Be-greifen
- Body Analyzer
- CAPTURE
- COGNITO
- DAKARA
- Density
- DYNAMICS
- EASY-IMP
- Eyes Of Things
- iACT
- IMCVO
- IVMT
- LARA
- LiSA
- Marmorbild
- Micro-Dress
- Odysseus Studio
- On Eye
- OrcaM
- PAMAP
- PROWILAN
- ServiceFactory
- STREET3D
- SUDPLAN
- SwarmTrack
- TuBUs-Pro
- VIDETE
- VIDP
- VisIMon
- VISTRA
- You in 3D
Clustering of Farsi sub-word images for whole-book recognition
Clustering of Farsi sub-word images for whole-book recognition
Ehsanollah Kabir, Didier Stricker, Mohammad Reza Soheili
SPIE Conference on Document Recognition and Retrieval (DRR), February 11-12, San Francisco, CA, USA
- Abstract:
- Redundancy of word and sub-word occurrences in large documents can be effectively utilized in an OCR system to improve recognition results. Most OCR systems employ language modeling techniques as a post-processing step; however these techniques do not use important pictorial information that exist in the text image. In case of large-scale recognition of degraded documents, this information is even more valuable. In our previous work, we proposed a subword image clustering method for the applications dealing with large printed documents. In our clustering method, the ideal case is when all equivalent sub-word images lie in one cluster. To overcome the issues of low print quality, the clustering method uses an image matching algorithm for measuring the distance between two sub-word images. The measured distance with a set of simple shape features were used to cluster all sub-word images. In this paper, we analyze the effects of adding more shape features on processing time, purity of clustering, and the final recognition rate. Previously published experiments have shown the efficiency of our method on a book. Here we present extended experimental results and evaluate our method on another book with totally different font face. Also we show that the number of the new created clusters in a page can be used as a criteria for assessing the quality of print and evaluating preprocessing phases.
- Keywords:
- document image analysis, sub-word image, incremental clustering, shape matching, large document, Persian