A comparison of 1D and 2D LSTM architectures for the recognition of handwritten Arabic
Mohammad Reza Yousefi, Mohammad Reza Soheili, Thomas Breuel, Didier Stricker
Proc. of SPIE-IS&T Electronic Imaging, SPIE Conference on Document Recognition and Retrieval (DRR-22), February 11-12, San Francisco, CA, USA

Abstract:
In this paper, we present an Arabic handwriting recognition method based on recurrent neural networks. We use the Long Short-Term Memory (LSTM) architecture, which has proven successful in various printed and handwritten OCR tasks. Applications of LSTM to handwriting recognition typically employ the two-dimensional architecture to deal with variations along both the vertical and horizontal axes. However, we show that with a simple pre-processing step that normalizes the position and baseline of letters, we can use a 1D LSTM, which is faster to train and converge, and still achieve superior performance. In a series of experiments on the IFN/ENIT database for Arabic handwriting recognition, we demonstrate that our proposed pipeline can outperform 2D LSTM networks. Furthermore, we provide comparisons with 1D LSTM networks trained on manually crafted features, showing that the automatically learned features of a globally trained 1D LSTM network, combined with our normalization step, can outperform even such systems.
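To make the pipeline concrete, the following Python sketch illustrates the idea of the approach; it is a minimal illustration, not the authors' implementation. The fixed line height (TARGET_HEIGHT), the center-of-mass baseline proxy, and the function names normalize_line and to_sequence are assumptions for illustration only: a line image is normalized so ink position and scale are consistent, and each pixel column then becomes one input frame of the 1D sequence fed to the LSTM.

    # Minimal sketch (assumed, not the paper's code): normalize a text-line
    # image, then turn its columns into a 1D input sequence for an LSTM.
    import numpy as np
    from scipy.ndimage import shift, zoom

    TARGET_HEIGHT = 48  # assumed fixed line height for the 1D network

    def normalize_line(img: np.ndarray) -> np.ndarray:
        """Center the ink vertically and rescale to a fixed height.

        img: 2D float array, ink-positive (0 = background), one text line.
        The vertical center of mass serves as a crude baseline proxy here.
        """
        h, w = img.shape
        rows = img.sum(axis=1)  # ink mass per row
        center = (np.arange(h) * rows).sum() / max(rows.sum(), 1e-6)
        # Shift so the ink center sits at the middle of the image.
        img = shift(img, (h / 2.0 - center, 0.0), order=1, mode="constant")
        # Rescale proportionally to the fixed height the 1D LSTM expects.
        factor = TARGET_HEIGHT / h
        return zoom(img, (factor, factor), order=1)

    def to_sequence(img: np.ndarray) -> np.ndarray:
        """Return a (timesteps, features) array: each pixel column of the
        normalized line image becomes one input frame for the 1D LSTM."""
        return normalize_line(img).T

    # Usage: a synthetic 60x200 line with ink concentrated near the top.
    line = np.zeros((60, 200))
    line[10:25, :] = 1.0
    seq = to_sequence(line)
    print(seq.shape)  # (160, 48): 160 timesteps, 48 features per frame

Because the normalization removes vertical variability, the network only needs to model the horizontal (reading-order) dimension, which is what makes the cheaper 1D architecture viable in place of a 2D LSTM.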
