smartDoc competition win at the ICDAR 2015 Conference

We are delighted to announce that we won challenge 2 of the smartDoc competition, one of the competitions held as part of the ICDAR 2015 Conference.

The goal of the competition is to extract the textual content from document images captured by mobile phones. The images are taken under varying conditions to provide challenging input (full description of the challenge).

The method is based on “Combining Clustering and Classification Results” (CCC). First, the background color is used to detect and dewarp the document. The image is then binarized to extract lines, words and subwords, which are clustered incrementally across the whole corpus. A 1D LSTM is trained on both sharp and blurry gray-scale text lines to recognize subwords. Finally, each cluster of subwords is labeled by majority voting over the recognition results of its members.
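
The last step, labeling clusters by majority voting, can be illustrated with a minimal sketch. It is not the competition code: the function names `label_clusters_by_majority` and `relabel`, and the input format (one cluster id and one recognizer output per subword image), are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def label_clusters_by_majority(cluster_ids, predictions):
    """Return {cluster_id: label}, where label is the most frequent
    recognizer output among the members of that cluster.

    Hypothetical inputs: cluster_ids[i] is the cluster of subword i,
    predictions[i] is the text the recognizer produced for subword i.
    """
    if len(cluster_ids) != len(predictions):
        raise ValueError("cluster_ids and predictions must have the same length")

    # Collect one vote counter per cluster.
    votes = defaultdict(Counter)
    for cluster_id, prediction in zip(cluster_ids, predictions):
        votes[cluster_id][prediction] += 1

    # Ties go to the label seen first, since Counter.most_common
    # keeps insertion order among equal counts.
    return {cid: counter.most_common(1)[0][0] for cid, counter in votes.items()}


def relabel(cluster_ids, predictions):
    """Replace each subword's prediction with its cluster's majority label."""
    labels = label_clusters_by_majority(cluster_ids, predictions)
    return [labels[cid] for cid in cluster_ids]


if __name__ == "__main__":
    # Toy data: one member of cluster 0 was misread as "tbe".
    cluster_ids = [0, 0, 0, 1, 1]
    predictions = ["the", "the", "tbe", "doc", "doc"]
    print(relabel(cluster_ids, predictions))
```

In this toy example the misread subword in cluster 0 is corrected by the two other members of its cluster, which is the intuition behind combining clustering with classification: visually similar subwords share one label, so isolated recognition errors are outvoted.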