Winning the smartDoc Competition – ICDAR 2015 Conference

We are delighted to announce that we won the smartDoc competition (Challenge 2), one of the competitions held at the ICDAR 2015 conference.

The goal of the competition is to extract the textual content of document images captured with mobile phones. The images are taken under varying conditions so that the input is challenging (full description of the challenge).

The method is based on “Combining Clustering and Classification Results” (CCC). First, the background color is used to detect and dewarp the document. The image is then binarized to extract text lines, words and subwords, which are clustered incrementally across the whole corpus. A 1D LSTM, trained on both sharp and blurry gray-scale text lines, recognizes the subwords. Finally, each cluster of subwords is labeled by majority voting over the recognition results of its members.
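To illustrate how clustering and classification results can be combined, here is a minimal sketch. It is not the team's implementation: it assumes each subword image has already been reduced to a fixed-length feature vector, uses a simple leader-style incremental clustering with a Euclidean distance `threshold` (a hypothetical choice, since the post does not say which features or clustering criterion were used), and treats the LSTM outputs as a given list of strings. The names `incremental_cluster` and `label_by_majority` are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Cluster:
    centroid: list[float]                             # running mean of member features
    members: list[int] = field(default_factory=list)  # indices of subwords in the corpus


def _dist(a, b) -> float:
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def incremental_cluster(features, threshold: float) -> list[Cluster]:
    """Assign each subword, in corpus order, to its nearest cluster if it is
    within `threshold`; otherwise start a new cluster (leader clustering)."""
    clusters: list[Cluster] = []
    for idx, f in enumerate(features):
        best, best_d = None, math.inf
        for c in clusters:
            d = _dist(f, c.centroid)
            if d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= threshold:
            n = len(best.members)
            # Update the centroid as a running mean including the new member.
            best.centroid = [(cx * n + fx) / (n + 1) for cx, fx in zip(best.centroid, f)]
            best.members.append(idx)
        else:
            clusters.append(Cluster(centroid=list(f), members=[idx]))
    return clusters


def label_by_majority(clusters: list[Cluster], predictions: list[str]) -> list[str]:
    """Replace each subword's recognizer output with its cluster's most common
    prediction (ties go to the prediction encountered first)."""
    labels: list[str] = [""] * len(predictions)
    for c in clusters:
        winner, _ = Counter(predictions[i] for i in c.members).most_common(1)[0]
        for i in c.members:
            labels[i] = winner
    return labels


# Toy example: two groups of similar-looking subwords; the recognizer
# misreads one instance of "the" as "tbe".
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
predictions = ["the", "the", "tbe", "of", "of"]
clusters = incremental_cluster(features, threshold=1.0)
print(label_by_majority(clusters, predictions))
# → ['the', 'the', 'the', 'of', 'of']
```

In the toy run, the single misrecognition is outvoted by the other members of its cluster. This is the intuition behind combining the two results: the clustering groups visually identical subwords, and the vote lets the classifier's majority correct its occasional errors.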