Proceedings Paper

An OCR based approach for word spotting in Devanagari documents
Author(s): Anurag Bhardwaj; Suryaprakash Kompalli; Srirangaraj Setlur; Venu Govindaraju

Paper Abstract

This paper describes an OCR-based technique for word spotting in printed Devanagari documents. The system accepts a Devanagari word as input and returns a list of word images ranked by their similarity to the input query. The methodology involves line and word segmentation, pre-processing of document words, word recognition using OCR, and similarity matching. In the pre-processing phase, we demonstrate a document cleanup based on the Block Adjacency Graph (BAG). During word recognition, a font-independent Devanagari OCR generates multiple recognition hypotheses for each document word. The similarity matching phase uses a cost-based model to match the user's query word against the OCR results. Experiments are conducted on document images from the publicly available ILT and Million Book Project datasets. The technique achieves an average precision of 80% for 10 queries and 67% for 20 queries on a set of 64 documents containing 5,780 word images. The paper also compares the proposed method with template-based word spotting techniques.
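The abstract describes matching a query word against several OCR hypotheses per word image with a cost-based model, but it does not specify that model. The sketch below is therefore an assumption, not the authors' method. It substitutes a plain weighted Levenshtein edit distance over Unicode code points, normalizes it by word length, scores each word image by its cheapest hypothesis, and ranks images by ascending cost. The names `WordImage`, `edit_cost`, `match_cost` and `rank_words` are hypothetical.

```python
from dataclasses import dataclass


def edit_cost(query: str, hypothesis: str,
              sub_cost: float = 1.0, indel_cost: float = 1.0) -> float:
    """Weighted Levenshtein distance over Unicode code points.

    Assumption: a stand-in for the paper's unspecified cost model.
    """
    prev = [j * indel_cost for j in range(len(hypothesis) + 1)]
    for i, qc in enumerate(query, start=1):
        curr = [i * indel_cost]
        for j, hc in enumerate(hypothesis, start=1):
            curr.append(min(
                prev[j] + indel_cost,                            # delete qc
                curr[j - 1] + indel_cost,                        # insert hc
                prev[j - 1] + (0.0 if qc == hc else sub_cost),   # match or substitute
            ))
        prev = curr
    return prev[-1]


@dataclass
class WordImage:
    image_id: str          # hypothetical identifier of a segmented word image
    hypotheses: list[str]  # OCR recognition hypotheses for that word image


def match_cost(query: str, word: WordImage) -> float:
    """Lowest length-normalized cost over all hypotheses of one word image."""
    return min(
        edit_cost(query, h) / max(len(query), len(h), 1)
        for h in word.hypotheses
    )


def rank_words(query: str, words: list[WordImage],
               top_k: int = 10) -> list[tuple[str, float]]:
    """Return the top_k (image_id, cost) pairs, cheapest match first."""
    scored = [(w.image_id, match_cost(query, w)) for w in words]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


if __name__ == "__main__":
    words = [
        WordImage("w3", ["नगर"]),
        WordImage("w1", ["भारत", "मारत"]),  # two competing OCR hypotheses
        WordImage("w2", ["भरत"]),
    ]
    for image_id, cost in rank_words("भारत", words):
        print(image_id, cost)
```

Keeping every hypothesis and taking the minimum lets a word image rank highly even when the OCR's top choice is wrong, provided one alternative is close to the query. A possible refinement, not taken from the paper, is to lower `sub_cost` for character pairs that the OCR commonly confuses.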

Paper Details

Date Published: 28 January 2008
PDF: 9 pages
Proc. SPIE 6815, Document Recognition and Retrieval XV, 68150O (28 January 2008); doi: 10.1117/12.767289
Anurag Bhardwaj, Univ. at Buffalo (United States)
Suryaprakash Kompalli, Univ. at Buffalo (United States)
Srirangaraj Setlur, Univ. at Buffalo (United States)
Venu Govindaraju, Univ. at Buffalo (United States)

Published in SPIE Proceedings Vol. 6815:
Document Recognition and Retrieval XV
Berrin A. Yanikoglu; Kathrin Berkner, Editor(s)

© SPIE.