
Proceedings Paper

Spotting words in handwritten Arabic documents
Author(s): Sargur Srihari; Harish Srinivasan; Pavithra Babu; Chetan Bhole

Paper Abstract

The design and performance of a system for spotting handwritten Arabic words in scanned document images are presented. The system has three main components: a word segmenter, a shape-based word matcher, and a search interface. The user types a query in English into a search window; the system finds the equivalent Arabic word, e.g., by dictionary look-up, and locates matching word images in an indexed (segmented) set of documents. The search uses a two-step approach: (1) prototype selection, in which the query is used to obtain a set of handwritten samples of that word from a known set of writers (these are the prototypes), and (2) word matching, in which the prototypes are used to spot each occurrence of that word in the indexed document database. The entire set of test word images is then ranked, where the ranking criterion is a similarity score between each prototype word and the candidate words, based on global word shape features. A database of 20,000 word images contained in 100 scanned handwritten Arabic documents written by 10 different writers was used to study retrieval performance. With five writers providing prototypes, the other five used for testing, and manually segmented documents, 55% precision is obtained at 50% recall. Performance increases as more writers are used for training.
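
The word-matching and ranking step, and the precision-at-recall figure used to report it, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the abstract does not say which global word shape features or similarity score are used, so the sketch uses toy numeric feature vectors, cosine similarity, and the best score over all prototypes as the ranking criterion. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors (assumed score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(prototypes, candidates):
    """Rank candidate word images by their best similarity to any prototype.

    prototypes: list of feature vectors for the query word.
    candidates: dict mapping a word-image id to its feature vector.
    Returns ids ordered from most to least similar (ties broken by id).
    """
    scored = []
    for cand_id, features in candidates.items():
        # Assumption: a candidate's score is its maximum over all prototypes.
        score = max(cosine_similarity(p, features) for p in prototypes)
        scored.append((score, cand_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cand_id for _, cand_id in scored]


def precision_at_recall(ranking, relevant, target_recall):
    """Precision at the first rank where recall reaches target_recall."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    hits = 0
    for rank, cand_id in enumerate(ranking, start=1):
        if cand_id in relevant:
            hits += 1
        if hits / len(relevant) >= target_recall:
            return hits / rank
    return 0.0


# Toy data: two prototypes of the query word and four candidate word images.
prototypes = [[1.0, 0.0, 1.0], [0.9, 0.1, 0.8]]
candidates = {
    "w1": [1.0, 0.0, 0.9],
    "w2": [0.0, 1.0, 0.0],
    "w3": [0.8, 0.2, 1.0],
    "w4": [0.1, 0.9, 0.2],
}
ranking = rank_candidates(prototypes, candidates)
```

Given such a ranking and a ground-truth set of word images that really are the query word, `precision_at_recall(ranking, relevant, 0.5)` gives the kind of number quoted in the abstract (precision at 50% recall). The toy vectors carry no relation to the paper's data or results.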

Paper Details

Date Published: 16 January 2006
PDF: 12 pages
Proc. SPIE 6067, Document Recognition and Retrieval XIII, 606702 (16 January 2006); doi: 10.1117/12.643107
Author Affiliations:
Sargur Srihari, State Univ. of New York, Buffalo (United States)
Harish Srinivasan, State Univ. of New York, Buffalo (United States)
Pavithra Babu, State Univ. of New York, Buffalo (United States)
Chetan Bhole, State Univ. of New York, Buffalo (United States)


Published in SPIE Proceedings Vol. 6067:
Document Recognition and Retrieval XIII
Kazem Taghva; Xiaofan Lin, Editor(s)

© SPIE.