Proceedings Paper

Stochastic language model for analyzing document physical layout
Author(s): Tapas Kanungo; Song Mao

Paper Abstract

Image segmentation is an important component of any document image analysis system. While many segmentation algorithms exist in the literature, very few (i) allow users to specify the physical style and (ii) incorporate user-specified style information into the objective function that the algorithm minimizes. We describe a segmentation algorithm that models a document's physical structure as a hierarchy in which each node describes a region of the document using a stochastic regular grammar. The exact form of the hierarchy and of the stochastic language is specified by the user, while the probabilities associated with the transitions are estimated from groundtruth data. We demonstrate the segmentation algorithm on images of bilingual dictionaries.
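
The idea of estimating transition probabilities from groundtruth and using them in an objective function can be illustrated with a small sketch. This is not the authors' implementation: it treats a stochastic regular grammar as a first-order chain over region labels, and the label names (headword, pos, translation), the add-k smoothing, and both function names are assumptions introduced only for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical start/end markers for a sequence of region labels.
START, END = "<s>", "</s>"


def estimate_transitions(sequences, smoothing=1.0):
    """Estimate P(next | current) from labeled groundtruth sequences.

    Uses add-k smoothing (an assumption, not taken from the paper) so that
    transitions never seen in the groundtruth keep a small nonzero probability.
    """
    counts = defaultdict(Counter)
    symbols = {END}
    for seq in sequences:
        path = [START, *seq, END]
        symbols.update(seq)
        for cur, nxt in zip(path, path[1:]):
            counts[cur][nxt] += 1

    probs = {}
    for cur in [START, *sorted(symbols - {END})]:
        total = sum(counts[cur].values()) + smoothing * len(symbols)
        probs[cur] = {
            nxt: (counts[cur][nxt] + smoothing) / total for nxt in symbols
        }
    return probs


def neg_log_likelihood(seq, probs):
    """Cost of a candidate label sequence; lower means more plausible.

    Raises KeyError if the sequence contains a label absent from the groundtruth.
    """
    path = [START, *seq, END]
    return -sum(math.log(probs[cur][nxt]) for cur, nxt in zip(path, path[1:]))


# Hypothetical groundtruth: label sequences for bilingual dictionary entries.
groundtruth = [
    ["headword", "pos", "translation"],
    ["headword", "translation"],
    ["headword", "pos", "translation", "translation"],
]
probs = estimate_transitions(groundtruth)
plausible = neg_log_likelihood(["headword", "pos", "translation"], probs)
implausible = neg_log_likelihood(["translation", "headword"], probs)
```

In this toy setting, a candidate segmentation whose label order matches the groundtruth style receives a lower cost than one that violates it, which is the role the abstract assigns to user-specified style information in the objective function.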

Paper Details

Date Published: 18 December 2001
PDF: 9 pages
Proc. SPIE 4670, Document Recognition and Retrieval IX, (18 December 2001); doi: 10.1117/12.450736
Author Affiliations
Tapas Kanungo, IBM Almaden Research Ctr. (United States)
Song Mao, Univ. of Maryland/College Park (United States)


Published in SPIE Proceedings Vol. 4670:
Document Recognition and Retrieval IX
Paul B. Kantor; Tapas Kanungo; Jiangying Zhou, Editor(s)

© SPIE.