
Proceedings Paper

Segmentation and labeling of documents using conditional random fields

Paper Abstract

The paper describes the use of Conditional Random Fields (CRFs), which exploit contextual information, to automatically label segments extracted from scanned documents as machine-print, handwriting, or noise. Such labeling can serve as an indexing step for a context-based image retrieval system or a biometric signature verification system. A simple region-growing algorithm first segments the document into patches, and a CRF model then infers a label for each patch. The model is flexible enough to treat signatures as a type of handwriting and to separate them from machine-print and noise. Its robustness comes from the CRF's modeling of spatial dependencies among the labels of neighboring patches as well as between the labels and the observed data. Maximum pseudo-likelihood estimates of the CRF parameters are learned using conjugate gradient descent. Labels are inferred by estimating their probabilities under the model with Gibbs sampling. In experiments, this approach assigns correct labels to 95.75% of the data, and the CRF-based model outperforms neural networks and naive Bayes.
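The inference and training objective described in the abstract can be illustrated with a small pairwise CRF over already-segmented patches. This is a minimal sketch, not the authors' model: the two per-patch features, the linear unary potentials (weights `w`), the label-agreement pairwise potentials (matrix `v`), the neighbor lists, and all function names are hypothetical assumptions. It shows the local conditional P(y_i | y_N(i), x), the pseudo-log-likelihood that would be maximized during training, and Gibbs sampling used to estimate per-patch label probabilities.

```python
import math
import random

# Label set taken from the abstract; everything else below is a hypothetical illustration.
LABELS = ("machine-print", "handwriting", "noise")


def local_scores(i, labels, feats, neighbors, w, v):
    """Unnormalized log-potentials for each label of patch i, given its neighbors' current labels.

    Assumed form: linear unary term w[c] . x_i plus pairwise terms v[c][y_j] over neighbors j.
    """
    scores = []
    for c in range(len(LABELS)):
        unary = sum(wk * xk for wk, xk in zip(w[c], feats[i]))
        pair = sum(v[c][labels[j]] for j in neighbors[i])
        scores.append(unary + pair)
    return scores


def conditional(i, labels, feats, neighbors, w, v):
    """P(y_i = c | y_N(i), x), computed with a numerically stable softmax."""
    s = local_scores(i, labels, feats, neighbors, w, v)
    m = max(s)
    e = [math.exp(t - m) for t in s]
    z = sum(e)
    return [t / z for t in e]


def pseudo_log_likelihood(labels, feats, neighbors, w, v):
    """Sum over patches of log P(y_i | y_N(i), x): the objective maximized in training."""
    return sum(
        math.log(conditional(i, labels, feats, neighbors, w, v)[labels[i]])
        for i in range(len(labels))
    )


def gibbs_labels(feats, neighbors, w, v, sweeps=200, burn_in=50, seed=0):
    """Estimate per-patch label marginals by Gibbs sampling and return the most probable labels."""
    rng = random.Random(seed)
    n = len(feats)
    labels = [rng.randrange(len(LABELS)) for _ in range(n)]
    counts = [[0] * len(LABELS) for _ in range(n)]
    for sweep in range(sweeps):
        # Resample each patch's label from its local conditional.
        for i in range(n):
            p = conditional(i, labels, feats, neighbors, w, v)
            labels[i] = rng.choices(range(len(LABELS)), weights=p)[0]
        # After burn-in, accumulate samples to approximate the marginals.
        if sweep >= burn_in:
            for i in range(n):
                counts[i][labels[i]] += 1
    return [LABELS[max(range(len(LABELS)), key=lambda c: counts[i][c])] for i in range(n)]


# Hypothetical example: four patches in a chain, each with two made-up features.
feats = [[3.0, 0.0], [3.0, 0.0], [0.0, 3.0], [0.0, 3.0]]
neighbors = [[1], [0, 2], [1, 3], [2]]
w = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]  # one weight vector per label
v = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # bonus when neighbors agree

print(gibbs_labels(feats, neighbors, w, v))
```

In this sketch the weights `w` and `v` are fixed by hand; in practice they would be fit by maximizing `pseudo_log_likelihood` on labeled pages. The abstract reports using conjugate gradient for that step, and `scipy.optimize.minimize(..., method="CG")` applied to the negated objective is one off-the-shelf option (it falls back to finite-difference gradients when none are supplied). Pseudo-likelihood is attractive here because each term needs only a local normalization over three labels, avoiding the intractable global partition function.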

Paper Details

Date Published: 29 January 2007
PDF: 9 pages
Proc. SPIE 6500, Document Recognition and Retrieval XIV, 65000U (29 January 2007); doi: 10.1117/12.704410
Author Affiliations
Shravya Shetty, Univ. at Buffalo, SUNY (United States)
Harish Srinivasan, Univ. at Buffalo, SUNY (United States)
Matthew Beal, Univ. at Buffalo, SUNY (United States)
Sargur Srihari, Univ. at Buffalo, SUNY (United States)

Published in SPIE Proceedings Vol. 6500:
Document Recognition and Retrieval XIV
Xiaofan Lin; Berrin A. Yanikoglu, Editors

© SPIE.