
Proceedings Paper

Multiscale document segmentation using wavelet-domain hidden Markov models

Paper Abstract

We introduce a new document image segmentation algorithm, HMTseg, based on wavelets and the hidden Markov tree (HMT) model. The HMT is a tree-structured probabilistic graph that captures the statistical properties of the coefficients of the wavelet transform. Since the HMT is particularly well suited to images containing singularities (edges and ridges), it provides a good classifier for distinguishing between different document textures. Utilizing the inherent tree structure of the wavelet HMT and its fast training and likelihood computation algorithms, we classify textures at a range of scales. We then fuse these multiscale classifications using a Bayesian probabilistic graph to obtain reliable final segmentations. Since HMTseg works on the wavelet transform of the image, it can directly segment wavelet-compressed images without decompressing them into the spatial domain. We demonstrate HMTseg's performance with both synthetic and real imagery.
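The coarse-to-fine fusion idea can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not specify the Bayesian graph, so the sketch substitutes a much simpler hard-decision rule in which each block's class at one scale acts as a prior for its four child blocks at the next finer scale. The function name `fuse_coarse_to_fine`, the `stay_prob` parameter, and the input layout are all assumptions made for illustration; the per-scale log-likelihoods are taken as given (in HMTseg they would come from the trained HMT of each texture class).

```python
import numpy as np

def fuse_coarse_to_fine(loglik_per_scale, stay_prob=0.9):
    """Simplified stand-in for multiscale Bayesian fusion (hypothetical helper).

    loglik_per_scale: list of arrays ordered coarsest first. Each array has
    shape (n_classes, H, W), and H and W double from one scale to the next.
    stay_prob: assumed probability that a child block keeps its parent's
    class; must satisfy 0 < stay_prob < 1, and n_classes must be >= 2.
    Returns one integer label map per scale.
    """
    n_classes = loglik_per_scale[0].shape[0]
    off = (1.0 - stay_prob) / (n_classes - 1)
    # log_trans[p, c] = log P(child class c | parent class p)
    log_trans = np.log(np.full((n_classes, n_classes), off)
                       + np.eye(n_classes) * (stay_prob - off))

    # Coarsest scale: plain maximum-likelihood decision.
    labels = [np.argmax(loglik_per_scale[0], axis=0)]
    for loglik in loglik_per_scale[1:]:
        # Each parent block covers a 2x2 group of child blocks.
        parent = np.repeat(np.repeat(labels[-1], 2, axis=0), 2, axis=1)
        # Log prior over child classes, reshaped to (n_classes, H, W).
        log_prior = np.moveaxis(log_trans[parent], -1, 0)
        # Child decision: likelihood at this scale plus prior from the parent.
        labels.append(np.argmax(loglik + log_prior, axis=0))
    return labels
```

With this rule, weak evidence for a different class at a fine scale is overridden by the coarser decision, while strong evidence still wins, which is the qualitative behavior one wants near texture boundaries. A full treatment would propagate posterior probabilities rather than hard labels.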

Paper Details

Date Published: 22 December 1999
PDF: 14 pages
Proc. SPIE 3967, Document Recognition and Retrieval VII, (22 December 1999); doi: 10.1117/12.373498
Hyeokho Choi, Rice Univ. (United States)
Richard G. Baraniuk, Rice Univ. (United States)


Published in SPIE Proceedings Vol. 3967:
Document Recognition and Retrieval VII
Daniel P. Lopresti; Jiangying Zhou, Editor(s)

© SPIE.