Proceedings Paper

Document zone classification using sizes of connected components
Author(s): Jisheng Liang; Ihsin T. Phillips; Jaekyu Ha; Robert M. Haralick

Paper Abstract

In this paper, we describe a feature-based supervised zone classifier that uses only the widths and heights of the connected components within a given zone. The distribution of these widths and heights is encoded into an n × m dimensional feature vector used in the decision making. Thus, the computational complexity is on the order of the number of connected components within the given zone. A binary decision tree assigns a zone class on the basis of its feature vector. The training and testing data sets for the algorithm are drawn from the scientific document pages in the UW-I database. The classifier assigns each given scientific and technical document zone, in real time, one of eight labels: text of font size 8-12, text of font size 13-18, text of font size 19-36, display math, table, halftone, line drawing, and ruling. The classifier discriminates text from non-text with an accuracy greater than 97%.
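As an illustration of the encoding described in the abstract, the following is a minimal sketch of how connected-component widths and heights could be binned into an n × m feature vector in a single pass over the components. The abstract does not give the bin boundaries, the values of n and m, or any normalization, so the function name `size_histogram`, the edge values, and the normalization by component count are all assumptions made for illustration and are not the authors' implementation.

```python
from bisect import bisect_right


def size_histogram(components, width_edges, height_edges):
    """Encode (width, height) pairs as a flattened n*m normalized histogram.

    components   -- iterable of (width, height) pairs in pixels
    width_edges  -- ascending interior bin boundaries for width (hypothetical)
    height_edges -- ascending interior bin boundaries for height (hypothetical)

    One pass over the components, so the cost grows linearly with their
    number (plus a small logarithmic bin lookup per component).
    """
    n = len(width_edges) + 1   # number of width bins
    m = len(height_edges) + 1  # number of height bins
    counts = [0] * (n * m)
    total = 0
    for w, h in components:
        i = bisect_right(width_edges, w)   # width bin index, 0..n-1
        j = bisect_right(height_edges, h)  # height bin index, 0..m-1
        counts[i * m + j] += 1
        total += 1
    if total == 0:
        # An empty zone maps to the all-zero vector (an assumption).
        return [0.0] * (n * m)
    # Normalize so zones with different component counts are comparable
    # (an assumption; the abstract does not specify this step).
    return [c / total for c in counts]


# Hypothetical zone: two small glyph-like components, one wide flat
# component, and one large component.
zone = [(5, 8), (6, 9), (40, 3), (200, 150)]
vector = size_histogram(zone, width_edges=[10, 50], height_edges=[5, 20])
print(len(vector), vector)
```

A vector of this kind would then be passed to a trained binary decision tree to obtain one of the eight zone labels; the tree itself is omitted here because the abstract does not describe its structure or training procedure.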

Paper Details

Date Published: 7 March 1996
PDF: 8 pages
Proc. SPIE 2660, Document Recognition III, (7 March 1996); doi: 10.1117/12.234719
Jisheng Liang, Univ. of Washington (United States)
Ihsin T. Phillips, Seattle Univ. (United States)
Jaekyu Ha, Univ. of Washington (United States)
Robert M. Haralick, Univ. of Washington (United States)

Published in SPIE Proceedings Vol. 2660:
Document Recognition III
Luc M. Vincent; Jonathan J. Hull, Editor(s)
