Proceedings Paper

Using back error propagation networks for automatic document image classification
Author(s): Susan E. Hauser; Timothy J. Cookson; George R. Thoma

Paper Abstract

The Lister Hill National Center for Biomedical Communications is a Research and Development Division of the National Library of Medicine. One of the Center's current research projects involves converting entire journals to bitmapped binary page images. To reduce the operator errors that sometimes occur during document capture, three back error propagation networks were designed to identify the journal title automatically from features in the binary image of the journal's front cover page. For all three network designs, twenty-five journal titles were randomly selected from the stored database of image files. For each title, seven cover page images were selected as the training set and three other cover page images as the test set. Each bitmapped image was first processed by counting the total number of black pixels in 32-pixel-wide rows and columns of the page image. For the first network, these counts were scaled to create 122-element count vectors, which served as the input vectors to a back error propagation network with one output node for each journal classification. Although this network correctly classified the 25 journals, the large input vector resulted in a large network and, consequently, a long training period. In an alternative approach, the first thirty-five coefficients of the Fast Fourier Transform of the count vector were used as the input vector to a second network. A third approach was to train a separate network for each journal, using the original count vectors as input and only one output node, whose output is either 'yes' (it is this journal) or 'no' (it is not this journal). This final design promises to be the most efficient for a system in which journal titles are added or removed, because it does not require retraining a large network for each change.
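
The feature extraction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the max-based scaling, the use of coefficient magnitudes, and the direct DFT (which yields the same coefficients as an FFT) are all assumptions made for illustration. The abstract states only that counts were "scaled" and that the first thirty-five Fourier coefficients were used.

```python
import cmath


def band_counts(image, band=32):
    """Count black pixels (value 1) in horizontal and vertical bands.

    `image` is a list of rows of 0/1 values. Returns the row-band counts
    followed by the column-band counts as one vector. (Hypothetical helper;
    the exact vector layout is an assumption.)
    """
    height = len(image)
    width = len(image[0]) if height else 0
    row_sums = [sum(row) for row in image]
    col_sums = [sum(row[x] for row in image) for x in range(width)]
    row_bands = [sum(row_sums[y:y + band]) for y in range(0, height, band)]
    col_bands = [sum(col_sums[x:x + band]) for x in range(0, width, band)]
    return row_bands + col_bands


def scale(counts):
    """Scale counts into [0, 1] by dividing by the largest count (assumed scheme)."""
    peak = max(counts) if counts else 0
    if peak == 0:
        return [0.0 for _ in counts]
    return [c / peak for c in counts]


def dft_magnitudes(vector, n_coeffs=35):
    """Magnitudes of the first `n_coeffs` discrete Fourier coefficients.

    Computed directly in O(n^2); using magnitudes is an assumption.
    """
    n = len(vector)
    result = []
    for k in range(min(n_coeffs, n)):
        acc = sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, v in enumerate(vector))
        result.append(abs(acc))
    return result


# Tiny 4x4 "page" with 2-pixel bands, purely for demonstration.
page = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
counts = band_counts(page, band=2)
features = dft_magnitudes(scale(counts), n_coeffs=2)
```

With real cover pages, the length of the count vector depends on the image dimensions. The shorter Fourier vector reduces the number of input nodes, which is the motivation the abstract gives for the second network.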

Paper Details

Date Published: 2 September 1993
PDF: 9 pages
Proc. SPIE 1965, Applications of Artificial Neural Networks IV, (2 September 1993); doi: 10.1117/12.152534
Susan E. Hauser, National Library of Medicine (United States)
Timothy J. Cookson, National Library of Medicine (United States)
George R. Thoma, National Library of Medicine (United States)

Published in SPIE Proceedings Vol. 1965:
Applications of Artificial Neural Networks IV
Steven K. Rogers, Editor(s)
