Proceedings Paper

Document understanding using layout styles of title page images
Author(s): Louis H. Sharpe; Basil Manns

Paper Abstract

An important problem in the application of compound document architectures is the input of data from raster images. One technique is to use visual, syntactic cues found in the layout of the raster document to infer its logical structure or semantics. Another is to use context derived from characters recognized within a given block of raster data. Both character-based and image-based information are considered here. A well-constrained environment is defined for use in developing rules that can be applied to basic book title page understanding. This paper identifies the attributes of title page layout objects which aid in mapping them into the fields of a simple bibliographic format. Using as input the raster images of the title page and the verso of the title page, along with the ASCII output of a generic character recognition engine run on these same images, a system of rules is defined for generating marked-up text in which key bibliographic fields may be identified.
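
The abstract does not list the rules themselves, so the following is only an illustrative sketch of how image-based cues (block position, type size) and character-based cues (recognized text) might be combined to label layout blocks with bibliographic fields. Everything in it is an assumption made for illustration: the `Block` structure, the field names (`title`, `author`, `publisher`, `date`), the thresholds, and the individual rules are hypothetical and are not taken from the paper.

```python
import re
from dataclasses import dataclass


@dataclass
class Block:
    """One layout object (hypothetical structure, not from the paper)."""
    page: str      # "title" or "verso"
    text: str      # ASCII output of the character recognition engine
    top: float     # vertical position: 0.0 = top of page, 1.0 = bottom
    height: float  # estimated character height, in points


YEAR = re.compile(r"\b(1[5-9]|20)\d{2}\b")
COPYRIGHT = re.compile(r"(?i)copyright|\(c\)")


def classify(blocks):
    """Pair each block with a field name using simple, assumed rules."""
    title_page = [b for b in blocks if b.page == "title"]
    # Image cue: the largest type on the title page is taken as the title.
    largest = max(title_page, key=lambda b: b.height, default=None)
    labelled = []
    for b in blocks:
        if b is largest:
            field = "title"
        elif b.page == "title" and re.match(r"(?i)by\s+", b.text):
            field = "author"      # character cue: leading "by"
        elif b.page == "title" and b.top > 0.8:
            field = "publisher"   # image cue: block at the foot of the page
        elif b.page == "verso" and COPYRIGHT.search(b.text) and YEAR.search(b.text):
            field = "date"        # character cue: copyright line with a year
        else:
            field = "other"
        labelled.append((field, b))
    return labelled


def mark_up(blocks):
    """Emit one tagged line per block."""
    return "\n".join(f"<{f}>{b.text}</{f}>" for f, b in classify(blocks))


if __name__ == "__main__":
    sample = [
        Block("title", "A HISTORY OF MAPS", 0.20, 36),
        Block("title", "by Jane Doe", 0.45, 14),
        Block("title", "Example Press", 0.90, 12),
        Block("verso", "Copyright 1990 by Example Press", 0.30, 8),
    ]
    print(mark_up(sample))
```

The sample data is invented. The point of the sketch is only that each rule consults either a layout attribute or the recognized text, and that the output is marked-up text with one tagged field per block.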

Paper Details

Date Published: 1 August 1992
PDF: 11 pages
Proc. SPIE 1661, Machine Vision Applications in Character Recognition and Industrial Inspection, (1 August 1992); doi: 10.1117/12.130273
Author Affiliations:
Louis H. Sharpe, Picture Elements (United States)
Basil Manns, Library of Congress (United States)


Published in SPIE Proceedings Vol. 1661:
Machine Vision Applications in Character Recognition and Industrial Inspection
Donald P. D'Amato; Wolf-Ekkehard Blanz; Byron E. Dom; Sargur N. Srihari, Editor(s)
