
Proceedings Paper

A Cut-Based Procedure For Document-Layout Modelling And Automatic Document Analysis
Author(s): Andreas R. Dengel

Paper Abstract

With the growing degree of office automation and the decreasing cost of storage devices, it is becoming more and more attractive to store optically scanned documents such as letters or reports in electronic form. The need for a good paper-computer interface is therefore becoming increasingly important. This interface must convert paper documents into an electronic representation that captures not only their contents but also their layout and logical structure. We propose a procedure that describes the layout of a document page by dividing it recursively into nested rectangular areas. Each area is assigned a semantic meaning by means of a logical label. The procedure serves as the basis for a model that maps the hierarchical document layout onto the semantic meaning of the parts of the document. We analyse the layout of a document using a best-first search in this tessellation structure. The search is directed by a measure of similarity between the layout pattern in the model and the layout of the actual document. Each hypothesis for the semantic labelling of a layout block can then be verified; verification either supports the hypothesis or initiates the generation of a new one. The method has been implemented in Common Lisp on a SUN 3/60 workstation and has been run on a large population of office documents. The results obtained are very encouraging and support the soundness of the approach.
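The two ideas in the abstract, recursive cuts into nested rectangles and a similarity-directed best-first search, can be illustrated with a minimal sketch. It is written in Python for readability and is not the author's Common Lisp implementation. The names `Block`, `cut`, `similarity`, `label_block`, and `build_letter_model`, the intersection-over-union similarity measure, the 0.5 acceptance threshold, and the example letter layout are all assumptions made for illustration; the abstract does not specify any of them.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Block:
    """A rectangular layout area in page coordinates normalised to [0, 1]."""
    x: float
    y: float
    w: float
    h: float
    label: Optional[str] = None
    children: List["Block"] = field(default_factory=list)

    def cut(self, direction: str, positions: List[float]) -> List["Block"]:
        """Split this block at relative positions (0..1) into nested children.

        direction "h": horizontal cuts, children stacked top to bottom.
        direction "v": vertical cuts, children placed left to right.
        """
        edges = [0.0, *sorted(positions), 1.0]
        self.children = []
        for a, b in zip(edges, edges[1:]):
            if direction == "h":
                child = Block(self.x, self.y + a * self.h, self.w, (b - a) * self.h)
            else:
                child = Block(self.x + a * self.w, self.y, (b - a) * self.w, self.h)
            self.children.append(child)
        return self.children


def similarity(a: Block, b: Block) -> float:
    """Assumed similarity measure: intersection over union of two rectangles."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def label_block(model: Block, observed: Block,
                threshold: float = 0.5) -> Tuple[Optional[str], float]:
    """Best-first search over the model tree for a label hypothesis.

    The most similar model node is always expanded first. A leaf is a
    hypothesis: it is accepted if its similarity reaches the threshold,
    otherwise it is rejected and the next-best candidate is tried.
    """
    counter = 0  # tie-breaker so the heap never compares Block objects
    frontier = [(-similarity(model, observed), counter, model)]
    while frontier:
        neg_score, _, node = heapq.heappop(frontier)
        if not node.children:
            if -neg_score >= threshold:
                return node.label, -neg_score
            continue
        for child in node.children:
            counter += 1
            heapq.heappush(frontier, (-similarity(child, observed), counter, child))
    return None, 0.0


def build_letter_model() -> Block:
    """Hypothetical letter layout: header (sender | date), body, footer."""
    page = Block(0.0, 0.0, 1.0, 1.0, label="page")
    header, body, footer = page.cut("h", [0.2, 0.8])
    header.label, body.label, footer.label = "header", "body", "footer"
    sender, date = header.cut("v", [0.5])
    sender.label, date.label = "sender", "date"
    return page


if __name__ == "__main__":
    model = build_letter_model()
    # A block found in the top-right corner of a scanned page.
    print(label_block(model, Block(0.52, 0.02, 0.45, 0.16)))
```

In this sketch the observed top-right block is matched against the page, then the header, then the date area, because each of those is the most similar candidate on the frontier at the time it is expanded. A block that resembles no leaf of the model is returned with the label `None`, which corresponds to every hypothesis having been rejected.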

Paper Details

Date Published: 21 March 1989
PDF: 8 pages
Proc. SPIE 1095, Applications of Artificial Intelligence VII, (21 March 1989); doi: 10.1117/12.969361
Andreas R. Dengel, University of Stuttgart and German Research Center for Artificial Intelligence (Germany)


Published in SPIE Proceedings Vol. 1095:
Applications of Artificial Intelligence VII
Mohan M. Trivedi, Editor
