
Proceedings Paper

Algorithms to separate text from a mixed text/graphic document and generate a succinct description for this complex graphic
Author(s): Sing T. Bow; Jianjun Sa

Paper Abstract

This paper describes an approach for separating text from a mixed text/graphic document and for describing the remaining graphic as a set of overlapping, meaningful shapes. It also reports the accuracy with which the mixed text/graphic document can be reconstructed from the resulting description file. The paper continues our previous work, which dealt mainly with engineering drawings composed of polygonal shapes; here the focus is on documents that contain text together with components of arbitrary curved shape. Algorithms are designed to automate the generation of loops with minimum redundancy from the bitmap of the image, and to break interwoven complex loops into simpler, interpretable shapes made up of curved segments. A succinct description file can then be established for the whole image, giving a drastic saving in memory when document images are archived. The effectiveness of the algorithms has been evaluated through experiments on a large number of mixed text/graphic documents, and the results show that the algorithms are computationally efficient. Once the text has been separated from the graphic and the graphic image has been decomposed into its meaningful component parts, the data reduction achieved through this succinct description is extremely high. For silhouettes of curved shape, an approach called concatenated-arc representation is developed for their description. This approach needs far fewer arc segments than the number of line segments required by a line-segment approximation. Shapes reconstructed from the description files closely match the originals, even for very complex graphics.
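The claim that arcs need fewer segments than straight lines can be illustrated with a small geometric sketch. It is not the authors' concatenated-arc algorithm, which the abstract does not specify. It only counts how many equal straight chords are needed to stay within a given tolerance of one circular arc, which a single arc primitive would describe exactly. The function name `chords_needed` and the example radius and tolerance are assumptions chosen for illustration.

```python
import math


def chords_needed(radius: float, sweep_rad: float, tol: float) -> int:
    """Smallest number of equal chords that approximates a circular arc
    so that the largest chord-to-arc gap (the sagitta) is at most tol.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if radius <= 0 or sweep_rad <= 0 or tol <= 0:
        raise ValueError("radius, sweep_rad and tol must be positive")
    # A chord subtending angle t has sagitta r * (1 - cos(t / 2)).
    # Solving sagitta <= tol for t gives the widest chord angle allowed.
    cos_half = max(-1.0, 1.0 - tol / radius)
    max_angle = 2.0 * math.acos(cos_half)
    return max(1, math.ceil(sweep_rad / max_angle))


if __name__ == "__main__":
    # Assumed example: a full circle of radius 100 pixels, 0.5 pixel tolerance.
    n = chords_needed(100.0, 2.0 * math.pi, 0.5)
    print(f"line segments needed: {n}, arc segments needed: 1")
```

Under these assumptions a single circle already costs a few dozen line segments, and the count grows as the tolerance tightens or the radius increases, while the arc description stays at one primitive (centre, radius, start and end angles).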

Paper Details

Date Published: 12 January 1993
PDF: 12 pages
Proc. SPIE 1771, Applications of Digital Image Processing XV, (12 January 1993); doi: 10.1117/12.139066
Sing T. Bow, Northern Illinois Univ. (United States)
Jianjun Sa, Northern Illinois Univ. (United States)


Published in SPIE Proceedings Vol. 1771:
Applications of Digital Image Processing XV
Andrew G. Tescher, Editor

© SPIE.