
Proceedings Paper

Non-Manhattan layout extraction algorithm
Author(s): Aziza Satkhozhina; Ildus Ahmadullin; Jan P. Allebach; Qian Lin; Jerry Liu; Daniel Tretter; Eamonn O'Brien-Strain; Andrew Hunter

Paper Abstract

Automated publishing requires large databases containing document page layout templates. The number of layout templates that need to be created and stored grows exponentially with the complexity of the document layouts. A better approach for automated publishing is to reuse layout templates of existing documents for the generation of new documents. In this paper, we present an algorithm for template extraction from a document page image. We use the cost-optimized segmentation algorithm (COS) to segment the image, and Voronoi decomposition to cluster the text regions. Then, we create a block image where each block represents a homogeneous region of the document page. We construct a geometrical tree that describes the hierarchical structure of the document page. We also implement a font recognition algorithm to analyze the font of each text region. We present a detailed description of the algorithm and our preliminary results.
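The clustering and block-image steps in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract gives no implementation details, so the `Region` type, the `cluster_regions` and `bounding_block` helpers, and the `max_dist` parameter are all hypothetical. A plain centroid-distance threshold stands in for the Voronoi-based neighbor test, and each block is approximated by an axis-aligned rectangle, which a true non-Manhattan block need not be.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot


@dataclass(frozen=True)
class Region:
    """Bounding box of one segmented text region (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def centroid(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def cluster_regions(regions, max_dist):
    """Group regions whose centroids are chained by links of length <= max_dist.

    Assumption: a distance threshold replaces the Voronoi-neighbor test
    named in the abstract.
    """
    parent = list(range(len(regions)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(regions)), 2):
        (xi, yi), (xj, yj) = regions[i].centroid, regions[j].centroid
        if hypot(xi - xj, yi - yj) <= max_dist:
            parent[find(i)] = find(j)

    clusters = {}
    for i, region in enumerate(regions):
        clusters.setdefault(find(i), []).append(region)
    return list(clusters.values())


def bounding_block(cluster):
    """Merge a cluster into one rectangular block (a simplification)."""
    return Region(
        min(r.x0 for r in cluster),
        min(r.y0 for r in cluster),
        max(r.x1 for r in cluster),
        max(r.y1 for r in cluster),
    )


# Two adjacent text lines and one distant caption line.
lines = [
    Region(0, 0, 100, 10),
    Region(0, 12, 100, 22),
    Region(0, 200, 100, 210),
]
blocks = [bounding_block(c) for c in cluster_regions(lines, max_dist=20)]
```

With these inputs the two adjacent lines merge into one block and the distant line stays separate. A hierarchical tree like the one the abstract mentions could then be built over such blocks, but its construction is not specified here and is left out of the sketch.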

Paper Details

Date Published: 21 March 2013
PDF: 6 pages
Proc. SPIE 8664, Imaging and Printing in a Web 2.0 World IV, 86640A (21 March 2013); doi: 10.1117/12.2009424
Author Affiliations
Aziza Satkhozhina, Purdue Univ. (United States)
Ildus Ahmadullin, Hewlett-Packard Labs. (United States)
Jan P. Allebach, Purdue Univ. (United States)
Qian Lin, Hewlett-Packard Labs. (United States)
Jerry Liu, Hewlett-Packard Labs. (United States)
Daniel Tretter, Hewlett-Packard Labs. (United States)
Eamonn O'Brien-Strain, Hewlett-Packard Labs. (United States)
Andrew Hunter, Hewlett-Packard Labs. (United Kingdom)


Published in SPIE Proceedings Vol. 8664:
Imaging and Printing in a Web 2.0 World IV
Qian Lin; Jan P. Allebach; Zhigang Fan, Editor(s)

© SPIE.