Proceedings Paper

Non-Manhattan layout extraction algorithm
Author(s): Aziza Satkhozhina; Ildus Ahmadullin; Jan P. Allebach; Qian Lin; Jerry Liu; Daniel Tretter; Eamonn O'Brien-Strain; Andrew Hunter

Paper Abstract

Automated publishing requires large databases containing document page layout templates. The number of layout templates that need to be created and stored grows exponentially with the complexity of the document layouts. A better approach for automated publishing is to reuse layout templates of existing documents for the generation of new documents. In this paper, we present an algorithm for template extraction from a document page image. We use the cost-optimized segmentation algorithm (COS) to segment the image, and Voronoi decomposition to cluster the text regions. Then, we create a block image where each block represents a homogeneous region of the document page. We construct a geometrical tree that describes the hierarchical structure of the document page. We also implement a font recognition algorithm to analyze the font of each text region. We present a detailed description of the algorithm and our preliminary results.
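The abstract does not say how the geometrical tree is built. As a rough illustration only, the sketch below assumes each homogeneous region of the block image is summarized by a labeled, axis-aligned bounding box, and nests each region under the smallest region that encloses it. The names `Node`, `build_tree`, and the `(x0, y0, x1, y1)` box convention are hypothetical. Bounding-box containment is a simplification: the non-Manhattan regions the paper targets would likely need polygon or pixel-mask containment instead.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One region of the page in a hypothetical layout tree."""
    label: str                      # e.g. "text", "image" (illustrative labels)
    box: tuple[int, int, int, int]  # (x0, y0, x1, y1), assumed pixel coordinates
    children: list[Node] = field(default_factory=list)


def area(box: tuple[int, int, int, int]) -> int:
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def contains(outer: tuple[int, int, int, int], inner: tuple[int, int, int, int]) -> bool:
    # True when `inner` lies entirely within `outer` (edges may touch).
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def build_tree(page_box, blocks) -> Node:
    """Nest (label, box) blocks by containment under a root page node.

    Assumption: hierarchy is defined purely by bounding-box containment.
    """
    root = Node("page", page_box)
    inserted = [root]
    # Insert larger blocks first so every possible parent exists before its children.
    for label, box in sorted(blocks, key=lambda b: area(b[1]), reverse=True):
        parents = [n for n in inserted if contains(n.box, box)]
        # Attach to the tightest enclosing region; fall back to the page root.
        parent = min(parents, key=lambda n: area(n.box)) if parents else root
        node = Node(label, box)
        parent.children.append(node)
        inserted.append(node)
    return root
```

For example, a text block lying inside a left column ends up as that column's child, while an image in the right half of the page attaches directly to the page root. Sorting by area keeps the construction a single pass, at the cost of a quadratic parent search, which is acceptable for the tens of blocks typical of one page.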

Paper Details

Date Published: 21 March 2013
PDF: 6 pages
Proc. SPIE 8664, Imaging and Printing in a Web 2.0 World IV, 86640A (21 March 2013); doi: 10.1117/12.2009424
Author Affiliations
Aziza Satkhozhina, Purdue Univ. (United States)
Ildus Ahmadullin, Hewlett-Packard Labs. (United States)
Jan P. Allebach, Purdue Univ. (United States)
Qian Lin, Hewlett-Packard Labs. (United States)
Jerry Liu, Hewlett-Packard Labs. (United States)
Daniel Tretter, Hewlett-Packard Labs. (United States)
Eamonn O'Brien-Strain, Hewlett-Packard Labs. (United States)
Andrew Hunter, Hewlett-Packard Labs. (United Kingdom)

Published in SPIE Proceedings Vol. 8664:
Imaging and Printing in a Web 2.0 World IV
Qian Lin; Jan P. Allebach; Zhigang Fan, Editor(s)

© SPIE.