Proceedings Paper

Graphic design principles for automated document segmentation and understanding

Paper Abstract

When designers develop a document layout, their objective is to convey a specific message and provoke a specific response from the audience. Design principles provide the foundation for identifying document components and the relations among them, and thus for extracting implicit knowledge from the layout. Variable Data Printing enables the production of personalized print jobs for which traditional proofing of every job instance could prove infeasible. This paper explains a rule-based system that uses design principles to segment and understand document context. The system uses the design principles of repetition, proximity, alignment, similarity, and contrast as the foundation of its strategy for document segmentation and understanding; this strategy is closely related to recognizing the artifacts produced when the constraints articulated in the document layout are violated. The tool has two main modules: the geometric analysis module and the design rule engine. The geometric analysis module extracts explicit knowledge from the data provided in the document. The design rule engine uses the information provided by the geometric analysis to establish logical units inside the document. We used a subset of XSL-FO sufficient for designing documents of adequate complexity. The system identifies components such as headers, paragraphs, lists, and images, and determines the relations between them, such as header-paragraph and header-list. The system provides accurate information about the geometric properties of the components, detects the elements of the documents, and identifies corresponding components between a proofed instance and the rest of the instances in a Variable Data Printing job.
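The abstract splits the tool into a geometric analysis module, which extracts explicit properties from the document, and a design rule engine, which turns those properties into logical units. The following is a minimal sketch of that split for a single relation (header-paragraph), driven by the contrast, alignment, and proximity principles. It is not the authors' implementation. The `Block` type, the helpers `left_aligned`, `vertical_gap`, `classify`, and `relate`, and all three threshold values are hypothetical. Blocks are assumed to have already been extracted from the rendered XSL-FO with y growing downward, and body text is assumed to use the smallest font size on the page.

```python
from dataclasses import dataclass

# Hypothetical thresholds (points); the paper does not give values.
ALIGN_TOL = 2.0       # max difference between left edges to count as aligned
GAP_MAX = 18.0        # max vertical gap for two blocks to count as proximate
CONTRAST_RATIO = 1.2  # font-size ratio that marks a block as a header candidate


@dataclass
class Block:
    """Hypothetical laid-out block: top-left corner (x, y), size, font size.
    Assumes y grows downward on the page."""
    name: str
    x: float
    y: float
    width: float
    height: float
    font_size: float


# --- Geometric analysis: explicit properties computed from coordinates ---

def left_aligned(a: Block, b: Block) -> bool:
    """Alignment: left edges coincide within ALIGN_TOL."""
    return abs(a.x - b.x) <= ALIGN_TOL


def vertical_gap(upper: Block, lower: Block) -> float:
    """Proximity: space between the bottom of `upper` and the top of `lower`."""
    return lower.y - (upper.y + upper.height)


# --- Design rules: infer logical roles and relations from those properties ---

def classify(blocks: list[Block]) -> dict[str, str]:
    """Contrast rule: blocks noticeably larger than the (assumed) body
    font size, taken here as the smallest size on the page, are headers."""
    body_size = min(b.font_size for b in blocks)
    return {
        b.name: "header" if b.font_size >= CONTRAST_RATIO * body_size else "paragraph"
        for b in blocks
    }


def relate(blocks: list[Block]) -> list[tuple[str, str]]:
    """Proximity + alignment rules: a header owns the block directly below it
    when both are left-aligned and separated by a small gap."""
    roles = classify(blocks)
    ordered = sorted(blocks, key=lambda b: b.y)  # top-to-bottom reading order
    pairs = []
    for upper, lower in zip(ordered, ordered[1:]):
        if (roles[upper.name] == "header"
                and roles[lower.name] == "paragraph"
                and left_aligned(upper, lower)
                and 0 <= vertical_gap(upper, lower) <= GAP_MAX):
            pairs.append((upper.name, lower.name))
    return pairs


# Usage: two headers, each followed by a paragraph.
page = [
    Block("h1", 72, 72, 300, 20, 16),
    Block("p1", 72, 100, 450, 60, 10),
    Block("h2", 72, 190, 300, 20, 16),
    Block("p2", 90, 218, 430, 60, 10),  # indented: fails the alignment rule
]
print(relate(page))  # [('h1', 'p1')]
```

Because `p2` is indented 18 points relative to `h2`, it fails the alignment rule and is not attached to its header, so only the `h1`/`p1` pair is reported. Keeping the measurement helpers separate from the rules mirrors the two-module division described above: thresholds and rules can change without touching the geometry code.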

Paper Details

Date Published: 16 January 2006
PDF: 11 pages
Proc. SPIE 6067, Document Recognition and Retrieval XIII, 60670F (16 January 2006); doi: 10.1117/12.648508
Author Affiliations
J. Fernando Vega-Riveros, Univ. of Puerto Rico, Mayagüez (United States)
Hector J. Santos Villalobos, Univ. of Puerto Rico, Mayagüez (United States)


Published in SPIE Proceedings Vol. 6067:
Document Recognition and Retrieval XIII
Kazem Taghva; Xiaofan Lin, Editors

© SPIE.