
Proceedings Paper

Unsupervised method to generate page templates
Author(s): Hervé Déjean

Paper Abstract

In this paper, we propose a method for automatically inferring the different page templates used to lay out a document's content. The first step of the method performs a logical analysis of the document. Depending on the coverage of this step, a given number of document elements are labeled. Geometric relations are then computed between these labeled elements, and page template candidates are generated from frequent related elements. A fuzzy matching operation then selects the most frequent and relevant page templates for a given document. Such page templates can be used to correct errors produced during the earlier steps of document analysis: zoning, OCR, and logical analysis. The method was evaluated on the INEX book track collection.
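
The pipeline described in the abstract (labeled elements, candidate templates, fuzzy matching) can be illustrated with a small Python sketch. This is not the paper's algorithm: the abstract gives no implementation details, so everything here is an assumption made for illustration. In particular, the sketch swaps the paper's geometric relations between elements for absolute positions quantized to a coarse grid, uses Jaccard similarity as the fuzzy match, and selects templates greedily. The names `signature`, `jaccard`, `infer_templates`, and the parameters `grid`, `threshold`, and `min_pages` are all hypothetical.

```python
def signature(page, grid=50):
    """Reduce a page to a set of (label, grid column, grid row) features.

    `page` is a list of (label, x, y) tuples for the elements that a
    logical-analysis step managed to label (hypothetical input format).
    """
    return frozenset((label, round(x / grid), round(y / grid))
                     for label, x, y in page)


def jaccard(a, b):
    """Fuzzy match score between two signatures (1.0 means identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def infer_templates(pages, grid=50, threshold=0.6, min_pages=2):
    """Greedily pick the candidate signatures that cover the most pages."""
    remaining = [signature(p, grid) for p in pages]
    templates = []
    while remaining:
        best, covered = None, 0
        # Every distinct page signature is a template candidate.
        for cand in set(remaining):
            n = sum(1 for s in remaining if jaccard(s, cand) >= threshold)
            if n > covered:
                best, covered = cand, n
        if covered < min_pages:
            break  # leftover pages are too rare to count as a template
        templates.append((best, covered))
        remaining = [s for s in remaining if jaccard(s, best) < threshold]
    return templates


# Toy document: right-hand pages carry the page number at the right,
# left-hand pages at the left. One right-hand page lost its header
# (simulating a labeling error) and one has slightly shifted coordinates.
right = [("header", 300, 20), ("body", 300, 400), ("page_number", 500, 750)]
left = [("header", 300, 20), ("body", 300, 400), ("page_number", 100, 750)]
right_shifted = [("header", 305, 22), ("body", 300, 400), ("page_number", 500, 750)]
right_no_header = [("body", 300, 400), ("page_number", 500, 750)]

pages = [right, left, right_shifted, left, right_no_header, left, right]
templates = infer_templates(pages)
for features, support in templates:
    print(support, sorted(features))
```

On this toy input the sketch yields two templates, one per page side, and the page with the missing header still matches the right-hand template. A page that matches a template only partially is the kind of case where the template could suggest a correction, such as a header that was not labeled.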

Paper Details

Date Published: 24 January 2011
PDF: 10 pages
Proc. SPIE 7874, Document Recognition and Retrieval XVIII, 78740M (24 January 2011); doi: 10.1117/12.873160
Hervé Déjean, Xerox Research Ctr. Europe (France)


Published in SPIE Proceedings Vol. 7874:
Document Recognition and Retrieval XVIII
Gady Agam; Christian Viard-Gaudin, Editor(s)

© SPIE.