
Proceedings Paper

Unsupervised method to generate page templates
Author(s): Hervé Déjean

Paper Abstract

In this paper, we propose a method for automatically inferring the different page templates used to lay out the document content. The first step of the method consists of performing a logical analysis of the document. Depending on the coverage of this step, more or fewer document elements are labeled. Geometric relations are then computed between these labeled elements, and page template candidates are generated from frequent groups of related elements. A fuzzy matching operation then selects the most frequent and relevant page templates for a given document. These page templates can be used to correct errors produced during the earlier steps of the document analysis: zoning, OCR, and logical analysis. Evaluation was performed on the INEX Book Track collection.
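The abstract describes the pipeline only at a high level. The sketch below is a hypothetical, much-simplified illustration of its stages, not the author's method. It makes several assumptions. Elements arrive already labeled, so the logical-analysis step is out of scope. Each element is reduced to its label and normalized center. The "geometric relations" are coarse above/below/left/right relations plus a 3×3 grid zone. Template candidates are page signatures that occur on at least `min_support` pages. Jaccard similarity stands in for the paper's fuzzy matching. All names (`Element`, `page_signature`, `infer_templates`, `suggest_corrections`) and thresholds are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Element:
    """Hypothetical labeled page element: label plus center, normalized to [0, 1].

    y grows downward, as in most page coordinate systems.
    """
    label: str
    x: float
    y: float


def zone(e, bins=3):
    """Quantize an element's center into a cell of a coarse bins x bins grid."""
    col = min(int(e.x * bins), bins - 1)
    row = min(int(e.y * bins), bins - 1)
    return (row, col)


def relation(a, b):
    """Coarse geometric relation of element a with respect to element b."""
    dx, dy = b.x - a.x, b.y - a.y
    if abs(dy) >= abs(dx):
        return "above" if dy > 0 else "below"
    return "left_of" if dx > 0 else "right_of"


def page_signature(elements):
    """Feature set of a page: each label's grid zone plus pairwise relations."""
    feats = {("zone", e.label, zone(e)) for e in elements}
    ordered = sorted(elements, key=lambda e: (e.label, e.y, e.x))
    for a, b in combinations(ordered, 2):
        feats.add(("rel", a.label, relation(a, b), b.label))
    return frozenset(feats)


def jaccard(s, t):
    """Similarity in [0, 1]; an assumed stand-in for the paper's fuzzy matching."""
    union = s | t
    return len(s & t) / len(union) if union else 1.0


def infer_templates(pages, min_support=2, threshold=0.5):
    """Return (templates ranked by fuzzy support, support map, page assignment)."""
    sigs = [page_signature(p) for p in pages]
    # Candidates: signatures occurring exactly on at least min_support pages.
    counts = Counter(sigs)
    candidates = [s for s, n in counts.items() if n >= min_support]
    # Fuzzy support: number of pages whose signature is close enough to the candidate.
    support = {c: sum(jaccard(c, s) >= threshold for s in sigs) for c in candidates}
    templates = sorted(candidates, key=lambda c: support[c], reverse=True)
    # Assign each page to its best-matching template index, or None if nothing is close.
    assignment = []
    for s in sigs:
        scores = [jaccard(t, s) for t in templates]
        best = max(range(len(templates)), key=scores.__getitem__, default=None)
        ok = best is not None and scores[best] >= threshold
        assignment.append(best if ok else None)
    return templates, support, assignment


def suggest_corrections(template, signature):
    """Features the template predicts but the page lacks (e.g. a missed header)."""
    return template - signature


# Toy data: three regular pages, one page whose header was missed, one title page.
body_page = [Element("header", 0.5, 0.03), Element("body", 0.5, 0.5),
             Element("page_number", 0.5, 0.97)]
noisy_page = [Element("body", 0.5, 0.5), Element("page_number", 0.5, 0.97)]
title_page = [Element("title", 0.5, 0.3), Element("body", 0.5, 0.7),
              Element("page_number", 0.5, 0.97)]

templates, support, assignment = infer_templates(
    [body_page, body_page, body_page, noisy_page, title_page])
print(len(templates), assignment)  # 1 [0, 0, 0, 0, None]
print(sorted(suggest_corrections(templates[0], page_signature(noisy_page))))
```

In this toy run, the page with the missed header still matches the body-page template, and the difference set points at the absent header features. This is the kind of zoning or OCR error the abstract says templates can help correct. Using whole-page signatures as candidates keeps the sketch short. A closer approach to "frequent groups of related elements" might mine frequent feature subsets across pages instead.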

Paper Details

Date Published: 24 January 2011
PDF: 10 pages
Proc. SPIE 7874, Document Recognition and Retrieval XVIII, 78740M (24 January 2011); doi: 10.1117/12.873160
Hervé Déjean, Xerox Research Ctr. Europe (France)

Published in SPIE Proceedings Vol. 7874:
Document Recognition and Retrieval XVIII
Gady Agam; Christian Viard-Gaudin, Editors

© SPIE.