
Proceedings Paper

Header and footer extraction by page association
Author(s): Xiaofan Lin

Paper Abstract

This paper introduces a robust algorithm to extract headers and footers from a variety of electronic documents, such as image files, Adobe PDF files, and files generated from OCR. Whereas conventional methods rely on page-level layout and format, the proposed strategy considers each page in the context of its neighboring pages. Through page association, headers and footers in different patterns can be detected automatically, without human intervention or individual templates. In addition, fuzzy string matching makes the method robust against OCR errors.
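
The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea, not the paper's method. It assumes each page is already available as a list of text lines, that a header or footer is the first or last line of a page, and that a line counts as repeated when it fuzzily matches the same position on a nearby page. The function names, the `window` of neighboring pages, and the 0.8 similarity threshold are all hypothetical choices made for illustration.

```python
import re
from difflib import SequenceMatcher


def similar(a, b, threshold=0.8):
    """Fuzzy comparison that tolerates a few OCR character errors."""
    def norm(s):
        # Assumption: collapse digit runs so changing page numbers still match.
        return re.sub(r"\d+", "#", s.strip().lower())
    return SequenceMatcher(None, norm(a), norm(b)).ratio() >= threshold


def find_headers_footers(pages, window=2, threshold=0.8):
    """pages: list of pages, each a list of text lines from top to bottom.

    Returns one (header, footer) tuple per page; an entry is None when no
    neighboring page carries a similar line in the same position.
    """
    results = []
    for i, page in enumerate(pages):
        lo = max(0, i - window)
        hi = min(len(pages), i + window + 1)
        # Neighboring pages within the window, skipping the page itself
        # and any empty pages.
        neighbors = [pages[j] for j in range(lo, hi) if j != i and pages[j]]
        header = footer = None
        if page and neighbors:
            if any(similar(page[0], n[0], threshold) for n in neighbors):
                header = page[0]
            if any(similar(page[-1], n[-1], threshold) for n in neighbors):
                footer = page[-1]
        results.append((header, footer))
    return results
```

With `window=2`, a page is also compared against the page two positions away, so headers that alternate between odd and even pages can still be associated. A real system would likely examine more than one line at the top and bottom of each page and use layout information as well; this sketch leaves that out.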

Paper Details

Date Published: 13 January 2003
PDF: 8 pages
Proc. SPIE 5010, Document Recognition and Retrieval X, (13 January 2003); doi: 10.1117/12.472833
Xiaofan Lin, Hewlett-Packard Labs. (United States)

Published in SPIE Proceedings Vol. 5010:
Document Recognition and Retrieval X
Tapas Kanungo; Elisa H. Barney Smith; Jianying Hu; Paul B. Kantor, Editor(s)
