
Proceedings Paper
Header and footer extraction by page associationFormat | Member Price | Non-Member Price |
---|---|---|
$17.00 | $21.00 |
Paper Abstract
This paper introduces a robust algorithm to extract headers and footers from a variety of electronic documents, such as image files, Adobe PDF files, and files generated from OCR. Compared with the conventional methods based on the page-level layout and format, the proposed strategy considers a page in the context of neighboring pages. Through the page-association, the headers and footers in different patterns can be automatically detected without human interference or individual templates. In addition, fuzzy string match makes the method robust against OCR errors.
Paper Details
Date Published: 13 January 2003
PDF: 8 pages
Proc. SPIE 5010, Document Recognition and Retrieval X, (13 January 2003); doi: 10.1117/12.472833
Published in SPIE Proceedings Vol. 5010:
Document Recognition and Retrieval X
Tapas Kanungo; Elisa H. Barney Smith; Jianying Hu; Paul B. Kantor, Editor(s)
PDF: 8 pages
Proc. SPIE 5010, Document Recognition and Retrieval X, (13 January 2003); doi: 10.1117/12.472833
Show Author Affiliations
Xiaofan Lin, Hewlett-Packard Labs. (United States)
Published in SPIE Proceedings Vol. 5010:
Document Recognition and Retrieval X
Tapas Kanungo; Elisa H. Barney Smith; Jianying Hu; Paul B. Kantor, Editor(s)
© SPIE. Terms of Use
