
Proceedings Paper

A mixed approach to auto-detection of page body
Author(s): Liangcai Gao; Zhi Tang; Ruiheng Qiu

Paper Abstract

The page body holds the central information of a page in most documents. This paper addresses the problem of automatically detecting the page body area in digital books or journals. A novel method based on font expansion and header and footer elimination is detailed. The method first extracts the body text font (BFont) and the headers and footers from a document. It then draws two page body bounding boxes for each page: one by analyzing the distribution of BFont across the pages, and the other by removing headers and footers from the pages. Finally, the two bounding boxes are combined to obtain the resulting page body bounding box. Test results demonstrate a very high recognition rate, with precision of up to 99.49%.
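
The two-box pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, because the abstract does not give the details. Everything here is an assumption made for illustration: the `Line` record, a top-left coordinate origin with y growing downward, taking BFont to be the most frequent font, treating the top or bottom line of a page as a header or footer when its text (digits ignored) recurs on at least half the pages, and combining the two boxes by intersection.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Line:
    """Hypothetical text-line record: font name, box corners, text."""
    font: str
    x0: float
    y0: float
    x1: float
    y1: float
    text: str = ""

Box = tuple  # (x0, y0, x1, y1), y grows downward (assumed)

def body_font(pages):
    """Assumption: BFont is the font used by the most lines in the document."""
    counts = Counter(line.font for page in pages for line in page)
    return counts.most_common(1)[0][0]

def bounding_box(lines) -> Optional[Box]:
    """Smallest box enclosing all given lines, or None if there are none."""
    lines = list(lines)
    if not lines:
        return None
    return (min(l.x0 for l in lines), min(l.y0 for l in lines),
            max(l.x1 for l in lines), max(l.y1 for l in lines))

def strip_digits(text):
    """Drop digits so that running page numbers compare as equal."""
    return "".join(ch for ch in text if not ch.isdigit()).strip()

def header_footer_keys(pages, min_share=0.5):
    """Assumed heuristic: texts of top/bottom lines that recur on many pages."""
    counts = Counter()
    for page in pages:
        if page:
            ordered = sorted(page, key=lambda l: l.y0)
            counts.update({strip_digits(ordered[0].text),
                           strip_digits(ordered[-1].text)})
    return {key for key, n in counts.items() if n >= min_share * len(pages)}

def elimination_box(page, keys) -> Optional[Box]:
    """Box of what remains after dropping recurring top/bottom lines."""
    ordered = sorted(page, key=lambda l: l.y0)
    last = len(ordered) - 1
    kept = [l for i, l in enumerate(ordered)
            if not (i in (0, last) and strip_digits(l.text) in keys)]
    return bounding_box(kept)

def combine(a: Optional[Box], b: Optional[Box]) -> Optional[Box]:
    """Assumed combination rule: intersection; fall back to whichever exists."""
    if a is None or b is None:
        return a or b
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None

def detect_page_bodies(pages):
    """One combined page body box per page."""
    bfont = body_font(pages)
    keys = header_footer_keys(pages)
    return [combine(bounding_box(l for l in page if l.font == bfont),
                    elimination_box(page, keys))
            for page in pages]
```

Calling `detect_page_bodies(pages)` on a list of pages, each a list of `Line` records, returns one box (or `None`) per page. The intersection rule is only a placeholder: a different combination rule can be swapped into `combine` without touching the rest of the sketch.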

Paper Details

Date Published: 28 January 2008
PDF: 7 pages
Proc. SPIE 6815, Document Recognition and Retrieval XV, 68150T (28 January 2008); doi: 10.1117/12.765886
Author Affiliations
Liangcai Gao, Peking Univ. (China)
Zhi Tang, Peking Univ. (China)
Ruiheng Qiu, Peking Univ. (China)

Published in SPIE Proceedings Vol. 6815:
Document Recognition and Retrieval XV
Berrin A. Yanikoglu; Kathrin Berkner, Editor(s)

© SPIE.