
Proceedings Paper

A mixed approach to book splitting
Author(s): Liangcai Gao; Zhi Tang

Paper Abstract

In this paper, we present a hybrid approach to splitting a book document into individual chapters. We use multiple sources of information to obtain a reliable assessment of which pages are chapter title pages. These sources are produced by four methods: blank space detection, font analysis, header and footer association, and table of contents (TOC) analysis. A combination component then scores the potential chapter title pages and selects the best candidates. The approach takes full advantage of various kinds of information, such as page headers and footers, layout, and keywords. It works well even without TOC information, on which most previous similar work relies. Experiments show that the approach is robust and reliable.
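To make the idea of a combination component concrete, the sketch below merges four per-page evidence scores into a single score and keeps pages above a cut-off. It is an illustration under stated assumptions, not the authors' method. The abstract does not specify how the sources are combined. The `PageEvidence` fields, the weights in `WEIGHTS`, the `threshold` value, and the renormalisation used when no TOC exists are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


# Hypothetical per-page evidence; each score is assumed to lie in [0, 1].
@dataclass
class PageEvidence:
    page: int
    blank_space: float    # blank space detection (e.g. large gap above a title)
    font: float           # font analysis (e.g. larger or bolder title font)
    header_footer: float  # header/footer association (e.g. running header changes)
    toc: float            # TOC analysis; 0.0 when no match or no TOC


# Hypothetical weights; the paper's actual scheme is not given in the abstract.
WEIGHTS = {"blank_space": 0.25, "font": 0.3, "header_footer": 0.25, "toc": 0.2}


def score(ev: PageEvidence, has_toc: bool) -> float:
    """Weighted average of the available evidence sources for one page."""
    weights = dict(WEIGHTS)
    if not has_toc:
        # Without a TOC, drop that source and renormalise over the other three,
        # so the remaining evidence can still reach a high score on its own.
        weights.pop("toc")
    total = sum(weights.values())
    return sum(w * getattr(ev, name) for name, w in weights.items()) / total


def select_chapter_pages(pages, has_toc: bool, threshold: float = 0.5):
    """Return page numbers whose combined score reaches the threshold, in order."""
    return [ev.page for ev in pages if score(ev, has_toc) >= threshold]


# Usage with made-up evidence for three pages of a book that has no TOC.
pages = [
    PageEvidence(page=1, blank_space=0.9, font=0.8, header_footer=0.7, toc=0.0),
    PageEvidence(page=2, blank_space=0.1, font=0.2, header_footer=0.0, toc=0.0),
    PageEvidence(page=15, blank_space=0.8, font=0.9, header_footer=0.6, toc=0.0),
]
print(select_chapter_pages(pages, has_toc=False))  # [1, 15]
```

Renormalising the weights when the TOC is missing is one simple way to reflect the abstract's claim that the approach still works without TOC information; a real system would more likely learn or tune the weights and threshold on annotated books.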

Paper Details

Date Published: 28 January 2008
PDF: 8 pages
Proc. SPIE 6815, Document Recognition and Retrieval XV, 68150B (28 January 2008); doi: 10.1117/12.765813
Author Affiliations
Liangcai Gao, Peking Univ. (China)
Zhi Tang, Peking Univ. (China)

Published in SPIE Proceedings Vol. 6815:
Document Recognition and Retrieval XV
Berrin A. Yanikoglu; Kathrin Berkner, Editor(s)

© SPIE.