Proceedings Paper

Reflowing-driven paragraph recognition for electronic books in PDF
Author(s): Jing Fang; Zhi Tang; Liangcai Gao

Paper Abstract

When electronic books are read on handheld devices, their content sometimes needs to be reflowed and recomposed to fit small screens. Because people read paragraph by paragraph, it is reasonable to reflow text content at the paragraph level. This paper addresses that requirement and proposes a set of novel methods for paragraph recognition in PDF electronic books. The proposed methods consist of three steps: physical structure analysis, paragraph segmentation, and reading order detection. We use the locally ordered property of PDF documents and the layout style of books to improve traditional page recognition results. In addition, we employ optimal bipartite graph matching to detect the reading order of paragraphs. Experiments show that our methods achieve high accuracy. Notably, the research has been applied in a commercial software package for Chinese e-book production.
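
The abstract does not describe how the bipartite graph is built or how its edges are weighted, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical block format `(name, column, top)`, an invented geometric cost in `successor_cost`, and virtual START and END nodes. Each block appears once on the left side as a possible predecessor and once on the right side as a possible successor, and a minimum-cost perfect matching picks one successor per block. The matching is found here by exhaustive search, which is practical only for a handful of blocks; a real system would more likely use the Hungarian algorithm.

```python
from itertools import permutations

INF = float("inf")

# Hypothetical block format (an assumption, not from the paper):
# (name, column, top), with a 0-based column index and the distance
# of the block from the top of the page.

def successor_cost(a, b):
    """Illustrative cost of reading block b immediately after block a."""
    _, col_a, top_a = a
    _, col_b, top_b = b
    if col_a == col_b and top_b > top_a:
        return top_b - top_a          # further down the same column
    if col_b == col_a + 1:
        return 1000 + top_b           # jump to the next column, prefer its top
    return INF                        # backward moves are not allowed

def reading_order(blocks):
    """Order block names by a minimum-cost bipartite matching."""
    n = len(blocks)
    # Left side:  START (index 0) plus every block as a predecessor (1..n).
    # Right side: every block as a successor (0..n-1) plus END (index n).
    cost = [[INF] * (n + 1) for _ in range(n + 1)]
    for j, b in enumerate(blocks):
        cost[0][j] = 1000 * b[1] + b[2]   # START prefers the top-left block
    for i, a in enumerate(blocks, start=1):
        cost[i][n] = 0                    # any block may be the last one
        for j, b in enumerate(blocks):
            if i - 1 != j:                # a block cannot follow itself
                cost[i][j] = successor_cost(a, b)
    # Exhaustive search over all perfect matchings (small inputs only).
    best = min(
        permutations(range(n + 1)),
        key=lambda p: sum(cost[i][p[i]] for i in range(n + 1)),
    )
    # Follow the successor links from START until END is reached.
    order, node = [], 0
    while len(order) < n:
        nxt = best[node]
        if nxt == n:
            break
        order.append(blocks[nxt][0])
        node = nxt + 1
    return order

# A two-column page, listed out of order on purpose.
page = [("D", 1, 400), ("A", 0, 50), ("C", 1, 50), ("B", 0, 300)]
print(reading_order(page))
```

For this toy page the matching links START to A, A to B, B to C, C to D, and D to END, so the blocks come out in column-major order. The sketch does not guard against matchings that contain cycles, which a production implementation would have to handle.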

Paper Details

Date Published: 24 January 2011
PDF: 9 pages
Proc. SPIE 7874, Document Recognition and Retrieval XVIII, 78740U (24 January 2011); doi: 10.1117/12.872289
Jing Fang, Peking Univ. (China)
Zhi Tang, Peking Univ. (China)
Liangcai Gao, Peking Univ. (China)

Published in SPIE Proceedings Vol. 7874:
Document Recognition and Retrieval XVIII
Editor(s): Gady Agam; Christian Viard-Gaudin

© SPIE.