Proceedings Paper

Graphic composite segmentation for PDF documents with complex layouts
Author(s): Canhui Xu; Zhi Tang; Xin Tao; Cao Shi

Paper Abstract

Converting PDF books to a reflowable format has recently attracted considerable interest for e-book reading. Robust graphic segmentation is highly desirable for making PDF converters practical. To cope with diverse layouts, a multi-layer concept is introduced to segment graphic composites, including photographic images and drawings that contain text insets or are surrounded by text elements. This multi-layer layout analysis method exploits both image-based analysis and the inherent advantages of born-digital documents. By combining clustering of low-level page elements extracted from the PDF with connected component analysis on a synthetically generated PNG image of the page, graphic composites can be segmented in PDF documents with complex layouts. Experimental results on graphic composite segmentation of PDF document pages show satisfactory performance.
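The abstract describes the method only at a high level, so the following is a minimal sketch of the two ingredients it names (connected component analysis on a rendered raster, and clustering of low-level page elements), not the authors' implementation. Every detail here is an assumption: the PNG is taken to be rendered with text omitted and to share pixel coordinates with the PDF text boxes, clustering is plain single-linkage on a fixed gap threshold, and the fusion rule (keep any cluster that contains at least one graphic component) is hypothetical. The names `segment_composites`, `cluster_boxes`, and `max_gap` are illustrative only.

```python
from collections import deque

# Hypothetical box convention: (x0, y0, x1, y1), inclusive, in page pixel coordinates.
Box = tuple[int, int, int, int]


def connected_components(grid: list[list[int]]) -> list[Box]:
    """Bounding boxes of 8-connected foreground (value 1) regions in a binary raster."""
    h = len(grid)
    w = len(grid[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: list[Box] = []
    for y in range(h):
        for x in range(w):
            if grid[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited seed pixel.
            seen[y][x] = True
            queue = deque([(x, y)])
            x0, y0, x1, y1 = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if 0 <= nx < w and 0 <= ny < h and grid[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def box_gap(a: Box, b: Box) -> int:
    """Largest axis-wise gap between two boxes: 0 if they overlap, 1 if pixel-adjacent."""
    dx = max(0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0, max(a[1], b[1]) - min(a[3], b[3]))
    return max(dx, dy)


def union_bbox(boxes: list[Box]) -> Box:
    """Smallest box enclosing all given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def cluster_boxes(boxes: list[Box], max_gap: int) -> list[list[int]]:
    """Single-linkage clustering with union-find: boxes within max_gap of each other
    (directly or through a chain) share a cluster. Returns lists of indices into boxes."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if box_gap(boxes[i], boxes[j]) <= max_gap:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def segment_composites(raster: list[list[int]], text_boxes: list[Box], max_gap: int = 2) -> list[Box]:
    """Hypothetical fusion rule: cluster raster components together with PDF text boxes,
    then report the bounding box of every cluster holding at least one graphic component."""
    # Assumption: the raster was rendered with text omitted, so its components are graphics.
    graphic = connected_components(raster)
    boxes = graphic + text_boxes
    composites = [
        union_bbox([boxes[i] for i in members])
        for members in cluster_boxes(boxes, max_gap)
        if any(i < len(graphic) for i in members)
    ]
    return sorted(composites)


# Two drawings; a caption box sits one blank column right of the first drawing,
# and a body-text line lies far below both.
page = [
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
]
print(segment_composites(page, [(3, 0, 5, 1), (0, 8, 11, 8)]))
# → [(0, 0, 5, 1), (10, 0, 11, 1)]
```

In this toy page the caption is absorbed into the first drawing's composite, the second drawing stands alone, and the text-only cluster is discarded. Single-linkage with a fixed `max_gap` was chosen only because it is the simplest rule that lets a text inset attach to a nearby drawing; the abstract does not state the paper's actual clustering criteria or thresholds.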

Paper Details

Date Published: 4 February 2013
PDF: 10 pages
Proc. SPIE 8658, Document Recognition and Retrieval XX, 86580E (4 February 2013); doi: 10.1117/12.2003705
Author Affiliations
Canhui Xu, Peking Univ. (China)
Zhongguancun Haidian Science Park (China)
Zhi Tang, Peking Univ. (China)
Zhongguancun Haidian Science Park (China)
Xin Tao, Peking Univ. (China)
Cao Shi, Peking Univ. (China)

Published in SPIE Proceedings Vol. 8658:
Document Recognition and Retrieval XX
Richard Zanibbi; Bertrand Coüasnon, Editor(s)

© SPIE.