Proceedings Paper

A simple and effective figure caption detection system for old-style documents
Author(s): Zongyi Liu; Hanning Zhou

Paper Abstract

Identifying figure captions has wide applications in producing high-quality e-books, such as Kindle books or iPad books. In this paper, we present a rule-based system to detect horizontal figure captions in old-style documents. Our algorithm consists of three steps: (i) segment images into regions of different types, such as text and figures; (ii) search for the best caption region candidate based on heuristic rules such as region alignments and distances; and (iii) expand the caption regions identified in step (ii) with their neighboring text regions in order to correct oversegmentation errors. We test our algorithm using 81 images collected from old-style books, with each image containing at least one figure area. We show that the approach is able to correctly detect figure captions from images with different layouts, and we also measure its performance in terms of both precision and recall.
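The abstract does not give the actual rules, so the following is only a minimal Python sketch of what steps (ii) and (iii) might look like, assuming step (i) has already produced labeled bounding boxes. Everything in it is hypothetical and not taken from the paper: the `Region` type, the function names, the thresholds `max_gap`, `min_overlap`, and `merge_gap`, the assumption that captions sit below their figures, and the choice to expand the caption downward only. Coordinates are assumed to be in pixels with the origin at the top-left of the page, so y grows downward.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Region:
    """Hypothetical output of step (i): a labeled bounding box."""
    kind: str  # "text" or "figure"
    x0: float
    y0: float
    x1: float
    y1: float


def horizontal_overlap(a: Region, b: Region) -> float:
    """Fraction of the narrower region's width shared with the other region."""
    overlap = min(a.x1, b.x1) - max(a.x0, b.x0)
    narrower = min(a.x1 - a.x0, b.x1 - b.x0)
    return max(0.0, overlap) / narrower if narrower > 0 else 0.0


def find_caption(figure: Region, regions: List[Region],
                 max_gap: float = 40.0,
                 min_overlap: float = 0.5) -> Optional[Region]:
    """Step (ii), sketched: choose the closest text region below the figure
    that is horizontally aligned with it. Thresholds are illustrative."""
    candidates = [
        r for r in regions
        if r.kind == "text"
        and 0 <= r.y0 - figure.y1 <= max_gap
        and horizontal_overlap(figure, r) >= min_overlap
    ]
    return min(candidates, key=lambda r: r.y0 - figure.y1, default=None)


def expand_caption(caption: Region, regions: List[Region],
                   merge_gap: float = 8.0,
                   min_overlap: float = 0.5) -> Region:
    """Step (iii), sketched: absorb aligned text regions that lie just below
    the caption, to undo oversegmentation of a multi-line caption."""
    merged = caption
    # Visit regions top to bottom so each merge can enable the next one.
    for r in sorted(regions, key=lambda r: r.y0):
        if r.kind != "text" or r == caption:
            continue
        gap = r.y0 - merged.y1
        if 0 <= gap <= merge_gap and horizontal_overlap(merged, r) >= min_overlap:
            merged = Region("text", min(merged.x0, r.x0), merged.y0,
                            max(merged.x1, r.x1), r.y1)
    return merged
```

In this sketch, a caller would run `find_caption` once per figure region and pass any non-`None` result to `expand_caption`. A real system would also need rules for captions above or beside figures and for ties between candidates, which the abstract does not describe.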

Paper Details

Date Published: 24 January 2011
PDF: 7 pages
Proc. SPIE 7874, Document Recognition and Retrieval XVIII, 78740T (24 January 2011); doi: 10.1117/12.872144
Author Affiliations
Zongyi Liu (United States)
Hanning Zhou (United States)

Published in SPIE Proceedings Vol. 7874:
Document Recognition and Retrieval XVIII
Gady Agam; Christian Viard-Gaudin, Editor(s)

© SPIE.