Proceedings Paper

Graphics extraction in a PDF document
Author(s): Hui Chao

Paper Abstract

PDF is a document format for final presentation. It preserves the original document layout but often not the document's logical structure. Graphic illustrations in PDF, such as figures and tables, often consist of ungrouped graphic primitives such as lines, curves, and small text elements. In this paper, we present a bottom-up approach to recognizing graphic illustrations in PDF documents. The proximity of page elements, both in 2D space and in their layer indexes, is used to infer the logical connections between elements. Recognizing graphics and grouping the elements of an illustration is an important part of understanding the document's logical structure. This technique can be used in automatic figure extraction, document re-flow, and document transformation.
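
The abstract does not give the algorithm itself, so the following is only an illustrative sketch of the general idea of bottom-up grouping by proximity in 2D space and in layer index, not the paper's method. The `Element` model, the `box_gap` and `group_elements` helpers, and the thresholds `max_gap` and `max_layer_gap` are all assumptions made for this example. Two primitives are merged when their bounding boxes are close on the page and their layer indexes are close, and merged pairs are chained into groups with a union-find structure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Element:
    """Hypothetical page primitive: a bounding box plus a layer (drawing-order) index."""
    x0: float
    y0: float
    x1: float
    y1: float
    layer_index: int


def box_gap(a: Element, b: Element) -> float:
    """Larger of the horizontal and vertical gaps between two boxes; 0 if they touch or overlap."""
    dx = max(a.x0 - b.x1, b.x0 - a.x1, 0.0)
    dy = max(a.y0 - b.y1, b.y0 - a.y1, 0.0)
    return max(dx, dy)


def group_elements(
    elements: List[Element],
    max_gap: float = 5.0,       # assumed spatial threshold, in page units
    max_layer_gap: int = 3,     # assumed layer-index threshold
) -> List[List[int]]:
    """Group element indexes whose boxes and layer indexes are both close (union-find)."""
    parent = list(range(len(elements)))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair once; merge pairs that are near in both senses.
    for i in range(len(elements)):
        for j in range(i + 1, len(elements)):
            a, b = elements[i], elements[j]
            near_in_space = box_gap(a, b) <= max_gap
            near_in_layer = abs(a.layer_index - b.layer_index) <= max_layer_gap
            if near_in_space and near_in_layer:
                parent[find(i)] = find(j)

    # Collect indexes by their root to form the final groups.
    groups: Dict[int, List[int]] = {}
    for i in range(len(elements)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Under these assumptions, two adjacent line segments drawn one after another would fall into one group, while a text element that sits nearby on the page but was drawn much later would stay separate. The pairwise comparison is quadratic in the number of elements, which keeps the sketch short; a spatial index would be the usual choice for large pages.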

Paper Details

Date Published: 13 January 2003
PDF: 9 pages
Proc. SPIE 5010, Document Recognition and Retrieval X, (13 January 2003); doi: 10.1117/12.479683
Hui Chao, Hewlett-Packard Labs. (United States)

Published in SPIE Proceedings Vol. 5010:
Document Recognition and Retrieval X
Tapas Kanungo; Elisa H. Barney Smith; Jianying Hu; Paul B. Kantor, Editor(s)
