Share Email Print
cover

Proceedings Paper

Table analysis for multiline cell identification
Author(s): John C. Handley
Format Member Price Non-Member Price
PDF $14.40 $18.00

Paper Abstract

A table in a document is a rectilinear arrangement of cells where each cell contains a sequence of words. Several lines of text may compose one cell. Cells may be delimited by horizontal or vertical lines, but often this is not the case. A table analysis system is described which reconstructs table formatting information from table images whether or not the cells are explicitly delimited. Inputs to the system are word bounding boxes and any horizontal and vertical lines that delimit cells. Using a sequence of carefully-crafted rules, multi-line cells and their interrelationships are found even though no explicit delimiters are visible. This robust system is a component of a commercial document recognition system.

Paper Details

Date Published: 21 December 2000
PDF: 10 pages
Proc. SPIE 4307, Document Recognition and Retrieval VIII, (21 December 2000); doi: 10.1117/12.410853
Show Author Affiliations
John C. Handley, Xerox Corp. (United States)


Published in SPIE Proceedings Vol. 4307:
Document Recognition and Retrieval VIII
Paul B. Kantor; Daniel P. Lopresti; Jiangying Zhou, Editor(s)

© SPIE. Terms of Use
Back to Top