Proceedings Paper

Table structure recognition based on robust block segmentation
Author(s): Thomas G. Kieninger

Paper Abstract

This paper presents an efficient approach to identifying tabular structures within either electronic or paper documents. The resulting T-Recs system takes word bounding box information as input and outputs the corresponding logical text block units. Starting with an arbitrary word as the block seed, the algorithm recursively expands this block to all words that interleave with their vertical neighbors. Since even the smallest gaps between table columns prevent their words from mutually interleaving, this initial segmentation is able to identify and isolate such columns. To deal with some inherent segmentation errors caused by isolated lines, overhanging words, or cells spanning more than one column, a series of postprocessing steps is added. These steps benefit from a very simple distinction between type 1 and type 2 blocks: type 1 blocks are those with at most one word per line; all others are of type 2. This distinction allows the selective application of heuristics to each group of blocks. The conjoint decomposition of column blocks into subsets of table cells leads to the final block segmentation at a homogeneous abstraction level. These segments serve as input to the final layout analysis, which identifies table environments and cells that stretch over several rows and/or columns.
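The seed-and-expand step and the type 1 / type 2 distinction described in the abstract can be illustrated with a minimal Python sketch. It is not the T-Recs implementation: the `Word` record, the precomputed `line` index, and the reading of "interleave" as horizontal overlap of bounding boxes on directly adjacent lines are all assumptions made for illustration, and the postprocessing heuristics are omitted.

```python
from collections import namedtuple

# Hypothetical word record: horizontal extent (x0, x1) of the bounding box
# and the index of the text line the word sits on (assumed to be known).
Word = namedtuple("Word", "text x0 x1 line")


def overlaps(a, b):
    """Assumed 'interleave' test: the horizontal extents intersect."""
    return a.x0 < b.x1 and b.x0 < a.x1


def segment_blocks(words):
    """Grow blocks from seed words via vertically adjacent, overlapping words."""
    unassigned = set(range(len(words)))
    blocks = []
    while unassigned:
        seed = min(unassigned)  # any unassigned word can act as the seed
        unassigned.remove(seed)
        block, stack = [seed], [seed]
        while stack:
            i = stack.pop()
            # Vertical neighbours: words one line above or below that overlap.
            neighbours = [
                j for j in unassigned
                if abs(words[j].line - words[i].line) == 1
                and overlaps(words[i], words[j])
            ]
            for j in neighbours:
                unassigned.remove(j)
                block.append(j)
                stack.append(j)
        blocks.append(sorted(block))
    return blocks


def block_type(block, words):
    """Type 1: at most one word per line; every other block is type 2."""
    lines = [words[i].line for i in block]
    return 1 if len(lines) == len(set(lines)) else 2


# A two-column table: the gap between the columns prevents any overlap.
table = [
    Word("Item", 0, 40, 0), Word("Price", 100, 150, 0),
    Word("Apple", 0, 50, 1), Word("1.20", 100, 140, 1),
    Word("Pear", 0, 40, 2), Word("0.90", 100, 140, 2),
]
table_blocks = segment_blocks(table)

# Two lines of running text: words overlap across lines and merge.
prose = [
    Word("This", 0, 30, 0), Word("paper", 35, 80, 0),
    Word("presents", 0, 60, 1), Word("an", 65, 80, 1),
]
prose_blocks = segment_blocks(prose)
```

In this sketch the table yields one block per column, each of type 1, while the running text collapses into a single type 2 block. Words on the same line are joined only indirectly, through a shared neighbor on an adjacent line, which is why a column gap is enough to keep columns apart.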

Paper Details

Date Published: 1 April 1998
PDF: 11 pages
Proc. SPIE 3305, Document Recognition V, (1 April 1998); doi: 10.1117/12.304642
Thomas G. Kieninger, German Research Ctr. for Artificial Intelligence (Germany)

Published in SPIE Proceedings Vol. 3305:
Document Recognition V
Daniel P. Lopresti; Jiangying Zhou, Editor(s)
