Proceedings Paper

Tabular document recognition
Author(s): M. Armon Rahgozar; Zhigang Fan; Emil V. Rainero

Paper Abstract

In this paper, we propose an efficient algorithm for recognizing the grid structure within a tabular document. The algorithm has two parts: first, a row labeling algorithm groups similar rows into clusters; then, a column labeling algorithm identifies the column structure within each cluster. Each column structure is identified by a set of column separation intervals, which are computed from the intervals representing the extent of the white space between consecutive word fragments. We formally describe a method for finding column separation intervals from word fragment separation intervals. The method constructs the closure of a set of line intervals under the operation of line intersection. The closure is maintained dynamically in a data structure that gives easy access to its elements. This technique is computationally less expensive than projection and search at the pixel level, because document recognition applications already require word fragment acquisition.
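The abstract does not give the details of the closure construction, so the following is only a minimal sketch of the general idea, not the authors' data structure. It assumes each white-space gap between word fragments is a closed one-dimensional interval `(start, end)` in integer pixel coordinates; the names `Interval`, `intersect`, and `IntersectionClosure` are hypothetical. The closure is maintained incrementally: because the stored set is already closed under intersection, adding a new interval only requires intersecting it once with each existing member.

```python
from typing import Optional, Set, Tuple

# Assumption: a gap is a closed interval [start, end] with start <= end.
Interval = Tuple[int, int]


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the overlap of two closed intervals, or None if they are disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


class IntersectionClosure:
    """Hypothetical container: a set of intervals kept closed under intersection."""

    def __init__(self) -> None:
        self.members: Set[Interval] = set()

    def add(self, interval: Interval) -> None:
        # The existing members are already closed, so one pass is enough:
        # (x & c1) & (x & c2) == x & (c1 & c2), and c1 & c2 is already a member.
        new = {interval}
        for member in self.members:
            common = intersect(interval, member)
            if common is not None:
                new.add(common)
        self.members |= new


# Gaps observed at roughly the same horizontal position in three rows.
closure = IntersectionClosure()
for gap in [(10, 20), (15, 30), (18, 25)]:
    closure.add(gap)
print(sorted(closure.members))
```

In this sketch the narrowest member, `(18, 20)`, is the white space shared by all three rows, which is the kind of interval a column separator would be drawn from. A plain set is used for simplicity; the paper's structure for fast access to closure elements is not described in the abstract and is not reproduced here.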

Paper Details

Date Published: 23 March 1994
PDF: 10 pages
Proc. SPIE 2181, Document Recognition, (23 March 1994); doi: 10.1117/12.171096
M. Armon Rahgozar, Xerox Webster Research Ctr. (United States)
Zhigang Fan, Xerox Webster Research Ctr. (United States)
Emil V. Rainero, Xerox Webster Research Ctr. (United States)

Published in SPIE Proceedings Vol. 2181:
Document Recognition
Luc M. Vincent; Theo Pavlidis, Editor(s)
