
Proceedings Paper

Automatic benchmarking scheme for page segmentation
Author(s): Sabine Randriamasy; Luc M. Vincent; Ben S. Wittner

Paper Abstract

An automatic, bitmap-level, set-based benchmarking scheme for page segmentation is presented. It compares segmentation results with predefined "ground truth files" that contain all the possible correct solutions. Successful page segmentation is a necessary precondition for a successful document recognition process. Two problems are addressed: designing methods to describe all possible correct segmentations of a given page, and designing methods to compare two segmentations. The proposed ground truth representation scheme defines ground truth text regions as non-mergeable maximal sets of text lines, merged in a language-dependent direction. It covers the other possible correct segmentations by explicitly specifying the authorized cuts in each region. At this low-level stage, the quality criteria for a page segmentation are mainly defined as providing correct input for region ordering and classification. The qualitative and quantitative evaluation method tests the overlap between the two sets of regions, where each region is defined as the set of black pixels contained in the derived polygons.
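To make the idea of a bitmap-level, set-based comparison concrete, here is a minimal Python sketch. It is not the authors' implementation: it assumes axis-aligned rectangles instead of the derived polygons, a bitmap stored as nested lists with 1 for black, and it ignores the authorized cuts described in the abstract. The names `black_pixels_in_box`, `overlap_table`, and `find_splits_and_merges` are hypothetical.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]          # (row, column)
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def black_pixels_in_box(bitmap: List[List[int]], box: Box) -> Set[Pixel]:
    """Return the set of black pixels (value 1) that fall inside a box."""
    top, left, bottom, right = box
    return {
        (r, c)
        for r in range(top, bottom)
        for c in range(left, right)
        if bitmap[r][c] == 1
    }


def overlap_table(truth: List[Set[Pixel]], result: List[Set[Pixel]]) -> List[List[int]]:
    """Count shared black pixels for every (truth region, result region) pair."""
    return [[len(t & s) for s in result] for t in truth]


def find_splits_and_merges(truth: List[Set[Pixel]], result: List[Set[Pixel]]):
    """Flag truth regions covered by several result regions (splits) and
    result regions covering several truth regions (merges)."""
    table = overlap_table(truth, result)
    splits = [i for i, row in enumerate(table) if sum(n > 0 for n in row) > 1]
    merges = [
        j for j in range(len(result))
        if sum(row[j] > 0 for row in table) > 1
    ]
    return splits, merges


# Toy page: two short blocks on top, one full-width line at the bottom.
bitmap = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
truth_boxes = [(0, 0, 2, 2), (0, 4, 2, 6), (3, 0, 4, 6)]
result_boxes = [(0, 0, 2, 6), (3, 0, 4, 3), (3, 3, 4, 6)]

truth = [black_pixels_in_box(bitmap, b) for b in truth_boxes]
result = [black_pixels_in_box(bitmap, b) for b in result_boxes]

table = overlap_table(truth, result)
splits, merges = find_splits_and_merges(truth, result)
```

In this toy case the first result region merges the two top ground truth regions, and the bottom ground truth region is split in two. A scheme with authorized cuts, as the abstract describes, would additionally check whether such a split falls on a permitted cut before counting it as an error.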

Paper Details

Date Published: 23 March 1994
PDF: 14 pages
Proc. SPIE 2181, Document Recognition, (23 March 1994); doi: 10.1117/12.171109
Author Affiliations
Sabine Randriamasy, Harvard Robotics Lab. (United States)
Luc M. Vincent, Xerox Imaging Systems (United States)
Ben S. Wittner, Xerox Imaging Systems (United States)


Published in SPIE Proceedings Vol. 2181:
Document Recognition
Luc M. Vincent; Theo Pavlidis, Editor(s)
