
Proceedings Paper

Performance evaluation of document layout analysis algorithms on the UW data set
Author(s): Jisheng Liang; Ihsin T. Phillips; Robert M. Haralick

Paper Abstract

This paper discusses a performance evaluation protocol for layout analysis. The University of Washington English Document Image Database-III (UW-III) contains 1600 English document images, each with manually edited ground truth giving the bounding boxes of its entities. These bounding boxes enclose text and non-text zones, text lines, and words. We describe a performance metric that compares the detected entities with the ground truth in terms of their bounding boxes. The Document Attribute Format Specification is used as the standard data representation. The protocol is intended to serve as a model for using the UW-III database to evaluate document analysis algorithms. A set of layout analysis algorithms, each detecting different entities, has been tested on the data set using this performance metric, and the evaluation results are presented in this paper.
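The abstract does not give the metric's formulas. As a rough illustration only, the sketch below assumes one common way to compare bounding boxes. It links a ground-truth box and a detected box when their intersection covers at least a fraction `threshold` of either box. It then counts one-to-one matches, splits (one ground-truth box covered by several detections), merges (one detection spanning several ground-truth boxes), misses, and false alarms. The function names, the `(x0, y0, x1, y1)` box layout, and the 0.5 threshold are hypothetical choices, not the authors' definitions.

```python
from typing import Dict, List, Tuple

# Hypothetical box layout: (x0, y0, x1, y1) with x1 > x0 and y1 > y0.
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of an axis-aligned box (0 for degenerate boxes)."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: Box, b: Box) -> float:
    """Area of the overlap between two axis-aligned boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def match_counts(ground: List[Box], detected: List[Box],
                 threshold: float = 0.5) -> Dict[str, int]:
    """Count match categories between ground-truth and detected boxes.

    Assumption: a pair is linked when the overlap covers at least
    `threshold` of the ground-truth box or of the detected box.
    """
    links = set()
    for i, g in enumerate(ground):
        for j, d in enumerate(detected):
            inter = intersection(g, d)
            # inter > 0 guarantees both areas are positive, so no division by zero.
            if inter > 0 and (inter / area(g) >= threshold
                              or inter / area(d) >= threshold):
                links.add((i, j))

    # Number of links touching each ground-truth box and each detected box.
    g_deg = [0] * len(ground)
    d_deg = [0] * len(detected)
    for i, j in links:
        g_deg[i] += 1
        d_deg[j] += 1

    return {
        # Pairs where neither box is linked to anything else.
        "one_to_one": sum(1 for i, j in links if g_deg[i] == 1 and d_deg[j] == 1),
        "split": sum(1 for k in g_deg if k > 1),  # one ground-truth box, several detections
        "merge": sum(1 for k in d_deg if k > 1),  # one detection, several ground-truth boxes
        "missed": g_deg.count(0),                 # ground-truth boxes with no detection
        "false_alarm": d_deg.count(0),            # detections with no ground-truth box
    }


if __name__ == "__main__":
    ground = [(0, 0, 100, 20), (0, 30, 100, 50)]
    detected = [(0, 0, 100, 20), (0, 30, 50, 50), (50, 30, 100, 50), (200, 200, 210, 210)]
    print(match_counts(ground, detected))
```

In the usage example, the first text line is detected exactly, the second is split into two halves, and the small box far from any ground truth counts as a false alarm. A real protocol would likely report such counts per entity type (zones, text lines, words) and may use different overlap rules.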

Paper Details

Date Published: 3 April 1997
PDF: 12 pages
Proc. SPIE 3027, Document Recognition IV, (3 April 1997); doi: 10.1117/12.270067
Author Affiliations
Jisheng Liang, Univ. of Washington (United States)
Ihsin T. Phillips, Seattle Univ. (United States)
Robert M. Haralick, Univ. of Washington (United States)

Published in SPIE Proceedings Vol. 3027:
Document Recognition IV
Luc M. Vincent; Jonathan J. Hull, Editors

© SPIE.