Share Email Print

Proceedings Paper

Evaluation of an automatic markup system
Author(s): Kazem Taghva; Allen Condit; Julie Borsack
Format Member Price Non-Member Price
PDF $14.40 $18.00
cover GOOD NEWS! Your organization subscribes to the SPIE Digital Library. You may be able to download this paper for free. Check Access

Paper Abstract

One predominant application of OCR is the recognition of full text documents for information retrieval. Modern retrieval systems exploit both the textual content of the document as well as its structure. The relationship between textual content and character accuracy have been the focus of recent studies. It has been shown that due to the redundancies in text, average precision and recall is not heavily affected by OCR character errors. What is not fully known is to what extent OCR devices can provide reliable information that can be used to capture the structure of the document. In this paper, we present a preliminary report on the design and evaluation of a system to automatically markup technical documents, based on information provided by an OCR device. The device we use differs from traditional OCR devices in that it not only performs optical character recognition, but also provides detailed information about page layout, word geometry, and font usage. Our automatic markup program, which we call Autotag, uses this information, combined with dictionary lookup and content analysis, to identify structural components of the text. These include the document title, author information, abstract, sections, section titles, paragraphs, sentences, and de-hyphenated words. A visual examination of the hardcopy is compared to the output of our markup system to determine its correctness.

Paper Details

Date Published: 30 March 1995
PDF: 11 pages
Proc. SPIE 2422, Document Recognition II, (30 March 1995); doi: 10.1117/12.205833
Show Author Affiliations
Kazem Taghva, Univ. of Nevada/Las Vegas (United States)
Allen Condit, Univ. of Nevada/Las Vegas (United States)
Julie Borsack, Univ. of Nevada/Las Vegas (United States)

Published in SPIE Proceedings Vol. 2422:
Document Recognition II
Luc M. Vincent; Henry S. Baird, Editor(s)

© SPIE. Terms of Use
Back to Top