Proceedings Paper

Multilingual information identification and extraction from imaged documents using optical correlator technology
Author(s): Bruce W. Stalcup; James Brower; Lou Vaughn; Mike Vertuno

Paper Abstract

Most organizations maintain large archives of paper documents. These archives typically contain valuable information and data, which are imaged to provide electronic access. However, once a document has been printed or imaged, these organizations have had no efficient method of retrieving information from it. The only available methods were to read the documents manually or to convert them to ASCII text using optical character recognition (OCR). For archives with large numbers of documents, both methods are problematic: manual searches are not feasible, and OCR can be CPU intensive and prone to error. In addition, OCR engines do not exist for many foreign languages. By contrast, our system takes a different approach to retrieving information from imaged document archives, built on a client/server architecture. Since the project began in 1999, we have made significant advances in developing a system that employs optical correlation (OC) technology, in either software or hardware, to access the textual and graphic information in imaged paper documents directly, thereby eliminating the OCR process. It provides a fast, accurate means of accessing this information directly from multilingual documents. Our system can also rapidly and accurately detect duplicate documents within an archive using optical correlation techniques. In this paper, we describe the present system and selected examples of its capabilities. We also present performance results (accuracy, speed, etc.) against test document sets.
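
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind software correlation: locating a query glyph in a page image by cross-correlation computed with FFTs. It is not the authors' system. The function names (`correlate_fft`, `find_best_match`), the synthetic page, and the "T"-shaped glyph are all hypothetical and chosen for illustration.

```python
import numpy as np

def correlate_fft(page, template):
    """Circular cross-correlation of a zero-mean template with a page image.

    Illustrative only: a software stand-in for what an optical correlator
    computes, not the method described in the paper.
    """
    page = page.astype(float)
    t = template.astype(float)
    t = t - t.mean()  # zero-mean so uniform regions do not score highly

    # Zero-pad the template to the page size so both spectra align.
    padded = np.zeros_like(page)
    padded[: t.shape[0], : t.shape[1]] = t

    # Correlation theorem: corr = IFFT( FFT(page) * conj(FFT(template)) ).
    spectrum = np.fft.fft2(page) * np.conj(np.fft.fft2(padded))
    return np.fft.ifft2(spectrum).real

def find_best_match(page, template):
    """Return (row, col, score) of the strongest correlation peak.

    (row, col) is the top-left corner of the best-matching window.
    """
    corr = correlate_fft(page, template)
    row, col = np.unravel_index(np.argmax(corr), corr.shape)
    return int(row), int(col), float(corr[row, col])

# Hypothetical test data: a blank 64x64 "page" with one T-shaped glyph.
glyph = np.zeros((8, 8))
glyph[0, :] = 1.0   # horizontal bar
glyph[:, 3] = 1.0   # vertical stem

page = np.zeros((64, 64))
page[12:20, 20:28] = glyph  # place the glyph with its corner at (12, 20)

row, col, score = find_best_match(page, glyph)
print(row, col)
```

On this synthetic page the peak falls at the glyph's top-left corner, row 12 and column 20. A practical system would also need score normalization and a detection threshold to handle noise, scale, and font variation, none of which this sketch attempts.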

Paper Details

Date Published: 27 November 2002
PDF: 9 pages
Proc. SPIE 4789, Algorithms and Systems for Optical Information Processing VI, (27 November 2002); doi: 10.1117/12.453848
Bruce W. Stalcup, Northrop Grumman Corp. (United States)
James Brower, Northrop Grumman Corp. (United States)
Lou Vaughn, Northrop Grumman Corp. (United States)
Mike Vertuno, Northrop Grumman Corp. (United States)

Published in SPIE Proceedings Vol. 4789:
Algorithms and Systems for Optical Information Processing VI
Bahram Javidi; Demetri Psaltis, Editor(s)
