
Proceedings Paper

How to find mathematics on a scanned page
Author(s): Richard J. Fateman

Paper Abstract

We describe the design of document analysis procedures to separate mathematics from ordinary text on a scanned page of mixed material. It is easy to observe that the accuracy of commercial OCR programs improves when mixed material is separated into two (or more) streams, with conventional non-math text handled by the usual text-based OCR heuristics. The second stream, consisting of material judged to be mathematics, can be fed to a specialized recognizer. If that recognizer fails to decode the material, it can be passed on to a third stream of diagrams, logos, and other miscellaneous material, perhaps including halftones. We explore the extent to which this separation can be automated in the context of scanning archival material, including mathematical and scientific journal pages, for a digital library project.
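The three-stream routing described in the abstract can be sketched roughly as follows. This is a hypothetical illustration, not the paper's method: the `Region` type, its `math_score` field (standing in for whatever math-vs-text classifier is used), the `route_regions` function, the 0.5 threshold, and the convention that a recognizer signals failure by returning `None` are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Region:
    """A block of a scanned page (hypothetical representation)."""
    ident: str
    math_score: float  # assumed classifier output in [0, 1]; higher means "looks like math"


@dataclass
class Streams:
    text: List[Region] = field(default_factory=list)               # sent to conventional OCR
    math: List[Tuple[Region, str]] = field(default_factory=list)   # decoded by the math recognizer
    other: List[Region] = field(default_factory=list)              # diagrams, logos, halftones, ...


def route_regions(
    regions: List[Region],
    math_recognizer: Callable[[Region], Optional[str]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Streams:
    """Split page regions into text, math, and miscellaneous streams."""
    streams = Streams()
    for region in regions:
        if region.math_score < threshold:
            # Judged to be ordinary text: leave it to the usual OCR heuristics.
            streams.text.append(region)
            continue
        # Judged to be mathematics: try the specialized recognizer.
        decoded = math_recognizer(region)
        if decoded is None:
            # Recognizer failed, so fall through to the third stream.
            streams.other.append(region)
        else:
            streams.math.append((region, decoded))
    return streams
```

With a toy recognizer that rejects anything whose identifier starts with `fig`, a page with one paragraph, one equation, and one figure would route the paragraph to `text`, the equation to `math`, and the figure to `other`. Making the third stream a fallback of the math recognizer, rather than a separate classifier, mirrors the abstract's description of material being "passed on" when decoding fails.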

Paper Details

Date Published: 22 December 1999
PDF: 12 pages
Proc. SPIE 3967, Document Recognition and Retrieval VII, (22 December 1999); doi: 10.1117/12.373482
Richard J. Fateman, Univ. of California/Berkeley (United States)

Published in SPIE Proceedings Vol. 3967:
Document Recognition and Retrieval VII
Editors: Daniel P. Lopresti, Jiangying Zhou

© SPIE.