
Proceedings Paper

Modified character-level deciphering algorithm for OCR in degraded documents
Author(s): Chi Fang; Jonathan J. Hull

Paper Abstract

This paper presents modifications to a previous character-level deciphering algorithm for OCR that enable it to handle touching characters and to tolerate mistakes made at the clustering stage. The objective of a character-level deciphering algorithm is to assign alphabetic identities to character patterns such that the character repetition pattern in an input text matches the letter repetition pattern provided by a language model. Degradation in document images commonly produces touching characters and errors in clustering the character patterns, both of which pose difficulties for character-level deciphering algorithms. The proposed modifications tightly integrate visual constraints from characters and touching patterns with constraints from a language model to decode touching characters and to detect and reverse clustering mistakes. The result is a deciphering algorithm with robust performance under image degradation.
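
To make the idea of matching repetition patterns concrete, the following is a minimal illustrative sketch, not the algorithm described in the paper. It assumes that clustering has already assigned an integer cluster ID to each character image, that word boundaries are known, and that the language model is simply a small word list. The names `repetition_pattern`, `decipher`, and the toy lexicon are hypothetical. The sketch does not handle touching characters or clustering mistakes, which are the subject of the paper's modifications.

```python
def repetition_pattern(symbols):
    """Return the repetition pattern of a sequence, e.g. 'hello' -> (0, 1, 2, 2, 3)."""
    first_seen = {}
    return tuple(first_seen.setdefault(s, len(first_seen)) for s in symbols)


def decipher(cluster_words, lexicon):
    """Search for a one-to-one cluster-ID -> letter mapping under which every
    word of cluster IDs spells a lexicon word. Returns a dict, or None."""
    # Index the lexicon by repetition pattern (a stand-in for a language model).
    by_pattern = {}
    for word in lexicon:
        by_pattern.setdefault(repetition_pattern(word), []).append(word)

    def extend(mapping, clusters, word):
        # Try to add the assignments implied by reading `clusters` as `word`.
        new = dict(mapping)
        used = set(new.values())
        for cluster, letter in zip(clusters, word):
            if cluster in new:
                if new[cluster] != letter:
                    return None  # conflicts with an earlier assignment
            elif letter in used:
                return None  # letter already taken by another cluster
            else:
                new[cluster] = letter
                used.add(letter)
        return new

    def search(i, mapping):
        # Plain backtracking over the words, in input order.
        if i == len(cluster_words):
            return mapping
        clusters = cluster_words[i]
        for word in by_pattern.get(repetition_pattern(clusters), []):
            new = extend(mapping, clusters, word)
            if new is not None:
                result = search(i + 1, new)
                if result is not None:
                    return result
        return None

    return search(0, {})


# Toy input: three words given as tuples of (hypothetical) cluster IDs.
lexicon = ["that", "hat", "the", "cat", "this"]
cluster_words = [(1, 2, 3, 1), (2, 3, 1), (1, 2, 4)]
print(decipher(cluster_words, lexicon))
```

In this toy input only "that" shares the pattern of the first word, which fixes three cluster identities and constrains the remaining two words. A real system would need a much larger lexicon and, as the abstract notes, visual constraints to cope with degraded input.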

Paper Details

Date Published: 30 March 1995
PDF: 8 pages
Proc. SPIE 2422, Document Recognition II (30 March 1995); doi: 10.1117/12.205843
Chi Fang, SUNY/Buffalo (United States)
Jonathan J. Hull, Ricoh California Research Ctr. (United States)


Published in SPIE Proceedings Vol. 2422:
Document Recognition II
Luc M. Vincent; Henry S. Baird, Editor(s)
