
Proceedings Paper

Degraded text recognition using word collocation
Author(s): Tao Hong; Jonathan J. Hull

Paper Abstract

A relaxation-based algorithm is proposed that improves the performance of a text recognition technique by propagating the influence of word collocation statistics. Word collocation refers to the likelihood that two words co-occur within a fixed distance of one another. For example, in a story about water transportation, it is highly likely that the word 'river' will occur within ten words on either side of the word 'boat'. For each word in a running text, the proposed algorithm receives a group of visually similar decisions (called a neighborhood) computed by a word recognition algorithm. The positions of decisions within the neighborhoods are modified based on how often they co-occur with decisions in the neighborhoods of other nearby words. This process is iterated a number of times, effectively propagating the influence of the collocation statistics across an input text. This improves on a strictly local analysis by allowing strong collocations to reinforce weak (but related) collocations elsewhere. An experimental analysis is discussed in which the algorithm is applied to text recognition results that are less than 60% correct. In all cases, the correct rate is improved to 90% or better.
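The abstract does not give the paper's update rule, so the following is only a minimal sketch of the general idea, assuming a generic relaxation scheme. Collocation statistics are taken to be raw pair counts within a 10-word window of a training corpus. Each neighborhood is a dictionary of candidate words with initial recognizer scores, and on every iteration a candidate's score is multiplied by (1 + support), where support is the sum of its collocation counts with the candidates of nearby neighborhoods, weighted by their current scores. The names `collocation_counts`, `relax`, `pair`, and `normalize`, along with this particular update rule, are hypothetical illustrations and are not the authors' method.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A neighborhood maps each candidate word (decision) to its current score.
Neighborhood = Dict[str, float]


def pair(a: str, b: str) -> Tuple[str, str]:
    """Order-independent key for a word pair."""
    return (a, b) if a <= b else (b, a)


def collocation_counts(corpus: List[str], window: int = 10) -> Counter:
    """Count how often two words occur within `window` words of each other."""
    counts: Counter = Counter()
    for i, w in enumerate(corpus):
        for j in range(i + 1, min(i + window + 1, len(corpus))):
            counts[pair(w, corpus[j])] += 1
    return counts


def normalize(scores: Neighborhood) -> Neighborhood:
    """Rescale scores so they sum to 1 (left unchanged if all are zero)."""
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()} if total > 0 else dict(scores)


def relax(neighborhoods: List[Neighborhood], counts: Counter,
          window: int = 10, iterations: int = 5) -> List[List[str]]:
    """Re-score every neighborhood from the collocation support it gets from
    nearby neighborhoods, repeat, and return candidates re-ranked by score.
    The multiplicative (1 + support) rule is an assumed, hypothetical choice."""
    probs = [normalize(n) for n in neighborhoods]
    for _ in range(iterations):
        updated = []
        for i, cands in enumerate(probs):
            lo, hi = max(0, i - window), min(len(probs), i + window + 1)
            new = {}
            for w, p in cands.items():
                # Support: collocation counts with candidates of nearby words,
                # weighted by those candidates' current scores.
                support = sum(
                    q * counts[pair(w, v)]
                    for j in range(lo, hi) if j != i
                    for v, q in probs[j].items()
                )
                new[w] = p * (1.0 + support)
            updated.append(normalize(new))
        # Synchronous update: every position uses the previous iteration's scores.
        probs = updated
    return [sorted(c, key=c.get, reverse=True) for c in probs]


if __name__ == "__main__":
    counts = collocation_counts("the boat sailed down the river".split())
    # The recognizer initially prefers 'beat' over 'boat' and cannot separate
    # 'river' from 'rover'; the boat/river collocation fixes both.
    noisy = [{"beat": 0.6, "boat": 0.4}, {"river": 0.5, "rover": 0.5}]
    print(relax(noisy, counts))  # [['boat', 'beat'], ['river', 'rover']]
```

In this toy run, a single iteration only brings 'boat' level with 'beat'. Further iterations let the growing confidence in 'river' feed back into 'boat', and vice versa. This mirrors, in simplified form, the abstract's point that iteration propagates collocation evidence beyond a strictly local analysis.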

Paper Details

Date Published: 23 March 1994
PDF: 8 pages
Proc. SPIE 2181, Document Recognition, (23 March 1994); doi: 10.1117/12.171121
Author Affiliations
Tao Hong, SUNY/Buffalo (United States)
Jonathan J. Hull, SUNY/Buffalo (United States)

Published in SPIE Proceedings Vol. 2181:
Document Recognition
Luc M. Vincent; Theo Pavlidis, Editor(s)

© SPIE.