
Proceedings Paper

Trigram-based algorithms for OCR result correction
Author(s): Konstantin Bulatov; Temudzhin Manzhikov; Oleg Slavin; Igor Faradjev; Igor Janiszewski

Paper Abstract

In this paper, we consider the task of improving optical character recognition (OCR) results for document fields on low-quality and average-quality images using N-gram models. Cyrillic fields of the Russian Federation internal passport are analyzed as an example. Two approaches are presented: the first is based on the hypothesis that a symbol depends on its two adjacent symbols, and the second is based on the calculation of marginal distributions and Bayesian network computation. The algorithms are compared, and experimental results within a real document OCR system are presented; they show that document field OCR accuracy can be improved by more than 6% for low-quality images.
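The abstract does not spell out either algorithm. As a generic illustration of how a character trigram model can override a low-confidence OCR decision, the following minimal Python sketch may help. It is not the authors' method. It assumes the OCR engine returns a probability for every candidate character at each position, trains an add-one smoothed trigram model on a toy list of surnames, and selects the string that maximizes the combined OCR and trigram log-probability with Viterbi-style dynamic programming over pairs of preceding characters. Unlike the paper's first approach, which conditions on both adjacent symbols, this chain conditions only on the two preceding characters. The functions `train_trigrams` and `correct_field`, the weight `alpha`, and the `^`/`$` padding symbols are all hypothetical.

```python
import math
from collections import Counter

START, END = "^", "$"  # hypothetical padding symbols, not taken from the paper


def train_trigrams(words, alphabet):
    """Add-one smoothed character trigram model; returns log P(c | a, b)."""
    tri, ctx = Counter(), Counter()
    for w in words:
        s = START * 2 + w + END
        for i in range(2, len(s)):
            tri[s[i - 2:i + 1]] += 1  # full trigram a+b+c
            ctx[s[i - 2:i]] += 1      # its two-character context a+b
    v = len(alphabet) + 1  # possible outcomes: every letter plus END

    def logp(a, b, c):
        return math.log((tri[a + b + c] + 1) / (ctx[a + b] + v))

    return logp


def correct_field(candidates, logp, alpha=1.0):
    """Exact MAP decoding over per-position OCR alternatives.

    candidates: list of {char: ocr_probability} dicts, one per position
    (a hypothetical OCR output format). A string's score is
    sum(log ocr_prob) + alpha * sum(log trigram prob). Because the trigram
    term depends only on the last two characters, keeping the best prefix
    per (previous, current) pair finds the maximum exactly.
    """
    beams = {(START, START): (0.0, "")}  # state -> (best score, best prefix)
    for position in candidates:
        nxt = {}
        for (a, b), (score, text) in beams.items():
            for c, p in position.items():
                if p <= 0:
                    continue  # log(0) is undefined; skip impossible candidates
                s = score + math.log(p) + alpha * logp(a, b, c)
                if (b, c) not in nxt or s > nxt[(b, c)][0]:
                    nxt[(b, c)] = (s, text + c)
        beams = nxt
    # Close every hypothesis with the end-of-field trigram and keep the best.
    best = max(beams.items(),
               key=lambda kv: kv[1][0] + alpha * logp(kv[0][0], kv[0][1], END))
    return best[1][1]


ALPHABET = "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"
lm = train_trigrams(["ПЕТРОВ", "ИВАНОВ", "СИДОРОВ"], ALPHABET)
# Toy OCR output: the fifth character is read as "С" with higher confidence.
ocr = [{"П": 1.0}, {"Е": 1.0}, {"Т": 1.0}, {"Р": 1.0},
       {"С": 0.6, "О": 0.4}, {"В": 1.0}]
print(correct_field(ocr, lm))  # ПЕТРОВ
```

Here the OCR top-1 reading "ПЕТРСВ" loses because the trigrams "ТРС", "РСВ", and "СВ$" never occur in the toy training list, while "ТРО", "РОВ", and "ОВ$" do. Setting `alpha=0.0` ignores the language model and returns the raw top-1 string; in practice the weight would need tuning on held-out fields.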

Paper Details

Date Published: 17 March 2017
PDF: 5 pages
Proc. SPIE 10341, Ninth International Conference on Machine Vision (ICMV 2016), 103410O (17 March 2017); doi: 10.1117/12.2268559
Author Affiliations
Konstantin Bulatov, Institute for Systems Analysis (Russian Federation)
Temudzhin Manzhikov, Moscow Institute of Physics and Technology (Russian Federation)
Oleg Slavin, Institute for Systems Analysis (Russian Federation); Moscow Institute of Physics and Technology (Russian Federation)
Igor Faradjev, Institute for Systems Analysis (Russian Federation)
Igor Janiszewski, Institute for Systems Analysis (Russian Federation)


Published in SPIE Proceedings Vol. 10341:
Ninth International Conference on Machine Vision (ICMV 2016)
Antanas Verikas; Petia Radeva; Dmitry P. Nikolaev; Wei Zhang; Jianhong Zhou, Editor(s)

© SPIE.