Proceedings Volume 2181

Document Recognition

Luc M. Vincent, Theo Pavlidis
View the digital version of this volume at SPIE Digital Library.

Volume Details

Date Published: 23 March 1994
Contents: 6 Sessions, 39 Papers, 0 Presentations
Conference: IS&T/SPIE 1994 International Symposium on Electronic Imaging: Science and Technology
Volume Number: 2181

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Machine Printed Character Recognition
  • Handprinted Character Recognition
  • Beyond Pure Character Recognition
  • Error and Performance Analysis
  • Restoration and Binarization
  • Recognition of Special or Poorly Printed Characters
Machine Printed Character Recognition
Communication theory framework for document recognition
Gary E. Kopec, Philip A. Chou
Document image decoding (DID) is a recently proposed generic framework for document recognition that is based on an explicit communication theory view of the processes of document creation, transmission, and interpretation. DID views a document recognition problem as containing four elements -- a message (information) source, an encoder (formatter and renderer), a noisy channel (e.g., printer, scanner), and an image decoder (recognizer). Application of DID to a particular recognition problem involves developing stochastic models for the source, encoder, and channel processes. The DID approach to modeling is based on the use of stochastic attributed context-free grammars. DID supports an approach to image decoding whose kernel is an informed best-first search algorithm, called the iterated complete path (ICP) algorithm, which is similar to branch-and-bound and related heuristic search and optimization techniques. Decoders are constructed automatically: the inputs to the decoder generator are a Markov source model and values for channel parameters. The generator creates the necessary computation schedules and outputs an optimized in-line C program that implements the decoder. The customized decoder program is then compiled, linked with a support library, and used to decode images.
Using projections for preclassification of character shape
Angelo Marcelli, Theo Pavlidis
Simple techniques for character recognition do not perform very well, but they are fast compared to more effective but also more complex methods. In this paper we describe how such simple techniques can be used as preprocessors that narrow down the possibilities the main classifier has to deal with. This is achieved in three different ways: by attempting a segmentation of the word into characters, by reducing the number of prototypes to be matched against the sample, and by providing a set of constraints for the matching. Experiments on an address database provided by the U.S. Postal Service have shown that the method reduces classification time by almost 60% without introducing any errors.
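As an illustration of the projection idea (a minimal sketch under assumed conventions, not the authors' implementation), character gaps in a binary word image can be proposed from the columns where the vertical projection profile drops to zero:

```python
import numpy as np

def projection_segmentation_points(word_image):
    """Propose character boundaries from the vertical projection profile.

    word_image is a 2-D 0/1 array (1 = ink); a column whose ink count
    drops to zero is taken as a gap between characters.
    """
    profile = word_image.sum(axis=0)            # ink count per column
    gaps = profile == 0
    # report the first column of each run of empty columns as a cut
    cuts = [i for i in range(1, len(gaps)) if gaps[i] and not gaps[i - 1]]
    return profile, cuts

# two tiny 3-column "characters" separated by one empty column
img = np.array([[1, 1, 1, 0, 1, 1, 1],
                [1, 0, 1, 0, 1, 0, 1],
                [1, 1, 1, 0, 1, 1, 1]])
profile, cuts = projection_segmentation_points(img)   # cuts -> [3]
```

Real scans need a tolerance above zero and a minimum gap width; the preclassifier described in the paper additionally uses projections to prune prototypes and constrain the matching.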
Segmentation-free morphological character recognition
Eugene J. Kraus, Edward R. Dougherty
A basic class of structuring-element pairs for segmentation-free character recognition via the morphological hit-or-miss transform is developed for Courier font. Both hit and miss structuring elements are sparse and they are selected so that the hit-or-miss transform can be applied across the test image without prior segmentation. Besides being able to achieve high rates of accuracy on text with touching characters, the hit-or-miss algorithm, as developed herein, is shown to be very robust with respect to the threshold level for the input gray-scale data.
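The hit-or-miss matching can be sketched as follows (an illustration with made-up structuring elements, not the Courier-tuned pairs developed in the paper): a position matches when every "hit" offset lands on ink and every "miss" offset lands on background, so the transform can slide over an unsegmented page.

```python
import numpy as np

def hit_or_miss(image, hit_pts, miss_pts):
    """Sparse binary hit-or-miss transform.

    hit_pts/miss_pts are (row, col) offsets from the anchor; pixels
    outside the image are treated as background.
    """
    H, W = image.shape
    out = np.zeros_like(image)
    for r in range(H):
        for c in range(W):
            hit_ok = all(0 <= r + dr < H and 0 <= c + dc < W
                         and image[r + dr, c + dc]
                         for dr, dc in hit_pts)
            miss_ok = all(not (0 <= r + dr < H and 0 <= c + dc < W
                               and image[r + dr, c + dc])
                          for dr, dc in miss_pts)
            out[r, c] = 1 if hit_ok and miss_ok else 0
    return out

# toy template: a 3-pixel vertical bar with empty columns beside it
bar = np.array([[0, 0, 0, 0],
                [0, 1, 0, 0],
                [0, 1, 0, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 0]])
marks = hit_or_miss(bar, [(-1, 0), (0, 0), (1, 0)], [(0, -1), (0, 1)])
# marks contains a single 1 at the bar's center, (2, 1)
```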
Word recognition using ideal word patterns
Sheila X. Zhao, Sargur N. Srihari
The word shape analysis approach to text recognition is motivated by discoveries in psychological studies of the human reading process. It attempts to describe and compare the shape of a word as a whole object, without trying to segment and recognize the individual characters, and thus bypasses the errors committed in character segmentation and classification. However, the large number of classes and the large variation and distortion expected among patterns belonging to the same class make accurate recognition difficult for conventional pattern recognition approaches. A word shape analysis approach using ideal word patterns to overcome this difficulty and improve recognition performance is described in this paper. A special word pattern that characterizes a word class, called its ideal word pattern, is extracted from different sample patterns of the word class and stored in memory. Recognition of a new word pattern is achieved by comparing it with the ideal word pattern of each word class. A process for generating the ideal word pattern of each word class is proposed. The algorithm was tested on a set of machine-printed gray-scale word images that included a wide range of print types and qualities.
Handprinted Character Recognition
Unconstrained handprint recognition using a limited lexicon
A word recognition system has been developed at NIST to read free-formatted text paragraphs containing handprinted characters. The system has been developed and tested using binary images containing 2,100 different writers' printings of the Preamble to the U.S. Constitution. Each writer was asked to print these sentences in an empty 70 mm by 175 mm box. The Constitution box contains no guidelines for the placement and spacing of the handprinted text, nor are there guidelines to instruct the writer where to stop printing one line and to begin the next. While the layout of the handprint in these paragraphs is unconstrained, a limited-size lexicon may be applied to reduce the complexity of the recognition application. The system's four components have been combined into an end-to-end hybrid system that executes across a UNIX file server and a massively parallel SIMD computer. The recognition system achieves a word error rate of 49% across all 2,100 printings of the Preamble (109,096 words). This performance is achieved with a neural network character classifier that has a substitution error rate of 14% on its 22,823 training patterns.
Recognition of handprinted and cursive words by finding feature correspondences
Daniel J. Hepp
This paper describes a method for off-line recognition of handprinted and cursive words. The module takes as input a binary word image and a lexicon of strings, and ranks the lexicon according to the likelihood of a match to the given word image. To perform recognition, a set of character models is used. The models employ a graph representation: each character model consists of a set of features in spatial relationship to one another. The character models are built automatically in a clustering process. Character merging is performed by finding the appropriate correspondences between pairs of character sample features. This is accomplished by solving the assignment problem, for which an O(n³) linear programming algorithm exists. The end result of the training process is a set of random graph character prototypes for each character class. Because it is not possible to cleanly segment the word image into characters before recognition, segmentation and recognition are bound together in a dynamic programming process. Results are presented for a set of word images extracted from mailpieces in the live mailstream.
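The feature-correspondence step reduces to the classical assignment problem. The sketch below is illustrative only (real systems would use an O(n³) method such as the Hungarian algorithm rather than exhaustive search): it pairs two small sets of character-sample features by minimizing total matching cost.

```python
from itertools import permutations

def best_correspondence(cost):
    """Exhaustively solve the assignment problem for a small square
    cost matrix.  Returns (assignment, total) where assignment[i] is
    the column paired with row i at minimum total cost.
    """
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best), sum(cost[i][best[i]] for i in range(n))

# pairwise distances between 3 features of two character samples
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
assignment, total = best_correspondence(cost)   # ([1, 0, 2], total 5)
```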
Hierarchical approach to build a compact character recognition system
Xueping Liu
This paper proposes a hierarchical approach to building a compact character recognition system by reducing the redundancy in the dictionaries used to match a handwritten sample with its corresponding code. An elemental stroke dictionary and a character dictionary are used. An algebraic method of describing curves is adopted to divide all strokes into several classes according to their quasi-topological features (convexity, loop, and connectivity) and geometric ones (size, orientation angle, position in the character, etc.). An elemental stroke is extracted statistically from each class so divided. Based on these elemental strokes, a character category is represented by the number of strokes and the types of elemental strokes. To recognize a handwritten character, we first determine the type of elemental stroke for each stroke in the character, and then identify the category of the input by matching the types of elemental stroke against those in the character dictionary.
Matching database records to handwritten text
Margaret J. Ganzberger, Richard M. Rovner, Andrew M. Gillies, et al.
This paper describes a method for matching specific database records to handwritten text. While a database record contains multiple fields with complete, idealized strings, handwritten text may contain missing fields, misspellings, and abbreviations. Multiple word segmentation hypotheses are used in this method to overcome the spacing difficulties of handwritten text. To avoid the combinatorics of matching all instantiations of the record, including abbreviations and omissions, to all hypothesized word segmentations, a dynamic programming approach is employed. Inputs to the matching module include a binarized line of handwritten text and a set of potential database records. The module determines the best word segmentation, or parse, of the line given a particular record and produces an overall verification score. This module was tested using binarized, handwritten address images captured from a live mail stream. Results of matching the street line images to postal database records are presented.
Beyond Pure Character Recognition
Retaining document format in OCR
Timothy Butler
As OCR technology has improved, document formatting has become increasingly important to practical users of OCR. Detection and retention of document layout, text spacing, tabular data, font size and attributes, and graphical data are becoming new requirements of OCR systems. At the same time we recognize that document format retention is most important to a subset of OCR users.
Tabular document recognition
M. Armon Rahgozar, Zhigang Fan, Emil V. Rainero
In this paper, we propose an efficient algorithm for recognizing the grid structure within a tabular document. The algorithm has two parts: first, a row-labeling algorithm groups similar rows into clusters; then, a column-labeling algorithm identifies the column structure within each cluster. Each column structure is identified by a set of column separation intervals that are computed from the intervals representing the extent of the white spacing between consecutive word fragments. We formally describe a method for finding column separation intervals based on word fragment separation intervals. This method is based on constructing a closure of a set of line intervals under the operation of line intersection. The closure is maintained dynamically in a data structure that facilitates easy access to its elements. This technique is computationally less expensive than projection and search at the pixel level, since word fragment acquisition is already required for document recognition applications.
Text characterization by connected component transformations
Larry Spitz
Worldwide, many different scripts and languages are in common use. Finding text lines and, where present, character and word boundaries is a necessary primitive operation for most document processing applications. We have developed a method of handling text lines from several different languages that is robust in the presence of common printing and scanning artifacts. A technique is described by which information about the characteristics of a text line can be determined from a list of the connected pixel components that comprise the image. This technique applies across many languages and scripts that are laid out horizontally. For text comprising Roman type, the location and dimensions of each text line are augmented with the positions of the baseline and x-height. Where appropriate, the coordinates of space-delimited words and individual character cells are determined. The technique incorporates a computationally inexpensive method for straightening curved lines and segmenting kerned characters, and a novel method based on font weight and stress for locating the boundaries of individual characters, even when their images touch.
Self-correcting 100-font classifier
We have developed a practical scheme to take advantage of local typeface homogeneity to improve the accuracy of a character classifier. Given a polyfont classifier which is capable of recognizing any of 100 typefaces moderately well, our method allows it to specialize itself automatically to the single -- but otherwise unknown -- typeface it is reading. Essentially, the classifier retrains itself after examining some of the images, guided at first by the preset classification boundaries of the given classifier, and later by the behavior of the retrained classifier. Experimental trials on 6.4 million pseudo-randomly distorted images show that the method improves accuracy on 95 of the 100 typefaces. It reduces the error rate by a factor of 2.5, averaged over the 100 typefaces, when applied to an alphabet of 80 ASCII characters printed at ten point and digitized at 300 pixels/inch. This self-correcting method complements, and does not hinder, other methods for improving OCR accuracy, such as linguistic contextual analysis.
Font identification using visual global context
Siamak Khoubyari, Jonathan J. Hull
An important part of many algorithms that convert digital images of machine-printed text into their ASCII equivalent is information about fonts. This paper presents an algorithm for identifying the font in which a document is printed. The algorithm matches word-level information gathered from the document image to fonts in an image database. This method is more robust in the presence of noise than font recognition algorithms that use character-level information. Clusters of frequent function words (such as the, of, and, a, and to) are constructed from an input document image. The clusters are then matched to a database of function words derived from document images, and the document that matches best provides the identification of the input font. This technique utilizes the context from many words in an input document to overcome noise. Experimental results are presented that show near-perfect recognition of fonts, even in noisy documents.
Window-based bar code acquisition system
Chung-Chi Jim Li, Jianhua Xu, Theo Pavlidis
This paper presents the design of a two-stage bar code acquisition system that can be used to achieve error-free document recognition if the original document is enhanced with 1D or 2D bar codes. The unique point of our approach is a window-based method that can locate multiple bar codes in images with sub-pixel-per-module resolution. The method consists of three steps: (1) candidacy test, (2) window clustering, and (3) orientation estimation. The candidacy test examines the local statistical properties (i.e., contrast, balance, and transition count) of each window and determines whether it is part of a bar code. The window clustering step eliminates small blocks of candidate windows generated by the background and then groups the remaining windows into bar code clusters. The orientation estimation step uses edge detection and a least-squares fit to find the aim line of each bar code. A prototype system has been implemented in the laboratory of Symbol Technologies, Inc., to test the performance of the proposed bar code acquisition algorithm. The experimental results show that, using 20 × 20 windows on 640 × 480 images, a Sun SPARCstation 2 can process one image in 0.3 seconds.
Fast and accurate skew detection algorithm for a text document or a document with straight lines
Goroh Bessho, Koichi Ejiri, John F. Cullen
Bit-mapped images are becoming more common in offices, and skew is a major obstacle for many otherwise promising applications. To remove the skew, we propose a new algorithm that makes use of both printed characters and straight lines. Lines on a document are decomposed into small segments of black runs. By checking their connectivity, we can easily tell whether those runs come from the same line. To remove any adverse effect of variation in line width, we sample a number of different x-y coordinates along the black runs, adjacent to white pixels. From these coordinates we compute a correlation value; if the value is close to 1.0, we compute the regression coefficient from the same samples. The algorithm is effective for both horizontal and vertical lines. The coefficients can also be used to align character lines. The rectangles formed by connected black pixels are extracted at two or three different compression ratios, and by checking the coordinates of the rectangles in the multiple compressed images we can tell whether the characters belong to the same character line.
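The correlation-then-regression step can be sketched as follows (a simplified sketch of the idea, not the authors' code): sample coordinates along the black runs, check that they are nearly collinear via the correlation coefficient, and only then read the skew angle off the least-squares slope.

```python
import math

def estimate_skew(points):
    """Return (correlation, skew angle in degrees) for (x, y) samples
    taken along black runs; the angle is trusted only when the
    correlation is close to 1.0 (i.e., the samples lie on a line).
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx                      # regression coefficient
    return r, math.degrees(math.atan(slope))

# samples along a line skewed by roughly 2 degrees
pts = [(x, 0.0349 * x) for x in range(0, 200, 10)]
r, angle = estimate_skew(pts)              # r near 1.0, angle near 2
```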
Automatic extraction of objects from technical drawings
Alessandra Esposito, L. Boatto, Vincenzo Consorti, et al.
In this paper, a methodology for extracting the constituent parts of a noisy line drawing is presented. The image is decomposed into primitives, and a heuristic search is performed to identify meaningful aggregates of primitives. Separation of overlapping entities and robustness against noise were the principal objectives of the project, while object-oriented programming allowed us to design a system that is very flexible to changes and additions. Results of experiments on different types of objects are reported.
Information extraction from tabular drawings
Sanjay Balasubramanian, Surekha Chandran, Juan Arias, et al.
This paper presents efficient methodologies to extract information from tabular drawings representing telephone cable interconnections. These tables include records of cable counts, cables in service, assignment charts, and cable running and wiring lists. An interesting problem in these drawings is that changes to the data are occasionally recorded by crossing out entries and appending the changes rather than redrawing the entire document. The objective of the work described here is the extraction of the information contained in these table-structured documents to facilitate the creation of a computer database. Our software system makes use of contextual information in these drawings (e.g., a particular line pattern is used to represent repeated entries, crossed-out entries are ignored, etc.). The system uses features such as inter-line spacing, line length, line orientation, and the start and end locations of lines to detect diagonal lines and vertical lines with demarcations. Experimental results are also included.
Document image interpretation: classification of technologies
Sergey V. Ablameyko, Vladimir V. Bereishik
The paper considers and classifies possible ways of constructing technologies to convert document images into a geographic information system or CAD representation. Three basic document types are considered. The image types at every step of the interpretation technology are identified and described. Different variants of the technology are considered, depending on the type of the initial document and the algorithms used. A classification of eleven technologies known in the literature is given, and our experience in realizing variants of the technology is described.
Error and Performance Analysis
Need for information metrics: with examples from document analysis
We present an argument that progress in Information Science is inhibited by our incomplete perception of the nature of the field. An agenda for research is proposed which, we believe, will lead to more rapid progress. Specific examples are given from the field of Document Analysis.
Use of synthesized images to evaluate the performance of optical character recognition devices and algorithms
Frank R. Jenkins, Junichi Kanai
Synthesizing document images is a cost-effective way to create a large test database and allows researchers to control typesetting and noise variables. Yet the effectiveness of using synthesized images in optical character recognition (OCR) research has not been extensively investigated. In this project, three kinds of test databases were used to study the performance of OCR devices: digitized `real world' documents, page images synthesized from ASCII files, and the synthesized images printed and digitized. The cleanest synthesized images were not necessarily recognized most accurately. Our results suggest that, in addition to typographical features and noise, linguistic features affect the performance of an OCR device.
Classification and distribution of optical character recognition errors
Jeffrey Esakov, Daniel P. Lopresti, Jonathan S. Sandberg
This paper describes an approach for classifying OCR errors based on a new variation of a well-known dynamic programming algorithm. We present results from a large-scale experiment involving the printing, scanning, and OCRing of over one million characters in each of three fonts: Times, Helvetica, and Courier. Our data allow us to draw a number of interesting conclusions about the nature of OCR errors for a particular font, as well as the relationship between error sets for different fonts.
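The underlying idea, sketched here in generic form (the paper's variation differs in its details), is Levenshtein alignment with a backtrace that labels each discrepancy as a substitution, insertion, or deletion:

```python
def classify_errors(truth, ocr):
    """Align ground truth to OCR output with Levenshtein dynamic
    programming, then backtrack to classify each error.
    """
    m, n = len(truth), len(ocr)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if truth[i - 1] == ocr[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost,
                          d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1)       # insertion
    errors = []
    i, j = m, n
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                d[i][j] == d[i - 1][j - 1]
                + (0 if truth[i - 1] == ocr[j - 1] else 1)):
            if truth[i - 1] != ocr[j - 1]:
                errors.append(('sub', truth[i - 1], ocr[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            errors.append(('del', truth[i - 1], ''))
            i -= 1
        else:
            errors.append(('ins', '', ocr[j - 1]))
            j -= 1
    return list(reversed(errors))

errs = classify_errors("letter", "leiter")   # [('sub', 't', 'i')]
```

Aggregating such labels over a large corpus yields the per-font error distributions the paper analyzes.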
Automatic benchmarking scheme for page segmentation
Sabine Randriamasy, Luc M. Vincent, Ben S. Wittner
An automatic bitmap-level, set-based benchmarking scheme for page segmentation, comparing results with predefined `ground truth files' containing all the possible correct solutions, is presented. A successful page segmentation is a necessary precondition for a document recognition process to be successful. The problems addressed here are: designing methods to describe all possible correct segmentations for a given page, and designing methods to compare two segmentations. The proposed segmentation ground truth representation scheme defines ground truth text regions as non-mergeable maximal sets of text lines, merged in a language-dependent direction. It includes the other possible correct segmentations in that authorized cuts in the region are explicitly specified. At this low-level stage, quality criteria for a page segmentation are mainly defined as providing correct input for region ordering and classification. The qualitative and quantitative evaluation method tests the overlap between the two sets of regions; in fact, the regions are defined as the black pixels contained in the derived polygons.
Restoration and Binarization
Optimal nonlinear fax restoration
An optimal filter is estimated to restore binary fax images. The filter is an approximation of the binary conditional expectation, which minimizes the expected absolute error between the restored image and the ideal image. It is implemented as a morphological hit-or-miss filter. The estimation methodology employs a model-based simulation of the degradation due to the fax process.
Text enhancement method based on soft morphological filters
Soft morphological filters are robustly behaving extensions of standard flat morphological filters which include, as an extreme case, the weighted median filter. Soft morphological filters can take into account the geometrical shape of the processed objects and, at the same time, they are robust to additive noise and small variations in the shapes of the objects to be filtered. For suitable parameters they also have a good ability for preserving details. Thus, they provide a robust method for text enhancement. In this paper, an enhancement method based on soft morphological filters is demonstrated.
Document image binarization based on texture analysis
Ying Liu, Sargur N. Srihari
A new thresholding algorithm, based on document image texture analysis, is presented to address strong noise, complex patterns, poor contrast, and variable modalities in gray-scale histograms. The algorithm consists of three steps. First, candidate thresholds are produced by gray-scale histogram analysis. Then, texture features associated with each candidate threshold are computed from the corresponding run-length histogram. Finally, the optimal threshold is selected according to a goodness evaluation, so that the most desirable document texture features are produced. Only one pass through an image is required for optimal threshold selection. The test set consisted of 9000 machine-printed address blocks from an unconstrained U.S. mail stream; over 99.6% of the images were visually well binarized. In an objective test on 594 mail address blocks containing many difficult images, the system achieved an 8.1% higher character recognition rate than a previous algorithm due to Otsu.
Digital image processing in the Xerox DocuTech document processing system
Ying-Wei Lin
This paper describes the real-time image processing features in the Xerox DocuTech document processing system. The features offered include image enhancement, halftone screen removal (de-screening) with an FIR low-pass filter, and an image segmentation algorithm developed by Xerox that can be used for documents with text and high-frequency halftone images, such as pages from typical magazines. The image segmentation algorithm uses a modified autocorrelation function approach to detect halftone areas on the document. With this set of image processing features, it is possible to handle a wide variety of input documents on the scanner and generate high-quality output prints.
Recognition of Special or Poorly Printed Characters
Expert system for automatically correcting OCR output
Kazem Taghva, Julie Borsack, Allen Condit
This paper describes a new expert system for automatically correcting errors made by optical character recognition (OCR) devices. The system, which we call the post-processing system, is designed to improve the quality of text produced by an OCR device in preparation for subsequent retrieval from an information system. The system is composed of numerous parts: an information retrieval system, an English dictionary, a domain-specific dictionary, and a collection of algorithms and heuristics designed to correct as many OCR errors as possible. Errors that cannot be corrected are passed on to a user-level editing program. This post-processing system can be viewed as part of a larger system that streamlines the steps of taking a document from its hard-copy form to its usable electronic form, or it can be considered a stand-alone system for OCR error correction. An earlier version of this system has been used to process approximately 10,000 pages of OCR-generated text; about 87% of the OCR errors it discovered were corrected. We implement numerous new parts of the system, test this new version, and present the results.
Conversion of the Haydn symphonies into electronic form using automatic score recognition: a pilot study
Nicholas Paul Carter
As part of the development of an automatic recognition system for printed music scores, a series of `real-world' tasks are being undertaken. The first of these involves the production of a new edition of an existing 104-page, engraved, chamber-music score for Oxford University Press. The next substantial project, which is described here, has begun with a pilot study with a view to converting the 104 Haydn symphonies from a printed edition into machine-readable form. The score recognition system is based on a structural decomposition approach which provides advantages in terms of speed and tolerance of significant variations in font, scale, rotation, and noise. Inevitably, some editing of the output data files is required, partially due to the limited vocabulary of symbols supported by the system and their permitted superimpositions. However, the possibility of automatically processing the bulk of the contents of over 600 pages of orchestral score in less than a day of compute time makes the conversion task manageable. The influence that this undertaking is having on the future direction of system development is also discussed.
Graph-rewriting approach to discrete relaxation: application to music recognition
Hoda M. Fahmy, Dorothea Blostein
In image analysis, low-level recognition of the primitives plays a very important role. Once the primitives of the image are recognized, depending on the application, many types of analyses can take place. It is likely that associated with each object or primitive is a set of possible interpretations, herein referred to as the label set. The low-level recognizer may associate a probability with each label in the label set. We can use the constraints of the application domain to reduce the ambiguity in the object's identity. This process is variously termed constraint satisfaction, labeling, or relaxation. In this paper, we focus on the discrete form of relaxation. Our contribution lies in the development of a graph-rewriting approach which does not assume the degree of localness is high. We apply our approach to the recognition of music notation, where non-local interactions between primitives must be used in order to reduce ambiguity in the identity of the primitives. We use graph-rewriting rules to express not only binary constraints, but also higher-order notational constraints.
Symbol recognition without prior segmentation
Badr Al-Badr, Robert M. Haralick
We describe a new method for recognizing cursive and degraded text using OCR technology. Using this method, symbols on a page are identified by detecting primitives (parts of symbols) and then finding the best global grouping of primitives into symbols. On an image of text, primitives are detected using mathematical morphology operations, in a way that does not require or involve a prior segmentation step. This paper lays out the overall strategy of a system that implements the recognition method; a following paper reports on experimental protocols and results. The system has three major features: (1) by globally optimizing the process of combining primitives into symbols, it is robust and less sensitive to noise; (2) it does not require segmenting a text block into lines, a line into words, or a word into characters; and (3) it is language independent in that training determines the symbol set it recognizes.
Box connectivity approach to multifont character recognition
Radovan V. Krtolica
The idea of the box connectivity approach (BCA) is to partition the bounding frame of the character bitmap into a fixed number of rectangles, to define some properties of those rectangles, and to establish connectivity relations between the rectangles. Hamming distance and vector optimization are used for classification. Good results in recognition of high-quality data (400 dpi) in three fonts (Courier, Helvetica, and Times New Roman) and eight sizes (from 8 to 24 points) were reported in a previous paper. These findings are confirmed here by an experiment showing that, for the same number of bits, BCA features double the recognition rate with respect to features obtained by simple decimation. However, the method is limited by the fact that the number of reference templates increases with the number of fonts to be recognized. The purpose of this paper is to remove this limitation. The main part of the paper discusses the properties of the Hamming distance and how they can be used in the BCA algorithm to improve the efficiency of classification. The last section reports the results of an experiment showing the discrimination power of BCA features.
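A minimal sketch of the flavor of such classification (the rectangle properties and connectivity relations of the actual BCA method are richer than the plain density bits used here): partition the bounding box into a grid, turn each cell into a bit, and pick the template at minimum Hamming distance.

```python
import numpy as np

def grid_features(bitmap, k=2):
    """Reduce a character bitmap to k*k bits: a cell of the bounding-box
    partition becomes 1 when at least half of its pixels are ink.
    """
    bits = []
    for band in np.array_split(bitmap, k, axis=0):
        for cell in np.array_split(band, k, axis=1):
            bits.append(1 if cell.mean() >= 0.5 else 0)
    return bits

def classify(bitmap, templates):
    """Pick the template label at minimum Hamming distance."""
    feat = grid_features(bitmap)
    return min(templates, key=lambda lab: sum(
        a != b for a, b in zip(feat, templates[lab])))

img = np.array([[1, 1, 0, 0]] * 4)          # ink in the left half
templates = {'l': [1, 0, 1, 0], '-': [0, 0, 1, 1]}
label = classify(img, templates)            # -> 'l'
```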
Disambiguation and spelling correction for a neural network based character recognition system
John M. Trenkle, Robert C. Vogt III
Various approaches have been proposed over the years for using contextual and linguistic information to improve the recognition rates of existing OCR systems. However, an intermediate level of information is currently underutilized for this task: confidence measures derived from the recognition system itself. This paper details the implementation of a high-accuracy machine-print character recognition system based on backpropagation neural networks. The system uses neural net confidences at every stage to make decisions, coupling identification of the field type with field-level disambiguation rules and a robust spell-correction algorithm to significantly improve the raw recognition output. These processing techniques have led to substantial improvements in recognition rates in large-scale tests on images of postal addresses.
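Using classifier confidences for word-level disambiguation can be sketched as follows; the candidate sets, confidences, and lexicon are invented, and the real system's rules and spell correction are more elaborate:

```python
def best_word(char_candidates, lexicon):
    """Score each lexicon word by the product of per-character recognizer
    confidences, then keep the highest-scoring word."""
    def score(word):
        if len(word) != len(char_candidates):
            return 0.0
        s = 1.0
        for ch, cands in zip(word, char_candidates):
            s *= cands.get(ch, 0.0)
        return s
    return max(lexicon, key=score)

# per-position confidences from a (hypothetical) neural net classifier:
# 'B' vs '8' and 'O' vs '0' are the classic ambiguities rules must resolve
cands = [{'B': 0.6, '8': 0.4}, {'O': 0.55, '0': 0.45}, {'X': 0.9}]
print(best_word(cands, ['BOX', '80X', 'B0X']))  # → BOX
```

Identifying the field type first (e.g. alphabetic vs numeric) would shrink the lexicon before this step, which is where much of the gain comes from.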
Degraded text recognition using word collocation
Tao Hong, Jonathan J. Hull
A relaxation-based algorithm is proposed that improves the performance of a text recognition technique by propagating the influence of word collocation statistics. Word collocation refers to the likelihood that two words co-occur within a fixed distance of one another. For example, in a story about water transportation, it is highly likely that the word `river' will occur within ten words on either side of the word `boat.' The proposed algorithm receives groups of visually similar decisions (called neighborhoods) for words in a running text that are computed by a word recognition algorithm. The positions of decisions within the neighborhoods are modified based on how often they co-occur with decisions in the neighborhoods of other nearby words. This process is iterated a number of times, effectively propagating the influence of the collocation statistics across the input text. This improves on a strictly local analysis by allowing strong collocations to reinforce weak (but related) collocations elsewhere. An experimental analysis is discussed in which the algorithm is applied to improving text recognition results that are less than 60% correct. The correct rate is effectively improved to 90% or better in all cases.
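The relaxation step can be sketched in miniature; the neighborhoods, scores, collocation table, and update rule below are illustrative assumptions, not the paper's exact formulation:

```python
def relax(neighborhoods, colloc, iters=5, alpha=0.5):
    """Relaxation over word neighborhoods: blend each candidate's score
    with collocation support from adjacent words' candidates."""
    scores = [dict(nb) for nb in neighborhoods]
    for _ in range(iters):
        new = []
        for i, nb in enumerate(scores):
            upd = {}
            for w, s in nb.items():
                support = 0.0
                for j in (i - 1, i + 1):
                    if 0 <= j < len(scores):
                        support += max(
                            scores[j][v] * colloc.get((w, v), colloc.get((v, w), 0.0))
                            for v in scores[j])
                upd[w] = (1 - alpha) * s + alpha * support
            z = sum(upd.values()) or 1.0  # renormalize per neighborhood
            new.append({w: x / z for w, x in upd.items()})
        scores = new
    return [max(nb, key=nb.get) for nb in scores]

# 'coat' scores higher visually, but collocation with 'river' rescues 'boat'
neighborhoods = [{'boat': 0.4, 'coat': 0.6}, {'river': 0.9, 'fiver': 0.1}]
colloc = {('boat', 'river'): 1.0}
print(relax(neighborhoods, colloc))  # → ['boat', 'river']
```

Iterating lets the confident `river` neighborhood pull `boat` to the top of its neighborhood, which is exactly the propagation effect the abstract describes.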
Deferred interpretation of gray-scale saddle features for recognition of touching and broken characters
Jairo Rocha, William J. Sakoda, Jiangying Zhou, et al.
Interpretation of gaps and touching characters continues to challenge current OCR designs. We approach this and other difficult problems in character recognition by deferring decisions to a stage where a character-specific knowledge base can be applied to the problem. We show how to extract and interpret saddle ridge features at locations where there is either a narrow gap or a thin stroke. Since the color of the ideal image at these points cannot be reliably deduced locally from the features, special treatment is needed. Mathematically, a saddle ridge is, roughly, a location where the Hessian of the gray-scale surface has strong eigenvalues of opposite sign. The recognition module is based on the matching of subgraphs homomorphic to previously defined prototypes. It generates candidate matchings of groups of input features with each part of the prototype. In this context, each saddle ridge is resolved as either a piece of a stroke or a separation between strokes. The quality of each grouping is measured by the cost of the transformations carrying the candidate features into the prototype.
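The saddle test itself is concrete: estimate the Hessian by central differences and check for strong eigenvalues of opposite sign. A minimal sketch, with the strength threshold as an assumption:

```python
import math

def hessian_eigs(img, y, x):
    """Central-difference Hessian of the gray-scale surface at (y, x),
    with eigenvalues from the closed form for a symmetric 2x2 matrix."""
    fyy = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
    fxx = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
    fxy = (img[y + 1][x + 1] - img[y + 1][x - 1]
           - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
    t, d = fxx + fyy, fxx * fyy - fxy * fxy   # trace, determinant
    r = math.sqrt(max(t * t / 4 - d, 0.0))
    return t / 2 - r, t / 2 + r

def is_saddle(img, y, x, strength=0.5):
    """Opposite-sign eigenvalues, both 'strong' (threshold is an assumption)."""
    lo, hi = hessian_eigs(img, y, x)
    return lo < -strength and hi > strength

# the classic saddle surface z = x^2 - y^2, sampled on a 5x5 grid
img = [[x * x - y * y for x in range(-2, 3)] for y in range(-2, 3)]
print(is_saddle(img, 2, 2))  # → True
```

The point of the paper is that this local test only flags the feature; deciding whether it is stroke or gap is deferred to prototype matching.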
Arabic character recognition
May Allam
This paper presents a complete system for learning and recognizing Arabic characters. Arabic OCR faces technical problems not encountered in other languages, such as cursiveness, overriding and overlapping of characters, multiple shapes per character, and the presence of vowels above and below the characters. The proposed approach relies on the fact that the process of connecting Arabic characters to produce cursive writing tends to form a fictitious baseline. During preprocessing, contour analysis provides both component isolation and baseline location. In the feature extraction phase, the words are processed from right to left to generate a sequence of labels. Each label is drawn from a predetermined codebook that represents all possible bit distributions with respect to the baseline. At certain positions, which depend on the label context, a segmentation decision is taken. During training, a model is generated for each character; this model describes the probability of the occurrence of the labels at each vertical position. During recognition, the probability of the label observation sequence is computed and accumulated. The system has been tested on different typewritten and typeset fonts, and on diacriticized versions of both, and the evaluation results are presented.
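The recognition step, accumulating label-observation probabilities against per-character models, can be sketched as follows; the labels, characters, and probabilities are invented for illustration, and the paper's codebook and positional models are richer:

```python
import math

def score(labels, model):
    """Accumulated log-probability of an observed label sequence under a
    character model, where model[i][label] = P(label at position i)."""
    return sum(math.log(model[i].get(lab, 1e-6)) for i, lab in enumerate(labels))

# hypothetical two-position models for two characters
models = {
    'alif': [{'A': 0.9, 'B': 0.1}, {'A': 0.8, 'B': 0.2}],
    'ba':   [{'A': 0.2, 'B': 0.8}, {'A': 0.1, 'B': 0.9}],
}
obs = ['A', 'A']
print(max(models, key=lambda c: score(obs, models[c])))  # → alif
```

Accumulating in log space keeps long label sequences numerically stable, and the small floor (`1e-6`) avoids zero probabilities for unseen labels.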
Context-driven text recognition by means of dictionary support
Josua Boon, Frank Hoenes, Majdi Ben Hadj Ali
This paper presents an alternative method for typed character recognition by way of the textual context. The approach here is word-oriented and uses no a priori knowledge about the typical appearance of characters. It leads back to an approach suggested by R. G. Casey where text recognition is considered as solving a substitution cipher, or cryptogram. Character images are considered only in order to distinguish or group (cluster) them. The recognition information used is provided by dictionaries. The overall procedure can be divided into three principal steps: (1) a ciphertext-like symbolic representation of the text is generated; (2) in an initialization phase, only a few, but reliable, word recognitions are sought; (3) the resulting partial symbol-character assignments are sufficient to initiate the subsequent relaxation of the recognition process. Whereas Casey uses several ambiguous alternatives for word recognition, the approach here is based on acquiring a few, but reliable, recognition alternatives. Thus, instead of a spell check program, a dictionary with a heuristic-driven look-up control combined with an appropriate access mechanism is used.
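The cryptogram view can be made concrete with pattern matching between cluster IDs and dictionary words; the cluster IDs and tiny dictionary below are invented:

```python
def pattern(seq):
    """Canonical shape of a sequence: index of each element's first occurrence."""
    seen = {}
    return tuple(seen.setdefault(s, len(seen)) for s in seq)

def reliable_matches(cipher_word, dictionary):
    """Dictionary words whose letter pattern matches the cluster-ID pattern;
    a unique match yields reliable symbol-character assignments."""
    return [w for w in dictionary
            if len(w) == len(cipher_word) and pattern(w) == pattern(cipher_word)]

# cluster IDs for one word image, e.g. the cryptogram of "been"
print(reliable_matches((7, 3, 3, 5), ['been', 'seen', 'tree', 'boat']))
# → ['been', 'seen']  (ambiguous here, so it would not seed the relaxation)
```

Only word images with a unique match would contribute the initial symbol-to-character assignments; ambiguous ones are deferred to the relaxation phase.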
Recognition of faxed documents
Greg Ricker, Adam S. Winkler
This paper discusses the processing of faxed documents; an example of the type of document we are concerned with is the order entry form. The system is designed to receive order forms via fax, identify the form, extract the appropriate data, and present the data to a host computer. The types of data recognized are hand-printed characters, machine-printed characters, and marksense fields. The system is model based: a model of each form is created once from an original image and stored in a form model database. The processing of a form consists of four steps: object extraction, form identification, form registration, and data extraction (including character recognition). Once all of the data has been extracted, it is sent for post-processing, where recognition errors are corrected with respect to contextual dependencies, logistics, and application-specific dictionaries, and then to the host computer where the order is confirmed. The system is presently implemented in C++ under OS/2 and is in commercial use.
Beyond Pure Character Recognition
Extraction of object lines in engineering drawings
Chan Pyng Lai, Rangachar Kasturi
Classification of object lines in mechanical part drawings is a critical problem for automated conversion of drawings from paper medium to CAD databases. We describe new methods for classifying object lines. These methods include section line detection, hidden line detection, centerline detection and object line extraction. A self-supervised approach which includes a spacing estimation step and a recognition step to extract section lines is described. A general purpose algorithm that not only detects dashed lines but also classifies them based on their attributes is described. These attributes are used for classification of detected dashed lines as hidden lines or centerlines. Object line extraction facilitates intelligent interpretation of geometric objects for integration with CAD/CAM systems.
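Classifying a detected dashed line by its attributes can be sketched as follows; the attribute (dash lengths) and the ratio threshold are assumptions for illustration, not the paper's actual rule:

```python
def classify_dashed(dash_lengths, long_short_ratio=2.0):
    """Classify a detected dashed line from its dash-length attributes:
    centerlines alternate long and short dashes, while hidden lines use
    roughly uniform short dashes."""
    if len(dash_lengths) < 2:
        return 'unknown'
    longest, shortest = max(dash_lengths), min(dash_lengths)
    if shortest and longest / shortest >= long_short_ratio:
        return 'centerline'
    return 'hidden'

# invented dash-length runs measured along two detected lines
print(classify_dashed([12, 3, 11, 4, 12]))  # → centerline
print(classify_dashed([5, 4, 5, 5, 4]))     # → hidden
```

A production detector would also check gap lengths and alternation order before committing to a class.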
Recognition of Special or Poorly Printed Characters
Simulation study for different moment sets
Analysis of variance for the coefficient of variation is performed under various noise levels and moment orders. As a measure of response stability for features in a pattern recognition system, the coefficient of variation is analyzed and compared among various moment sets, such as the geometric moments, Legendre moments, complex moments, rotational moments, and Zernike moments. A convenient definition for binary segmented images is introduced in order to quantify the level of noise in the noisy patterns. Traditional two-way table data analysis is carried out to fit the nearly additively structured data, and the analysis of variance is done on the fittings (row and column effects). A simple summary of the fittings is displayed by boxplots to show the distribution of effects due to the various noise levels and moment orders. Among the different moment sets, the Zernike moments are shown to be optimal in the sense that they are reliable features with the least varying response under the various noise levels and moment orders.
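The stability measure itself is simple to compute; the feature responses below are invented numbers, chosen only to show how a stable feature earns a lower coefficient of variation:

```python
import math

def coefficient_of_variation(values):
    """CV = standard deviation / mean; a lower CV means the feature
    responds more stably across noise levels."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return math.sqrt(var) / mean if mean else float('inf')

# hypothetical responses of one feature across increasing noise levels
stable = [4.0, 4.1, 3.9, 4.05]   # e.g. a noise-robust moment
jittery = [4.0, 5.5, 2.8, 4.9]   # e.g. a noise-sensitive moment
print(coefficient_of_variation(stable) < coefficient_of_variation(jittery))  # → True
```

Computing this CV for each moment set at each noise level and order produces exactly the two-way table the analysis of variance is then run on.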