Share Email Print

Proceedings Paper

Representing lexicons by modified trie for fast partial-string matching
Author(s): Stephen W. Lam; Xin Shen; Sheila X. Zhao; Sargur N. Srihari
Format Member Price Non-Member Price
PDF $17.00 $21.00

Paper Abstract

A fast lexicon search based on trie is presented. Traditionally, a successful trie retrieval requires each element of the input string to find an exact match on a particular path in the trie. However, this approach cannot verify a garbled input string, which is generated due to imperfect character segmentation and recognition of an OCR. The string may contain multiple candidates for a character position or no candidate due to segmentation error. The proposed representation scheme, an extension of trie, allows lexicon look-up even with only some of the string elements specified. An input string is presented as a regular expression and all the paths in the trie that satisfy the expression will be considered as word candidates. The candidates can then be ranked based on character classifier decisions. Confusion in character recognition can be handled by allowing an expression component to have more than one character candidate while segmentation error can be overcome by postulating a word region to contain a certain number of unknown characters. Three extensions have been made to trie data structures for position independent access and fast exhaustive search of all valid paths: (1) bidirectional linkage between two nodes at adjacent levels to enable trie traversal in either direction, (2) the nodes with the same letter at the same word position are linked so that all the words which have the same letter at a specific position can be located immediately, and (3) an index table which records the entry points of letters at specific word positions in the trie in order to facilitate trie access at an arbitrary letter position. This representation scheme has been tested on postal address field recognition. The trie represents 11,157 words, the average access time is 0.02 sec on a SUN SPARC2 and with an average of 3 candidates.

Paper Details

Date Published: 14 April 1993
PDF: 9 pages
Proc. SPIE 1906, Character Recognition Technologies, (14 April 1993); doi: 10.1117/12.143624
Show Author Affiliations
Stephen W. Lam, SUNY/Buffalo (United States)
Xin Shen, SUNY/Buffalo (United States)
Sheila X. Zhao, SUNY/Buffalo (United States)
Sargur N. Srihari, SUNY/Buffalo (United States)

Published in SPIE Proceedings Vol. 1906:
Character Recognition Technologies
Donald P. D'Amato, Editor(s)

© SPIE. Terms of Use
Back to Top
Sign in to read the full article
Create a free SPIE account to get access to
premium articles and original research
Forgot your username?