Proceedings Paper

Matrix frequency analysis and its applications to language classification of textual data for English and Hebrew

Paper Abstract

The advent of the internet has opened a host of new and exciting questions in the science and mathematics of information organization and data mining. In particular, a highly ambitious promise of the internet is to bring the bulk of human knowledge to everyone with access to a computer network, providing a democratic medium for sharing and communicating knowledge regardless of the language of communication. The ability to share and communicate knowledge by transferring digital files is the first crucial achievement in this direction. Nonetheless, available solutions to numerous ancillary problems remain far from satisfactory. Among these outstanding problems are the fundamental questions responsible for the emergence and rapid growth of the new field of Knowledge Engineering: classification of forms of data, their effective organization, extraction of knowledge from massive distributed data sets, and the design of fast, effective search engines. The precision of machine learning algorithms in classifying and recognizing image data (e.g. images scanned from books and other printed documents) is still far from human performance and speed in similar tasks. Discriminating the many forms of ASCII data from each other is less difficult, in view of the emerging universal standards for file formats. Nonetheless, most past and relatively recent human knowledge has yet to be transformed and saved in such machine-readable formats. In particular, an outstanding problem in knowledge engineering is the organization and management, with precision comparable to human performance, of knowledge in the form of document images that broadly consist of text, images, or a blend of both. It has been shown that the effectiveness of OCR is intertwined with the success of language and font recognition.

Paper Details

Date Published: 30 January 2003
PDF: 12 pages
Proc. SPIE 4793, Mathematics of Data/Image Coding, Compression, and Encryption V, with Applications, (30 January 2003); doi: 10.1117/12.454831
Author Affiliations
Joseph Henry Uchill, Univ. of Wisconsin-Madison (United States)
Amir H. Assadi, Univ. of Wisconsin-Madison (United States)

Published in SPIE Proceedings Vol. 4793:
Mathematics of Data/Image Coding, Compression, and Encryption V, with Applications
Mark S. Schmalz, Editor

© SPIE