Proceedings Volume 1710

Science of Artificial Neural Networks

Dennis W. Ruck
View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 1 July 1992
Contents: 9 Sessions, 83 Papers, 0 Presentations
Conference: Aerospace Sensing 1992
Volume Number: 1710

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Plenary Session
  • Architectures
  • Biologically Based Systems
  • Perceptron-Based Systems
  • Recurrent Neural Networks
  • Fuzzy Logic and Genetic Algorithms I
  • Fuzzy Logic and Genetic Algorithms II
  • Network Analysis
  • Self-Organization (Poster Session)
Plenary Session
Computational learning theory
Raghu Raghavan
A summary of the Vapnik-Chervonenkis theory of learning, originally inspired by problems in pattern recognition, is presented.
Architectures
High-order neural network employing adaptive architecture
Ronald Michaels
For the two-category classification problem a method of creating an adaptive architecture network (AANET) is presented and discussed. The principal means of adaptation of this network is the modification of its architecture. AANET is constructed using the repeated application of the outer product expansion, the Karhunen-Loeve expansion, and the Ho-Kashyap algorithm. A multilayer AANET may then be transformed into an equivalent single-layer network by passing a vector x having symbolic terms through the network.
Inheritance of knowledge in neural networks via symbolic algebra
The theme is that of `inheritance' and the use of symbolic algebra to implement it. The notion of inheriting knowledge between networks is crucial since training a network can be exceedingly slow. Partial information acquired through long experience (learning epochs) should be `transferable.' A technique using the notion of `gluing' networks has been pioneered by Alex Waibel of CMU. However, this technique cannot be considered true inheritance since the component networks are trained on similar but different subtasks that are later `concatenated.' The approach presented here is based on the observation that even when the problem is not separable, one can get a reasonable performance out of a 2-layer network. Its training is fast and its only minimum is often below that of many of the local minima of the corresponding multi-layer networks. After training such a net, if one can transfer the knowledge to a multi-layer net, not only would one have saved valuable training time, but one would have avoided many of the local minima associated with the multi-layer network. Training can then proceed with the task of the multi-layer net reduced to improving the performance of the 2-layer net, instead of having to start from scratch. Equations corresponding to this approach are derived. They can be written for specific topologies and solved exactly (toys) or approximately (larger problems) using symbolic algebra.
Deterministic learning theory and a parallel cascaded one-step learning machine
For a one-layered hard-limited perceptron, it is well known that if the training set given is not linearly separable in the state space, the machine just cannot learn no matter what learning method we use. This separability property is generally studied from a geometrical point of view. This paper reports the derivation of an algebraic criterion of the separability of a given mapping set. Then a one-step learning method is derived which will either instruct the machine to find the required weight matrix in one non-iterative step, or inform the teacher that the given mapping set is inseparable or not learnable no matter what learning rule is used. A parallel cascaded two-layered perceptron is then derived which may overcome these learning difficulties.
Stretch and hammer neural networks
Steven C. Gustafson, Gordon R. Little, Mark August Manzardo, et al.
Stretch and hammer neural networks use radial basis function methods to achieve advantages in generalizing training examples. These advantages include (1) exact learning, (2) maximally smooth modeling of Gaussian deviations from linear relationships, (3) identical outputs for arbitrary linear combination of inputs, and (4) training without adjustable parameters in a predeterminable number of steps. Stretch and hammer neural networks are feedforward architectures that have separate hidden neuron layers for stretching and hammering in accordance with an easily visualized physical model. Training consists of (1) transforming the inputs to principal component coordinates, (2) finding the least squares hyperplane through the training points, (3) finding the Gaussian radial basis function variances at the column diagonal dominance limit, and (4) finding the Gaussian radial basis function coefficients. The Gaussian radial basis function variances are chosen to be as large as possible consistent with maintaining diagonal dominance for the simultaneous linear equations that must be solved to obtain the basis function coefficients. This choice insures that training example generalization is maximally smooth consistent with unique training in a predeterminable number of steps. Stretch and hammer neural networks have been used successfully in several practical applications.
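As a rough illustration of the four training steps just listed, the following sketch implements them in Python/NumPy under simplifying assumptions: a single scalar output, and a shared Gaussian variance found by doubling it until column diagonal dominance of the interpolation matrix would fail. All names and the doubling search are illustrative, not taken from the paper.

```python
# Hedged sketch of the four stretch-and-hammer training steps (single output).
import numpy as np

def train_stretch_and_hammer(X, y):
    """X: (n, d) training inputs, y: (n,) scalar targets."""
    # (1) Transform the inputs to principal component coordinates.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt.T

    # (2) Least-squares hyperplane through the training points ("stretch").
    A = np.hstack([Z, np.ones((Z.shape[0], 1))])
    plane, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ plane            # deviations the Gaussians must model

    # (3) Largest common Gaussian variance that keeps the interpolation
    #     matrix column diagonally dominant.
    dists = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    interp = lambda s2: np.exp(-dists**2 / (2.0 * s2))
    s2 = 1e-3
    while True:
        G = interp(s2 * 2.0)
        off_diag = G.sum(axis=0) - np.diag(G)
        if np.all(np.diag(G) > off_diag):
            s2 *= 2.0                   # still dominant: keep growing
        else:
            break

    # (4) Solve the linear system for the Gaussian coefficients ("hammer").
    coeffs = np.linalg.solve(interp(s2), residual)
    return Vt, plane, s2, coeffs
```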
Uniformly sparse neural networks
Siamack Haghighi
Application of neural networks to problems with a large number of sensory inputs is severely limited when the processing elements (PEs) need to be fully connected. This paper presents a new network model in which a trade off between the number of connections to a node and the number of processing layers can be made. This trade off is an important issue in the VLSI implementation of neural networks. The performance and capability of a hierarchical pyramidal network architecture of limited fan-in PE layers is analyzed. Analysis of this architecture requires the development of a new learning rule, since each PE has access to limited information about the entire network input. A spatially local unsupervised training rule is developed in which each PE optimizes the fraction of its output variance contributed by input correlations, resulting in PEs behaving as adaptive local correlation detectors. It is also shown that the output of a PE optimally represents the mutual information among the inputs to that PE. Applications of the developed model in image compression and motion detection are presented.
Visual grammars and their neural networks
Eric Mjolsness
We exhibit a systematic way to derive neural nets for vision problems. It involves formulating a vision problem as Bayesian inference or decision on a comprehensive model of the visual domain given by a probabilistic grammar. A key feature of this grammar is the way in which it eliminates model information, such as object labels, as it produces an image; correspondence problems and other noise removal tasks result. The neural nets that arise most directly are generalized assignment networks. Also there are transformations which naturally yield improved algorithms such as correlation matching in scale space and the Frameville neural nets for high-level vision. Networks derived this way generally have objective functions with spurious local minima; such minima may commonly be avoided by dynamics that include deterministic annealing, for example recent improvements to Mean Field Theory dynamics. The grammatical method of neural net design allows domain knowledge to enter from all levels of the grammar, including `abstract' levels remote from the final image data, and may permit new kinds of learning as well.
Neural networks as components
Guy Smith, J. Austin
This paper promotes and examines the use of neural networks as components. The properties of components desirable for constructing large flexible systems are listed. Some examples of such systems are examined. Neural networks are shown to have these desirable properties. Those characteristics of individual neural network algorithms relevant to system design are described and then examined for some well-known algorithms. The process of designing a neural network system for a particular task is given as an example. Finally, directions for future research are given.
Net pruning of multilayer perceptron using sequential classification technique
Kou-Yuan Huang, Hsiang-Tsun Yen
With the capabilities of parallel computation, distributed processing, and fault tolerance, neural networks are employed widely in a number of research fields. Among the models of neural networks, the single-layer perceptron and the multi-layer perceptron are the most popular ones used in supervised learning problems. However, there exist redundant nodes that are insignificant for classification, no matter which of the two networks is trained to be a classifier. Although a net of a larger size usually has a faster learning rate, it results in an increase of forward computation complexity in either pattern recognition or system relearning. In this paper, a new sequential classification model based on neural networks is proposed. The model, which combines the advantages of neural networks with the properties of sequential classification, is shown to have an encouraging performance for net pruning and feature reduction. In the experiments, two-class and m-class (m > 2) problems are implemented to prove the practicability of the new technique with a balance between the accuracy of pattern classification and the size of the networks. In the conclusion, an overall discussion of the proposed model and technical comparisons with previous related research on net pruning are given.
Modeling neural networks by networks of finite automata
Christel Kemke
This paper describes a formalism for modeling neural networks based on networks of finite automata. We assume the behavior of a network, i.e., its reaction (output) to an external stimulus (input), to be represented uniquely by the spatio-temporal, dynamic activation process which occurs in the network caused by the external stimulus. For a restricted class of activation processes, we are able to determine the resultant activation process caused by simultaneous or successive stimuli from the activation processes representing the single stimuli. Thus, the reaction of the system to a complex input, consisting of a set of simultaneous or successive stimuli, can be inferred from its reactions to the single stimuli. The model was used for the construction and simulation of small networks demonstrating learning and regulating features as well as for the investigation of the behavior of large neural assemblies, which is the main topic of this paper.
Structural organization of a Boolean cellular automata
Valentin Dragoi
This paper deals with the notion of structural organization of Boolean cellular automata as an evolutive concept driving the grouping of the cellular state space as near as possible to the desired final distribution. In this respect, a set of specific categories is introduced as the union of structural state combinations. The transition rule is applied to each neighborhood if the structural organizational degree, estimated as the chance of evolution toward the desired category, exceeds an empirical threshold. The convergence of the cellular automata organizational process is discussed by studying the Lyapunov function of the synchronous iteration trajectory which leads to the stable configuration. This function is based on the informational entropy conveyed by each neighborhood, used as a performance criterion. The category weights, i.e., the strengths of the encouragement of certain categories during the evolutionary process, are evaluated. Finally, some possible applications in the field of image processing are listed.
Nonstationary and asymmetric net for real-time pattern recognition in noisy environments
After a discussion of some theoretical limitations of multilayer architectures in contextual pattern recognition, together with their experimental demonstration, we propose an implementation of a spin-glass-like neural net designed to deal efficiently in real time with time-dependent inputs (pattern translations, rotations, scaling, deformations) in noisy environments. The basic idea is a double dynamic on activations and weights on the same time scale. The two dynamics are correlated through an STM locking function on the object. This locking is the means by which the LTM module of the net can perform an invariant recognition of the object under transformations. This is possible owing to the invariant extraction of global features. The net is non-stationary and asymmetrical, because it is able to choose the right correlation order regarding the memorized prototypes for a successful recognition. Nevertheless, the same non-stationary condition, depending on the locking on an object under transformations, implies that the net displays a non-relaxing stabilization. As an application of the model, we present the classical recognition problem of rotating `T' and `C' pattern sequences in different noisy contexts.
Biologically Based Systems
Physicochemical analog for modeling superimposed and coded memories
Minas Ensanian
The mammalian brain is distinguished by a lifetime of memories being stored within the same general region of physicochemical space, and having two extraordinary features. First, memories to varying degrees are superimposed, as well as coded. Second, instantaneous recall of past events can often be affected by relatively simple, and seemingly unrelated, sensory clues. For the purposes of attempting to mathematically model such complex behavior, and for gaining additional insights, it would be highly advantageous to be able to simulate or mimic similar behavior in a nonbiological entity where some analogical parameters of interest can reasonably be controlled. It has recently been discovered that in nonlinear accumulative metal fatigue, memories (related to mechanical deformation) can be superimposed and coded in the crystal lattice, and that memory, that is, the total number of stress cycles, can be recalled (determined) by scanning not the surfaces but the `edges' of the objects. The new scanning technique known as electrotopography (ETG) now makes the state space modeling of metallic networks possible. The author provides an overview of the new field and outlines the areas that are of immediate interest to the science of artificial neural networks.
Dynamically stable associative learning: a neurobiologically based ANN and its applications
Thomas P. Vogl, Kim L. Blackwell, Garth Barbour, et al.
Most currently popular artificial neural networks (ANN) are based on conceptions of neuronal properties that date back to the 1940s and 50s, i.e., to the ideas of McCulloch, Pitts, and Hebb. Dystal is an ANN based on current knowledge of neurobiology at the cellular and subcellular level. Networks based on these neurobiological insights exhibit the following advantageous properties: (1) A theoretical storage capacity of bN non-orthogonal memories, where N is the number of output neurons sharing common inputs and b is the number of distinguishable (gray shade) levels. (2) The ability to learn, store, and recall associations among noisy, arbitrary patterns. (3) A local synaptic learning rule (learning depends neither on the output of the post-synaptic neuron nor on a global error term), some of whose consequences are: (4) Feed-forward, lateral, and feed-back connections (as well as time-sensitive connections) are possible without alteration of the learning algorithm; (5) Storage allocation (patch creation) proceeds dynamically as associations are learned (self-organizing); (6) The number of training set presentations required for learning is small (< 10) and does not change with pattern size or content; and (7) The network exhibits monotonic convergence, reaching equilibrium (fully trained) values without oscillating. The performance of Dystal on pattern completion tasks such as faces with different expressions and/or corrupted by noise, and on reading hand-written digits (98% accuracy) and hand-printed Japanese Kanji (90% accuracy) is demonstrated.
Some neural correlates of sensorial and cognitive control of behavior
Haluk Ogmen, R. V. Prakash, M. Moussa
Development and maintenance of unsupervised intelligent activity relies on an active interaction with the environment. Such active exploratory behavior plays an essential role in both the development and adult phases of higher biological systems including humans. Exploration initiates a self-organization process whereby a coherent fusion of different sensory and motor modalities can be achieved (sensory-motor development) and maintained (adult rearrangement). In addition, the development of intelligence depends critically on an active manipulation of the environment. These observations are in sharp contrast with current attempts of artificial intelligence and various neural network models. In this paper, we present a neural network model that combines internal drives and environmental cues to reach behavioral decisions for the exploratory activity. The vision system consists of an ambient and a focal system. The ambient vision system guides eye movements by using nonassociative learning. This sensory based attentional focusing is augmented by a `cognitive' system using models developed for various aspects of frontal lobe function. The combined system has nonassociative learning, reinforcement learning, selective attention, habit formation, and flexible criterion categorization properties.
Step-by-step design of a gain-adjustable neuron cell
Chiewcharn Narathong, J. Staab, S. Geiger
The objective of this research project is to develop an optimally sized analog CMOS neuron cell library. This neuron cell together with the optimal synapse developed previously can be used to construct a high density VLSI neural network. The standard cell library allows a neural network designer to concentrate on applying his/her networks as well as evaluating learning algorithms. In order to achieve this objective, a set of optimal design equations was derived from standard CMOS equations. This paper includes both the derivation of the design equations and the evaluation of the size (silicon area) and performance of the neuron cell.
Redundancy reduction as the basis for visual signal processing
A. Norman Redlich
An environmentally driven, self-organizing principle for encoding sensory messages is proposed, based on the need to learn their statistical properties. Optimal encodings are found for two cases: First, for linear maps the optimal transformation eliminates pairwise correlations between input `pixels.' This solution is applied to predict the retinal transform based on the autocorrelator for natural scenes. Second, when the input `images' consist of a set of weakly coupled, local `bound states,' then a series of non-linear maps is found which optimally segments the input. This is demonstrated by using it to efficiently learn, without supervision, the statistics of English text.
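The pairwise-decorrelation step described for the linear case can be illustrated with a standard whitening transform. The sketch below is a generic implementation of that idea, not the paper's particular retinal model; the variable names are ours.

```python
# Minimal whitening sketch: remove pairwise correlations between input "pixels".
import numpy as np

def decorrelating_transform(patches, eps=1e-8):
    """patches: (n_samples, n_pixels) array of flattened image patches."""
    centered = patches - patches.mean(axis=0)
    cov = np.cov(centered, rowvar=False)          # pairwise pixel covariances
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Rotate into the eigenbasis and equalize variances so that the
    # transformed pixels are pairwise uncorrelated.
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return centered @ W, W
```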
Neural network modeling of visual recognition
Rafik Braham
The recognition of visual patterns is one of the main application areas of neural networks. Several models have been designed based on the current understanding of visual information processing in the brains of cats and monkeys. Examples of such models are described in the works of Fukushima, Grossberg, von der Malsburg, and others. But because the visual system is very complex and visual information processing consists of several stages, the technical models attempt to reproduce one or a few aspects. The author has been mostly interested in modeling some of the anatomical features of visual areas and understanding their functional significance. In this paper, principles used in popular models are analyzed. Then the structure and design rationale of a vision model is described. In this description, the principles of the model rather than its implementation details are underscored.
Optimal spatial distribution of photodetector array using information theory
Information theory is used to predict an optimal spatial distribution of a given number of photodetectors. We compare our results with the known distribution in the human eye. The optimization takes into account eye movement which leads to different optimal arrays depending on the time scales of the visual information. When the visual data contains mixed time scales, maximum information flow is achieved by an array distribution consisting of both a large uniform low resolution region, and a smaller high resolution region, as in the human retina. Optimal ratios of areas and densities of these two regions are calculated as a function of the number of eye movements. The results lend support to the hypothesis that the retina is an information theoretically optimal processor.
Neural nets for massively parallel optimization
Laurence C. W. Dixon, David Mills
To apply massively parallel processing systems to the solution of large-scale optimization problems it is desirable to be able to evaluate any function f(z), z ∈ R^n, in a parallel manner. The theorem of Cybenko, Hecht-Nielsen, Hornik, Stinchcombe and White, and Funahashi shows that this can be achieved by a neural network with one hidden layer. In this paper we address the problem of the number of nodes required in the layer to achieve a given accuracy in the function and gradient values at all points within a given n-dimensional interval. The type of activation function needed to obtain nonsingular Hessian matrices is described and a strategy for obtaining accurate minimal networks is presented.
Solving linear hard-optimization problems
Hua Li, Yuan Dong Ji
In this paper, we address linear hard optimization problems with emphasis on two-point boundary value conditions, referred to as the two-point boundary value problem (TPBVP). We propose two different neural networks for solving a class of linear TPBVPs. We show that the proposed networks can solve linear TPBVPs, and we also provide experimental results.
Parametric and additive perturbations for global optimization
James Ting-Ho Lo
A new iterative approach to global minimization is proposed. In each iteration, the approach rocks the `landscape' of the objective function and rolls the ball representing the current state of the variable down to the bottom of a nearby `valley.' In the process of lowering the rock level, some critical rock levels are sufficient to rock the ball out of the attraction region of a strictly local minimum, but insufficient to rock it out of that of a global minimum. If these critical rock levels are maintained long enough, the attraction region of a global minimum is expected to be reached. When the rock stops, the ball rolls right into the global minimum. The approach applies to both continuous and combinatorial optimization. Rock is performed by either perturbing the constants of the objective function, adding a perturbing function to it, or both. Roll is carried out by any local minimization method. Although some initial numerical results are encouraging, a systematic way to schedule the rock level lowering, which guarantees convergence to a global minimum, is yet to be discovered. To demonstrate the application of the rock and roll approach under the assumption that a rock schedule is available, we show in the last two sections how the backpropagation training algorithm can be rocked to produce a globally optimal multilayer perceptron and how the Hopfield net can be rocked to produce a combinatorially minimal solution.
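A minimal sketch of the rock-and-roll loop as described, assuming an additive linear perturbation for the `rock' step, a generic local minimizer for the `roll' step, and a simple geometric schedule for lowering the rock level. Since the abstract notes that a convergence-guaranteeing schedule is still open, the schedule here is purely illustrative.

```python
# Hedged sketch of "rock" (perturb the landscape) and "roll" (local minimize).
import numpy as np
from scipy.optimize import minimize

def rock_and_roll(f, x0, rock0=1.0, decay=0.8, n_levels=20, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    x, level = np.asarray(x0, dtype=float), rock0
    for _ in range(n_levels):
        shift = rng.normal(size=x.shape)            # "rock": additive tilt
        rocked = lambda z, s=shift, a=level: f(z) + a * s @ z
        x = minimize(rocked, x).x                   # "roll": local minimization
        level *= decay                              # lower the rock level
    return minimize(f, x).x                         # final roll on the true landscape

# e.g. a multimodal test function:
# x_star = rock_and_roll(lambda z: np.sum(z**4 - 16*z**2 + 5*z), np.zeros(2))
```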
Neural network analysis of the information content in population responses from human periodontal receptors
Benoni B. Edin, Mats Trulsson
Understanding of the information processing in some sensory systems is hampered for several reasons. First, some of these systems may depend on several receptor types with different characteristics, and the crucial features of natural stimuli encoded by the receptors are rarely known with certainty. Second, the functional output of sensory processing is often not well defined. The human tooth is endowed with several types of sensory receptors. Among these, the mechanoreceptors located in the periodontal ligaments have been implicated in force encoding during chewing and biting. Individual receptors cannot, however, code unambiguously either the direction or the magnitude of the applied forces. Neuronal responses recorded in single human nerve fibers from periodontal receptors were fed to multi-layered feed-forward networks. The networks were trained with error back-propagation to identify specific features of the force stimuli that evoked the receptor responses. It was demonstrated that population responses in periodontal receptors contain information about both the point of attack and the direction of applied forces. It is concluded that networks may provide a powerful tool to investigate the information content in responses from biological receptor populations. As such, specific hypotheses with respect to information processing may be tested using neural networks also in sensory systems less well understood than, for instance, the visual system.
Boltzmann distributions and neural networks: models of unbalanced interpretations of reversible patterns
Francesco Masulli, Massimo Riani, Enrico Simonotto, et al.
The paper describes a neural network model of the perceptual alternation of ambiguous patterns with unbalanced alternative interpretations. The network is made up of binary `neurons' fully and symmetrically interconnected. An energy function can be introduced; therefore, the analogy between the presented model and magnetic systems is exploited to study the statistical properties of the system. On the basis of considerations related to statistical mechanics, the probabilities of `occupation' of the two phase-space regions, associated with the two interpretations of an ambiguous figure, can be determined and analyzed.
Phase transitions in oscillatory neural networks
Hirofumi Nagashino, J. A. Scott Kelso
We have constructed and analyzed a theoretical model of two coupled neural oscillator networks aimed at understanding the underlying basis of phase transitions in biological coordination of rhythmic activities. Each oscillator unit is composed of an excitatory and an inhibitory neuron. These two neurons are coupled to each other forming a negative feedback loop. The excitatory neuron has a self-excitatory connection forming a positive feedback loop. We assume that the change of the coupling strength of the oscillator units or the neurons in each oscillator effects a change in the frequency of the rhythm. We find two, coexisting stable phase-locked modes (in-phase and anti-phase) over a region of coefficients. However, at a critical coupling value, the anti-phase mode becomes unstable and a transition to the in-phase mode occurs. Poincare's method is employed to elucidate bifurcations of the oscillatory solutions, thus revealing the full phase portrait of the network dynamics. The influence of noise on the stability of mode-locked states is also analyzed and correspondence with experimental results is demonstrated.
Perceptron-Based Systems
Nonseparable data models for a single-layer perceptron
John J. Shynk, Neil J. Bershad
This paper describes two nonseparable data models that can be used to study the convergence properties of perceptron learning algorithms. A system identification formulation generates the training signal, with an input that is a zero-mean Gaussian random vector. One model is based on a two-layer perceptron configuration, while the second model has only one layer but with a multiplicative output node. The analysis in this paper focuses on Rosenblatt's training procedure, although the approach can be applied to other learning algorithms. Some examples of the performance surfaces are presented to illustrate possible convergence points of the algorithm for both nonseparable data models.
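For reference, Rosenblatt's training procedure mentioned above is the classical error-correction rule sketched below; the zero-mean Gaussian inputs with noisy labels are only a stand-in for the paper's nonseparable data models.

```python
# Rosenblatt's perceptron rule applied to nonseparable Gaussian data (sketch).
import numpy as np

def rosenblatt_train(X, y, lr=0.1, epochs=50):
    """X: (n, d) inputs, y: (n,) labels in {-1, +1}."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # absorb the bias term
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            if yi * (w @ xi) <= 0:                  # misclassified: correct
                w += lr * yi * xi
    return w

# Zero-mean Gaussian inputs with labels no hyperplane separates exactly.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = np.sign(X[:, 0] + 0.5 * rng.normal(size=500))   # label noise -> nonseparable
w = rosenblatt_train(X, y)
```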
Feedforward networks with hierarchical structure and local learning
Ernest Robert McCurley, Mark T. Miller
A specialized multilayer perceptron architecture called the adaptive neighborhood network is presented. Geometric and statistical information about pattern distributions replace traditional heuristics for network structure and initialization. As a result, adaptive neighborhood networks train quickly and simply, and perform well in certain classification applications.
Accessing the null space with nonlinear multilayer neural networks
Shelly D. D. Goggin, Karl E. Gustafson, Kristina M. Johnson
Nonlinear multilayer neural networks have been successful in solving problems in object recognition and decision making, which cannot be solved with nonlinear decision functions. These problems require the construction of a one-to-one or many-to-one mapping of input vectors to output vectors. If a finite training set is used, this mapping is a transformation between the set of values for the input elements to the set of values for the output elements. If the set of values for the input elements lies in the same subspace as the set of values for the output vectors, then a linear transformation can be made. Otherwise, either a neural network or some other nonlinear function is needed to construct the transformation. Nonlinear decision functions can make the transformation to sets of values for the output elements that are in a different subspace, but not an arbitrary subspace. The nonlinear multilayer neural network can make any transformation between sets of values of input elements and output elements, if enough hidden units are available. An explanation for the neural network's power to access any space, including the null space, is presented, along with some examples of the applicability of this result.
Recent advances on techniques and theories of feedforward networks with supervised learning
Lei Xu, Stan Klasa
The rediscovery and popularization of the back propagation training technique for multilayer perceptrons as well as the invention of the Boltzmann Machine learning algorithm have given a new boost to the study of supervised learning networks. In recent years, besides the widely spread applications and the various further improvements of the classical back propagation technique, many new supervised learning models, techniques, and theories have also been proposed in a vast number of publications. This paper tries to give a rather systematic review of the recent advances in supervised learning techniques and theories for static feedforward networks. We summarize a great number of developments into five aspects: (1) Various improvements and variants made on the classical back propagation techniques for multilayer (static) perceptron nets, for speeding up training, avoiding local minima, increasing the generalization ability, as well as for many other interesting purposes. (2) A number of other learning methods for training multilayer (static) perceptrons, such as derivative estimation by perturbation, direct weight update by perturbation, genetic algorithms, recursive least square estimation and the extended Kalman filter, linear programming, the policy of fixing one layer while updating another, constructing networks by converting decision tree classifiers, and others. (3) Various other feedforward models which are also able to implement function approximation, probability density estimation, and classification, including various models of basis function expansion (e.g., radial basis functions, restricted coulomb energy, multivariate adaptive regression splines, trigonometric and polynomial bases, projection pursuit, basis function trees, and many others), and several other supervised learning models. (4) Models with complex structures, e.g., modular architectures, hierarchical architectures, and others. (5) A number of theoretical issues involving the universal approximation of continuous functions, best approximation ability, learnability, capability, generalization ability, and the relations between these abilities and the number of layers in a network, the number of neurons needed, the number of hidden neurons, as well as the number of training samples. Altogether, we try to give a global picture of the present state of supervised learning techniques and theories for training static feedforward networks.
Design rules of multilayer perceptrons
Youngjik I. Lee, San-Hoon Oh, Hyun Kyung Song, et al.
Multilayer perceptrons with the error back-propagation learning algorithm are widely used for many pattern classification applications. In this paper, we address some design rules of a multilayer perceptron related to its learning speed and selection of an optimal number of hidden nodes. One of the critical drawbacks of the error back-propagation learning algorithm is its slow learning speed. We have analyzed the reasons for this drawback, and suggested that fast learning can be achieved with proper initial weight settings. Another important problem for multilayer networks is to determine an optimal number of hidden nodes. By analyzing the total error of a multilayer perceptron, we propose an efficient method which yields an appropriate number of hidden nodes by iteratively eliminating unnecessary hidden nodes. These design rules have been successfully applied to the handwritten digit recognition problem.
Feedforward neural nets and one-dimensional representation
Laurence C. W. Dixon, David Mills
Feedforward nets can be trained to represent any continuous function, and training is equivalent to solving a nonlinear optimization problem. Unfortunately, it frequently leads to an error function with a Hessian matrix that is effectively singular at the solution. Traditional quadratic based optimization algorithms do not perform superlinearly on functions with a singular Hessian, but results on univariate functions show that even so they are more efficient and reliable than backpropagation. A feedforward net is used to represent a superposition of its own sigmoid activation function. The results identify some conditions for which the Hessian of the error function is effectively singular.
Fault tolerance of neural networks with noisy training sets
Jay I. Minnix
It is well established that training backpropagation networks with noisy training sets increases the generalization capabilities of the network. Since input set noise is somewhat analogous to faults in the network, networks trained on noisy inputs should exhibit fault tolerance superior to that of similar networks trained on non-noisy inputs. This paper presents results of a study to determine the effect of noisy training sets on fault tolerance. Backpropagation was used to train three sets of networks on 7 X 7 numeral patterns. One set was the control and used noiseless inputs and the other two used two different noisy cases. Several network examples were trained for each of the three cases (no noise, 10% noise, and 20% noise). The noise was injected into each training image uniformly at random, and took the form of toggled (0 to 1 and 1 to 0) pixel values in the binary input images. After learning was complete, the networks were tested for their fault tolerance to stuck-at-1 and stuck-at-0 element faults, as well as weight connection faults. The networks trained on noisy inputs had substantially better fault tolerance than the network trained on noiseless inputs.
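A small sketch of the pixel-toggling noise injection described above, assuming binary 7 x 7 images stored as flat 0/1 vectors and a fixed fraction of toggled pixels per image; whether the study fixed the fraction per image or toggled each pixel independently is not stated, so this is one plausible reading.

```python
# Toggle a random fraction of bits in each binary training image (sketch).
import numpy as np

def toggle_noise(images, noise_frac, rng=None):
    """images: (n, n_pixels) binary array; noise_frac: fraction of pixels to flip."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = images.copy()
    n_pixels = images.shape[1]
    n_flip = int(round(noise_frac * n_pixels))
    for img in noisy:
        idx = rng.choice(n_pixels, size=n_flip, replace=False)
        img[idx] = 1 - img[idx]                     # toggle 0 <-> 1
    return noisy

# e.g. the 10% and 20% training conditions mentioned above (7 x 7 = 49 pixels):
clean = np.random.default_rng(1).integers(0, 2, size=(100, 49))
noisy_10 = toggle_noise(clean, 0.10)
noisy_20 = toggle_noise(clean, 0.20)
```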
Alternative method for the connectionist learning of k-DNF expressions
Thomas Bitterman
We show that the extremely long learning times imposed upon connectionist systems by the use of general learning methods such as back propagation and simulated annealing can be drastically shortened by the use of methods geared toward the problem at hand. In particular, the learning of k-DNF expressions is analyzed and a new, more efficient, algorithm is proposed.
Negative transfer problem in neural networks
Adel M. Abunawass
Harlow, 1949, observed that when human subjects were trained to perform simple discrimination tasks over a sequence of successive training sessions (trials), their performance improved as a function of the successive sessions. Harlow called this phenomenon `learning-to-learn.' The subjects acquired knowledge and improved their ability to learn in future training sessions. It seems that previous training sessions contribute positively to the current one. Abunawass & Maki, 1989, observed that when a neural network (using the back-propagation model) is trained over successive sessions, the performance and learning ability of the network degrade as a function of the training sessions. In some cases this leads to a complete paralysis of the network. Abunawass & Maki called this phenomenon the `negative transfer' problem, since previous training sessions contribute negatively to the current one. The effect of the negative transfer problem is in clear contradiction to that reported by Harlow on human subjects. Since the ability to model human cognition and learning is one of the most important goals (and claims) of neural networks, the negative transfer problem represents a clear limitation to this ability. This paper describes a new neural network sequential learning model known as Adaptive Memory Consolidation. In this model the network uses its past learning experience to enhance its future learning ability. Adaptive Memory Consolidation has led to the elimination and reversal of the effect of the negative transfer problem, thus producing a `positive transfer' effect similar to Harlow's learning-to-learn phenomenon.
Principal component training of multilayer perceptron neural networks
Gwong Chain Sun, Darrel L. Chenoweth
This paper addresses the problem of training a multi-layer perceptron neural network for use in statistical pattern recognition applications. In particular it suggests a method for training such a network which significantly reduces the number of iterations that usually accompanies the use of the back propagation learning algorithm. The use of principal component analysis is proposed, and an example is given that demonstrates significant improvements in convergence speed as well as the number of hidden layer neurons needed, while maintaining accuracy comparable to that of a conventional perceptron network trained using back propagation. The accuracy obtained by the principal component trained network is also compared to that of a Bayes classifier used as a reference for evaluating accuracies. In addition, a cursory examination of the network performance with uniformly distributed feature classes is included. This work is still of a preliminary nature, but the initial examples we have considered suggest the method has promise for statistical classification applications.
Algorithm for the N-Lyapunov exponents of an N-dimensional unknown dynamical system
W. Davis Dechert, Ramazan Gencay
An algorithm for estimating the Lyapunov exponents of an unknown dynamical system is designed. The algorithm estimates not only the largest but all n Lyapunov exponents of an n-dimensional system correctly. The estimation is carried out by multilayer feedforward networks. We focus our attention on deterministic as well as noisy system estimation. The performance of the algorithm is very satisfactory in the presence of noise as well as with a limited number of observations.
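The abstract does not give the estimator's details, but a common way to go from a fitted feedforward map to all n exponents is to accumulate the map's Jacobians along the orbit with repeated QR factorizations. The sketch below shows only that accumulation step and assumes the Jacobians have already been computed from the trained network.

```python
# Standard QR accumulation of Lyapunov exponents from a sequence of Jacobians.
import numpy as np

def lyapunov_exponents(jacobians):
    """jacobians: sequence of (n, n) Jacobian matrices along the orbit."""
    n = jacobians[0].shape[0]
    Q = np.eye(n)
    log_r = np.zeros(n)
    for J in jacobians:
        Q, R = np.linalg.qr(J @ Q)
        sign = np.sign(np.diag(R))
        Q, R = Q * sign, R * sign[:, None]          # keep diag(R) positive
        log_r += np.log(np.diag(R))
    return log_r / len(jacobians)                   # average log stretch rates
```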
Theoretical and experimental analysis of the first layer in neural networks for 3D pattern recognition
Jesus Figueroa-Nazuno, A. Vazquez-Nava, E. Vargas Medina
The behavior of the first layer of a weightless artificial neural network is analyzed in this paper. The way in which the neural network receives external information changes according to different probability distribution functions that control data sampling from many different patterns. This paper describes the architecture of this system, and shows the effect of the different probability distribution functions over 3-dimensional pattern recognition.
Capacity of feedforward networks with shared weights
Martin A. Kraaijveld, Robert P. W. Duin
In pattern recognition it is a well-known fact that the number of free parameters of a classification function should not be too large, since the parameters have to be estimated from a finite learning set. For multi-layer feedforward network classifiers, this implies that the number of weights and units should be limited. However, a fundamentally different approach to decrease the number of free parameters in such networks, suggested by Rumelhart and applied by le Cun, is by sharing the same weights with multiple units. This was motivated by the fact that translation invariance could be obtained by this technique. In this paper, we discuss how this weight sharing technique influences the capacity or Vapnik-Chervonenkis dimension of the network. First, an upper bound is derived for the number of dichotomies that can be induced with a layer of units with shared weights. Then, we apply this result to bound the capacity of a simple class of weight-sharing networks. The results show that the capacity of a network with shared weights is still linear in the number of free parameters. Another remarkable outcome is either that the weight sharing technique is a very effective way of decreasing the capacity of a network, or that the existing bounds for the capacity of multi-layer feedforward networks considerably overestimate the capacity.
Attentive multidirectional associative memory with application to pattern association
An attentive multidirectional hetero-associative memory network (AMAM) is proposed. The convergence and encoding strategies of AMAM are described. This network enables multiple associations, but with certain associations embedding more attention. This model is inspired by speculation about how associative learning and storage might occur in the nervous system. AMAM has much better error correcting capability and memory capacity than the multidirectional associative memory. Examples are illustrated to show the advantages of this model. In addition, we demonstrate and compare its recall ability for pattern recognition.
FAUST: a vision-based neural network multimap pattern recognition architecture
Charles L. Wilson
A new architecture is presented for multi-map, self-organizing pattern recognition which allows concurrent massively parallel learning of features using different maps for each feature type. The method used is similar to the multi-map structures known to exist in the vertebrate sensory cortex. The learning used to update memory locations uses a feed-forward mechanism and is self-organizing. The architecture is described by the acronym FAUST (Feed-forward Association Using Symmetrical Triggering). As a demonstration of the effectiveness of FAUST, character recognition, fingerprint classification, and forms recognition programs have been constructed on a massively parallel computer. The character recognition program can perform 99% accurate character recognition on medium-quality machine printed digits at a speed of 2.4 ms/digit, and 88% recognition on multiple-writer handprint with a 2.3% substitutional error rate. The form recognition program can achieve 94% accuracy on complex forms. The fingerprint classification program classifies 93% of fingerprints correctly with a 10% rejection rate. All of the calculations were performed on an Active Memory Technology DAP 510.
Recurrent Neural Networks
Function prediction using recurrent neural networks
Randall L. Lindsey, Dennis W. Ruck, Steven K. Rogers, et al.
The real-time recurrent learning (RTRL) algorithm was modified and applied to the task of function prediction. This recurrent neural network was modified to include both a variable learning rate and a linear output combined with sigmoidal hidden units. The simple learning rate modification allows faster network convergence while avoiding most cases of catastrophic divergence. In addition, a linear output combined with hidden sigmoidal units enables the network to predict unbounded functions. The modified recurrent network was then used to simulate a linear system (second-order Butterworth filter). In addition, the recurrent network was applied to two specific applications: predicting 3-D head position in time, and voice data reconstruction. The accuracy with which the network predicted the pilot's head position was compared to the best linear statistical prediction algorithm. The application of the network to the reconstruction of voice data showed the recurrent network's ability to learn temporally encoded sequences, and to make decisions as to whether a speech signal sample was a fricative or a voiced portion of speech.
Sequence learning with recurrent networks: analysis of internal representations
Joydeep Ghosh, Vijay Karamcheti
The recognition and learning of temporal sequences is fundamental to cognitive processing. Several recurrent networks attempt to encode past history through feedback connections from `context units.' However, the internal representations formed by these networks are not well understood. In this paper, we use cluster analysis to interpret the hidden unit encodings formed when a network with context units is trained to recognize strings from a finite state machine. If the number of hidden units is small, the network forms fuzzy representations of the underlying machine states. With more hidden units, different representations may evolve for alternative paths to the same state. Thus, appropriate network size is indicated by the complexity of the underlying finite state machine. The analysis of internal representations can be used for modeling of an unknown system based on observation of its output sequences.
Recurrent network training with the decoupled-extended-Kalman-filter algorithm
Gintaras V. Puskorius, Lee A. Feldkamp
In this paper we describe the extension of our decoupled extended Kalman filter (DEKF) training algorithm to networks with internal recurrent (or feedback) connections; we call the resulting algorithm dynamic DEKF (or DDEKF for short). Analysis of DDEKF's computational complexity and empirical evidence suggest significant computational and performance advantages in comparison to training algorithms based exclusively upon gradient descent. We demonstrate DDEKF's effectiveness by training networks with recurrent connections for four different classes of problems. First, DDEKF is used to train a recurrent network that produces as its output a delayed copy of its input. Second, recurrent networks are trained by DDEKF to recognize sequences of events with arbitrarily long time delays between the events. Third, DDEKF is applied to the training of identification networks to act as models of the input-output behavior for nonlinear dynamical systems. We conclude the paper with a brief discussion of the extension of DDEKF to the training of neural controllers with internal feedback connections.
Alternative learning methods for training neural network classifiers
Edward E. DeRouin, Joe R. Brown
Neural networks have proven very useful in the field of pattern classification by mapping input patterns into one of several categories. One widely used neural network paradigm is the multi-layer perceptron employing back-propagation of errors learning -- often called back-propagation networks (BPNs). Rather than being specifically programmed, BPNs `learn' this mapping by exposure to a training set, a collection of input pattern samples matched with their corresponding output classification. The proper construction of this training set is crucial to successful training of a BPN. One of the criteria to be met for proper construction of a training set is that each of the classes must be adequately represented. A class that is represented less often in the training data may not be learned as completely or correctly, impairing the network's discrimination ability. This is due to the implicit setting of a priori probabilities which results from unequal sample sizes. The degree of impairment is a function of (among other factors) the relative number of samples of each class used for training. This paper addresses the problem of unequal representation in training sets by proposing two alternative methods of learning. One adjusts the learning rate for each class to achieve user-specified goals. The other utilizes a genetic algorithm to set the connection weights with a fitness function based on these same goals. These methods are tested using both artificial and real-world training data.
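The first of the two proposed remedies, adjusting the learning rate per class, might look something like the sketch below, where under-represented classes receive proportionally larger rates. The inverse-frequency scaling is our illustrative choice; the paper's user-specified goals could replace it.

```python
# Hedged sketch: per-class learning rates scaled by inverse class frequency.
import numpy as np

def per_class_learning_rates(labels, base_lr=0.1):
    """labels: (n,) integer class labels of the training set."""
    classes, counts = np.unique(labels, return_counts=True)
    scale = counts.mean() / counts                  # rarer class -> larger rate
    return {int(c): base_lr * float(s) for c, s in zip(classes, scale)}

# During back-propagation, each example's weight update would then use
# lrs[example_class] in place of a single global learning rate.
lrs = per_class_learning_rates(np.array([0, 0, 0, 0, 1]))
```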
Fuzzy Logic and Genetic Algorithms I
What can quantum logic and fuzzy logic teach each other?
H. John Caulfield, Luis R. Lopez
This work began with our consideration of the question: `Does fuzzy logic help us understand quantum mechanics?' In arriving at a positive answer, we had to posit a nondeterministic defuzzifying rule. We suggest here that such a rule may be useful in fuzzy operations, especially fuzzy control. It is often possible to construct an optical quantum computer which directly implements nondeterministic defuzzification by quantum indeterminacy. Thus our attempt to use fuzzy logic to understand quantum mechanics leads to a system which uses quantum mechanics to implement fuzzy logic.
Fuzzy neural computing systems
Madan M. Gupta
In this paper we give some basic principles of fuzzy neural computing using synaptic and somatic operations. We first briefly review the neural systems based upon conventional algebraic synaptic (confluence) and somatic (aggregation) operations. Then we provide a detailed neuronal morphology based upon fuzzy logic and its generalization in the form of T-operators. For such fuzzy logic based neurons we then develop the learning and adaptation algorithm.
Improving convergence and performance of Kohonen's self-organizing scheme
Nikhil R. Pal, James C. Bezdek, Eric C.K. Tsao
Kohonen-like clustering algorithms (e.g., learning vector quantization) suffer from several major problems. For this class of algorithms, output often depends on the initialization. If the initial values of the cluster centers are outside the convex hull of the input data, such an algorithm, even if it terminates, may not produce meaningful results in terms of prototypes for clustering. This is because it updates only the winner prototype with every input vector. In this paper we propose a generalization of learning vector quantization (which we shall call a Kohonen clustering network or KCN) which, unlike other methods, updates all the nodes with each input vector. Moreover, the network attempts to find a minimum of a well-defined objective function. The learning rules depend on the degree of match to the winner node; the lower the degree of match with the winner, the greater the impact on the nonwinner nodes. Our numerical results show that the generated prototypes do not depend on the initialization, learning coefficient, or the number of iterations (provided KCN runs for at least 200 passes through the data). We use Anderson's IRIS data to illustrate our method, and we compare our results with the standard Kohonen approach.
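A heavily hedged sketch of the update-everything idea described above: every prototype moves toward each input, weighted by a fuzzy degree of match. The FCM-style membership formula used here is a standard choice and not necessarily the authors' KCN rule.

```python
# Illustrative "update all prototypes" pass with fuzzy degrees of match.
import numpy as np

def kcn_epoch(X, prototypes, lr=0.05, m=2.0, eps=1e-12):
    """X: (n, d) inputs; prototypes: (c, d) initial cluster prototypes."""
    V = prototypes.copy()
    for x in X:
        d2 = np.sum((V - x) ** 2, axis=1) + eps
        u = (1.0 / d2) ** (1.0 / (m - 1.0))
        u /= u.sum()                                # fuzzy degrees of match
        V += lr * u[:, None] * (x - V)              # every node is updated
    return V
```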
Edge detection using a fuzzy neural network
David Alan Kerr, James C. Bezdek
We propose a method for training a standard feed forward, back propagation neural-like network using fuzzy label vectors whose performance goal is to produce edge images from standard imagery such as FLIR, video, and grey tone pictures. Our method is based on training the network on a basis set of edge windows which are scored using the Sobel operator. The method is illustrated by comparing edge images of several real scenes with those derived using the Sobel and Canny image operators.
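The Sobel scoring of training windows mentioned above amounts to the familiar gradient-magnitude computation; a minimal version is sketched below, with the 3 x 3 window size and any normalization of scores to [0, 1] (so they can serve as fuzzy labels) left as illustrative choices.

```python
# Sobel gradient magnitude of a small grey-tone window, usable as an edge score.
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def sobel_edge_score(window3x3):
    """window3x3: (3, 3) grey-tone values; returns the gradient magnitude."""
    gx = np.sum(SOBEL_X * window3x3)
    gy = np.sum(SOBEL_Y * window3x3)
    return np.hypot(gx, gy)
```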
Partially supervised fuzzy c-means algorithm for segmentation of MR images
Amine M. Bensaid, James C. Bezdek, Lawrence O. Hall, et al.
Partial supervision is introduced to the unsupervised fuzzy c-means algorithm (FCM). The resulting algorithm is called semi-supervised fuzzy c-means (SFCM). Labeled data are used as training information to improve FCM's performance. Training data are represented as training columns in SFCM's membership matrix (U), and are allowed to affect the cluster center computations. The degree of supervision is monitored by choosing the number of copies of the training set to be used in SFCM. Preliminary results of SFCM (applied to MRI segmentation) suggest that FCM finds the clusters of most interest to the user very accurately when training data is used to guide it.
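A compact sketch of the partial-supervision idea as described: labeled examples enter the center update as fixed crisp membership columns, replicated n_copies times to raise their influence, while unlabeled memberships follow the standard FCM update. Details of the published SFCM (initialization, termination, exact weighting) may differ; everything here is illustrative.

```python
# Hedged semi-supervised fuzzy c-means sketch with replicated training columns.
import numpy as np

def sfcm(X_unlab, X_lab, labels, c, m=2.0, n_copies=1, n_iter=50, eps=1e-12):
    """labels: integer class indices in range(c) for the labeled rows."""
    Xl = np.repeat(X_lab, n_copies, axis=0)
    Ul = np.eye(c)[np.repeat(labels, n_copies)]     # fixed crisp memberships
    rng = np.random.default_rng(0)
    V = X_unlab[rng.choice(len(X_unlab), c, replace=False)]
    for _ in range(n_iter):
        d2 = np.sum((X_unlab[:, None, :] - V[None, :, :]) ** 2, axis=-1) + eps
        Uu = (1.0 / d2) ** (1.0 / (m - 1.0))
        Uu /= Uu.sum(axis=1, keepdims=True)         # standard FCM memberships
        U = np.vstack([Uu, Ul])                     # append training columns
        Xall = np.vstack([X_unlab, Xl])
        W = U ** m
        V = (W.T @ Xall) / W.sum(axis=0)[:, None]   # centers see labeled data
    return V, Uu
```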
Framework for fuzzy neural networks
Noaki Imasaki, Jun-ichi Kiji, Masahiko Arai
This paper proposes the fuzzy inference neural network (FiNN) as a framework for an integrated system combining fuzzy theory and neural network theory. The FiNN is structured on a skeleton of specified fuzzy rules so that the FiNN can store the fuzzy rules smoothly. A FiNN system implements approximate inference from the fuzzy rules. There are three types of structured parts, called the `antecedent network,' the `conclusion network,' and the `logic network.' Each structured part is a neural network component. Each neural network component executes an elementary function which is a part of an approximate inference procedure. The FiNN categorizes practical data by itself to generate learning samples for the conclusion networks. Membership functions in the antecedent networks are initialized by a priori knowledge, and modified by solving inverse problems of the logic network. A numerical example demonstrates the applicability to system identification.
Modeling confusion for autonomous systems
James A. Stover, Ronald E. Gibson
Autonomous systems process sensory information to build representations of the external world, which serve as the basis for response decisions. These representations may be characterized by property lists, some of which are not direct sensor measurements, but inferred. Inferred properties are identified by classifiers or pattern recognition devices, which identify the existence of `fuzzy' concepts, such as `flying object.' Fuzzy classifiers assign properties with infinite degrees of existence, represented by numbers on the closed unit interval [0,1], which raises issues not present with classifiers based on binary logic, in which properties either exist or do not. One of these issues is confusion. A system is said to be in a state of confusion when it is generating similar confidence factors for mutually exclusive properties. For example, the fuzzy concepts `civilian' and `military' may be properties of the object class `aircraft,' and a state of confusion exists if confidence factors for these properties are both relatively close. An autonomous system that reacts to aircraft needs an explicit representation of confusion to enable it to decide whether it should react to object instances in their present form or continue data gathering. We discuss approaches to modeling confusion using fuzzy logical operators and present illustrative examples of its application in multi-level classification.
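One very simple way to quantify the confusion state described above is to take a fuzzy AND (minimum) of the two largest confidence factors over a set of mutually exclusive properties, so the measure is high only when at least two competing properties are both strongly supported. This is an illustrative operator, not necessarily the one used in the paper.

```python
# Illustrative fuzzy confusion measure over mutually exclusive properties.
def confusion(confidences):
    """confidences: confidence factors in [0, 1] for mutually exclusive properties."""
    ranked = sorted(confidences, reverse=True)
    return min(ranked[0], ranked[1])    # high only when the top two are both high

# e.g. 'civilian' vs. 'military' for an object of class 'aircraft':
print(confusion([0.55, 0.50]))          # close confidences -> substantial confusion
print(confusion([0.90, 0.10]))          # clear-cut case    -> little confusion
```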
Realizing the potential of neural network implementation technologies: methods of using large neural networks
Colin Smith
In the longer term, amorphous silicon and other technologies promise to allow the implementation of large neural networks directly in hardware. However, although a number of application areas, such as telecommunication network management and image recognition, could benefit from such large artificial neural networks because they require large numbers of inputs and/or outputs, it is currently difficult to determine the benefits of large networks in such application areas for the following main reasons. Firstly, design and application engineers do not yet have such tools available to them; secondly, and more importantly, many popular neural network training algorithms do not scale well. This paper suggests new methods of combining many small networks to produce a large composite system capable of extending the range of problems to which neural network techniques can be applied. It shows how large networks can be used to extend the ideas of fuzzy logic so that non-linear dependences between vectors can be dealt with.
Reinforcement and unsupervised learning in fuzzy-neuro controllers
Emdad Khan
Refinement of the performance of approximate reasoning based controllers (e.g., fuzzy logic based controllers) by using reinforcement (also known as graded) learning has been proposed recently. However, reinforcement learning schemes known today have problems in learning and generating proper control inputs, especially for complex plants. In this paper, we present novel schemes to alleviate these problems found in existing reinforcement learning based controllers by using unsupervised learning and a neuro-fuzzy approach.
Self-organization by fuzzy clustering
Gerardo Beni, Susan Hackwood, Xiaomin Liu
New types of robot systems have recently been suggested based on the idea of distributed, collective intelligence analogous to biological systems. In this paper we investigate the relationship between fuzzy clustering (FC) and problems of self-organization in such systems, referred to collectively as distributed robotic systems (DRS). The particular problem of self-organization in DRS prompts a reconsideration of the available FC techniques. Recent advances in FC are reviewed with the intent of adapting them to the DRS problem. A `minimally biased' clustering algorithm producing a validity-ranked hierarchy of partitions is applied to the self-organizing evolution of DRS. Two cases are considered: a bottom-up self-organization into increasingly larger groups and a top-down dispersion of a group to optimally cover a sensory field.
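The `minimally biased' algorithm itself is not detailed in the abstract; as background, here is a minimal sketch of standard fuzzy c-means, the canonical FC technique such work builds on (the cluster count c and fuzzifier m below are generic parameters, not values from the paper):

```python
import numpy as np

# A minimal sketch of standard fuzzy c-means; this only illustrates fuzzy
# clustering itself, not the paper's validity-ranked algorithm.

def fuzzy_c_means(X, c, m=2.0, n_iter=100, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)                           # memberships; columns sum to 1
    for _ in range(n_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)            # cluster centers
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + eps
        U = 1.0 / (d ** (2.0 / (m - 1.0)))
        U /= U.sum(axis=0)                       # renormalize memberships
    return U, V

# Toy usage: two well-separated groups of points.
X = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 4.0])
U, V = fuzzy_c_means(X, c=2)
```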
Japanese advances in fuzzy systems research
During the past summer (1991), I spent two months on an appointment as a visiting researcher at Kansai University, Osaka, Japan, and five weeks at the Laboratory for International Fuzzy Engineering Research (LIFE) in Yokohama. Part of the expenses for the time in Osaka, and all the expenses for the visit to LIFE, were covered by ONR. While there I met with most of the key researchers in both fuzzy systems and case-based reasoning. This involved trips to numerous universities and to research laboratories at the Matsushita/Panasonic, Omron, and Hitachi Corporations. In addition, I spent three days at the Fuzzy Logic Systems Institute (FLSI) in Iizuka, and I attended the annual meeting of the Japan Society for Fuzzy Theory and Research (SOFT-91) in Nagoya. The following report elaborates on what I learned as a result of those activities.
Neural network and fuzzy models for real-time control of a CVD epitaxial reactor
Roop L. Mahajan, Xiaohui Wang, H. Xie, et al.
Controlling variability at each of the several processing steps in a wafer fabrication facility is a key concern for a semiconductor manufacturer. All the variables controlling the desired output must be understood and optimized for high yield. In addition, the process controller must be quick and responsive. For typical semiconductor manufacturing processes, which are very complex, designing an effective controller that meets these requirements is a challenging job. Several process control techniques are being pursued. In one common approach, statistical models based on empirical data and linear models such as autoregression and moving averages are used. However, these models represent a complex process only in relatively small neighborhoods of the state space. Another approach is to use artificial neural network and fuzzy logic techniques to produce non-linear process models for real-time process control. This paper provides a comparison of these different techniques for application to semiconductor manufacturing. The specific process chosen is the CVD epitaxial deposition of silicon in a horizontal reactor. An analytical model is used to generate data under simulated production conditions. The input parameters are the inlet concentration of silane, the inlet velocity, the susceptor temperature, and the downstream position. The output is the silicon deposition rate. Eighty-four data sets are used to train both the neural net and fuzzy logic models. These models are then used to predict the output as a function of the input parameters for fourteen additional data sets. A comparison of these predictions with the physical model's computational results and the experimental data shows good agreement. Further work is in progress to fully exploit the potential of physico-neural and physico-fuzzy models for run-to-run real-time process control.
Application of the clinical matrix to the diagnosis of leukemia
Sampath Y. Pakkala, Frank C. Lin
A system for diagnosing leukemia subtypes has been formulated using neural networks. Statistical data on symptoms collected by hematologists are compiled into a single training set, and a neural network is trained on it with a fast backpropagation algorithm. Once trained, the network can help general practitioners make diagnoses on the basis of signs and symptoms alone.
Fuzzy Logic and Genetic Algorithms II
Fuzzy and neural systems and vehicle applications '91
Bruno Bosacchi, Ichiro Masaki
We review and discuss the papers presented at the Conference on Fuzzy and Neural Systems and Vehicle Applications '91 (Intelligent Vehicles '91). This conference, organized by the IEEE/IES Intelligent Vehicle Subcommittee in collaboration with the Japanese Society for Fuzzy Theory and Systems and other related societies, was held in Tokyo, Japan, in November 1991.
Prototype selection rule for neural network training
Lalit Gupta, Jiesheng Wang, Alain Mozart Charles, et al.
Rules to select a set of training prototypes from a collection of training prototypes are developed so that a neural network classifier converges to a solution when pattern classes overlap in feature space. The formulation of the selection rules is based on a distortion measure and on the network response to the training prototype collection. The rules are also especially useful for selecting training prototypes in order to improve the network's robustness and operational flexibility by retraining the network with noisy prototypes. The application and effectiveness of the selection rules are demonstrated on a synthetic problem of pattern classification in Gaussian noise and on a practical automatic target recognition problem.
Platform for evolving genetic automata for text segmentation
Developers of large-scale document processing and image recognition systems are in need of a dynamically robust character segmentation component. Without this essential module, potential turn-key products will remain in the laboratory indefinitely. An experiment in evolving a biologically based neural image processing system that is able to isolate characters within an unstructured text image is presented. In this study, organisms are simulated using a genetic algorithm, with the goal of learning the intelligent behavior required for locating and consuming text image characters. Each artificial life-form is defined by a genotype containing a list of interdependent control parameters which contribute to specific functions of the organism. Control functions include vision, consumption, and movement. Using asexual reproduction in conjunction with random mutation, a domain-independent solution for text segmentation is sought. For this experiment, an organism's vision system utilizes a rectangular receptor field with signals accumulated using Gabor functions. The optimal subset of Gabor kernel functions for conducting character segmentation is determined through the process of evolution. From the results, two analyses are presented. A study of performance over evolved generations shows that the qualifiers for the natural selection of dominant organisms increased by 62%. The second analysis visually compares and discusses the variation from the dominant genotypes of the first generation to the uniform genotypes resulting from the final generation.
Anomaly detection in data using neural networks with natural selection
Patrick E. Dessert
Frequently, time series data taken off machines contain erroneous data points due to errors in the measurement of the data. One instance of measuring devices recording anomalies occurs in the `crash testing' of vehicles. In this task, sensors are placed on the vehicle and the `crash dummy,' and the vehicle is then crashed into a barrier. Force and acceleration data are collected, which an engineer inspects for anomalies, correcting those that are found. Artificial neural network (ANN) technology was successfully applied to this problem to eliminate the cost and delay of this manual process. To apply ANN technology in this domain, two technical problems needed to be resolved: the appropriate network architecture and the size of the input set. These two issues are quite common and must be addressed in the development of any neural network application. To resolve both issues, I employed a machine learning algorithm that simulates the Darwinian concept of `survival of the fittest,' known as the genetic learning algorithm (GLA). By combining the strengths of the GLA and ANNs, a network architecture was created that `optimized' the size, speed, and accuracy of the ANN. This `hybridized' system also used the GLA to determine the `smallest' number of inputs to the ANN that were necessary to detect anomalies in data. This algorithm is known as GENENET, and is described in this paper.
Training neural networks with genetic algorithms for target detection
Alan V. Scherf, Lawrence D. Voelz
Algorithms for training artificial neural networks, such as backpropagation, often employ some form of gradient descent in their search for an optimal weight set. The problem with such algorithms is their tendency to converge to local minima, or not to converge at all. Genetic algorithms simulate evolutionary operators in their search for optimality. The techniques of genetic search are applied to training a neural network for target detection in infrared imagery. The algorithm design, parameters, and experimental results are detailed. Testing verifies that genetic algorithms are a useful and effective approach for neural network training.
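The paper's weight encoding, genetic operators, and infrared imagery are not reproduced here; the sketch below only illustrates the general idea of genetic search over the weight vector of a small detection network, with uniform crossover, Gaussian mutation, and toy data as assumed stand-ins:

```python
import numpy as np

# A hedged sketch of genetic search over neural network weights; the data,
# network size, operators, and fitness below are generic illustrations only.

rng = np.random.default_rng(0)
N_IN, N_HID = 3, 4
DIM = N_IN * N_HID + N_HID                      # weights for a tiny 3-4-1 net

def forward(w, x):
    W1 = w[: N_IN * N_HID].reshape(N_HID, N_IN)
    W2 = w[N_IN * N_HID:].reshape(1, N_HID)
    h = np.tanh(W1 @ x)
    return 1.0 / (1.0 + np.exp(-(W2 @ h)[0]))   # detection score in (0, 1)

def fitness(w, X, y):
    preds = np.array([forward(w, x) for x in X])
    return -np.mean((preds - y) ** 2)            # higher is better

X = rng.standard_normal((40, N_IN))
y = (X[:, 0] > 0).astype(float)                  # toy "target present" labels
pop = rng.standard_normal((30, DIM))
for gen in range(50):
    scores = np.array([fitness(w, X, y) for w in pop])
    parents = pop[np.argsort(scores)[-10:]]      # selection: keep the fittest
    children = []
    for _ in range(len(pop)):
        a, b = parents[rng.integers(10, size=2)]
        mask = rng.random(DIM) < 0.5             # uniform crossover
        children.append(np.where(mask, a, b)
                        + 0.1 * rng.standard_normal(DIM))   # Gaussian mutation
    pop = np.array(children)
best = pop[np.argmax([fitness(w, X, y) for w in pop])]
```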
Network Analysis
Optimizing feature integration using a machine learning adaptive-synthesis layer
Dennis C. Bielak
The classification process for pattern recognition uses sensors to read measurements from input examples. A feature function then reduces and quantizes these measurements into feature vectors (combinations of feature data). Finally, using the feature vectors, a decision function classifies the current example by comparison to a statistical model of feature vector data. Which particular feature vectors are made available to the decision function has usually been determined during the design phase by the person constructing the system, depending on the hardware available for the sensors. With an adaptive-synthesis layer, however, a collective learning automaton learns which feature vectors are contributing to correct classification and dynamically adjusts the decision function accordingly. A weighted-average scheme is used to combine multiple subhypotheses of the example's class (known as rank hypotheses) into a single output hypothesis (known as the super hypothesis). Updating the weights depends on two factors: an evaluation score and a feature vector compensation. The score is a collective measure of the weighted-average combination of rank hypotheses. The feature vector compensation is an individual measure of each feature vector's contribution to the overall decision, based on a history of detected patterns. This two-layer approach is one of the most efficient methods in multi-objective programming, yet the application of this approach to machine learning as proposed in this dissertation is unique. In particular, a collective learning automaton is used to enhance the combination of a number of candidate class subhypotheses into a single, unique classification. This process is referred to as adaptive synthesis. The approach has been applied to black-and-white character recognition and grey-scale block classification using the Adaptive Learning Image Analysis System (ALIAS) at the George Washington University and the Research Institute for Applied Knowledge Processing (FAW) in Ulm, Germany.
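As a rough, purely illustrative sketch of the combination step only: rank hypotheses are merged by a weighted average into a super hypothesis, and the weights of feature vectors that supported the winning class are nudged upward. The automaton's actual score and compensation computations are not given in the abstract, so the update rule below is hypothetical:

```python
import numpy as np

# Illustrative weighted-average synthesis of rank hypotheses into a super
# hypothesis, with a hypothetical reward rule standing in for the collective
# learning automaton's score/compensation update described in the abstract.

def synthesize(rank_hyps, weights):
    """rank_hyps: (n_feature_vectors, n_classes) class scores."""
    super_hyp = weights @ rank_hyps / weights.sum()
    return super_hyp, int(np.argmax(super_hyp))

def update_weights(weights, rank_hyps, decided_class, lr=0.05):
    support = rank_hyps[:, decided_class]        # each feature vector's support
    weights = weights + lr * (support - support.mean())
    return np.clip(weights, 1e-3, None)          # keep weights positive

rank_hyps = np.array([[0.7, 0.3],                # three feature vectors,
                      [0.4, 0.6],                # two candidate classes
                      [0.8, 0.2]])
weights = np.ones(3)
super_hyp, cls = synthesize(rank_hyps, weights)
weights = update_weights(weights, rank_hyps, cls)
```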
Random structure of error surfaces: toward new stochastic learning methods
Andrew B. Kahng
This paper gives an overview of current work which is directed toward verifying, and exploiting in practice, a recent scaling model for neural network error surfaces. We begin the next section by reviewing a model which describes Boltzmann learning as a stochastic search in the error surface. The discussion also reviews a potentially far-reaching fractal model of neural network error surfaces as instances of a class of high-dimensional fractional Brownian motions (fBm). The main body of the paper then describes a series of experimental results for object classification via noisy sensor data in a mine detection application.
On representations
Bo Xu, Liqing Zheng
First, it is pointed out in this paper that there exists a hierarchical organization of the entity to be represented by neural networks. This hierarchical organization leads to four levels of correspondence between the entity, patterns, elements, items, and the units. We then classify representations into more detailed categories from two viewpoints. From the correspondence viewpoint, representations can be classified into five categories: (1) local representation, (2) one-to-one DR, (3) one-to-many DR, (4) many-to-one DR, and (5) many-to-many DR. The second viewpoint classifies representations according to the locations where they exist. From the location viewpoint, representations fall into two classes: internal representation and external representation. Finally, it is pointed out that the confusion over representations may originate from: (1) non-distinction of the hierarchical organization of the entity, (2) non-distinction of the four levels of correspondence between the entity, patterns, elements, items, and the units, (3) use of the overly general or overly broad term `distributed representation,' (4) non-distinction of the two levels of graceful degradation, and (5) non-distinction of similarities between different representations.
Study on neural networks in China
Jie-Gu Li
This paper gives a comprehensive survey of the studies made by Chinese scientists on the basic mechanisms, applications, and hardware implementation of neural networks (n-n), as published in Chinese journals in the last three years. For the basic mechanism part, the paper surveys work on modeling, learning algorithms, network parameters, and network synthesis. For the application part, the paper lists the main fields in which n-n are applied. For the hardware implementation part, some architectural achievements are introduced.
Spatio temporal neurons and local learning rules enabling massively parallel neurocomputers
Arno J. Klaassen
In this paper we present a way to provide neural networks, at the same time, both with a natural notion of time and with locality and modularity in computation. By doing so we ease massively parallel implementations. At the neuron level we introduce pulse code cable neurons, a neuron model with spatio-temporal information processing capabilities and much reduced communication bandwidth; its constituent parts are either branched, one-dimensional, electrically equivalent cables of neuronal membrane in which all information processing takes place locally, or 1-bit delayed interconnections that unidirectionally connect one membrane to another. At the network level we argue that the theory of Neuronal Group Selection is an apt candidate for providing modularity by means of its `group-forming' local learning rules. We show that, taking dimensions from biological reality, the overall computation time scales with the spatial and temporal accuracy with which we model a membrane, rather than with the number of neurons or synapses. Routing the interconnections remains a problem, but with current technology real-time simulation of some millions of interconnections seems readily feasible.
Principles of conceptual recognition by neural networks
Sergey K. Aityan
The ability of neural networks to recognize patterns that have never been presented to the network before is based on comparison of an applied pattern with the patterns stored by training as references. Presentation of the patterns generates image feature regions in the feature space to which the investigated patterns are related. The feature space is associated with the concepts assigned to the pattern features learned by training; thus recognition is a classification over the image concepts. Concepts are assigned to the pattern by stepwise pattern analysis in conceptualizing depth as well as in resolution depth. Every step of analysis provides new conceptual information that corrects the results of the previous steps. The analysis is based on the pattern-pattern hierarchy function, pattern-concept associations, and concept-concept associations. The process is controlled by a neural knowledge base that has learned the conceptual associations as rules. Two types of partial orders over concepts are introduced: `concept C1 is identified AS concept C2' and `concept C1 consists OF concept C2.' The partial orders AS and OF define the hierarchy of concepts in the pattern.
Comparison of conventional and neural network heuristics for job shop scheduling
Vladimir Cherkassky, Deming Norman Zhou
A new neural network for solving job shop scheduling problems is presented. The proposed scaling neural network (SNN) achieves good (linear) scaling properties by employing nonlinear processing in the feedback connections. Extensive comparisons between SNN and conventional heuristics for scheduling are presented. These comparisons indicate that the proposed SNN allows better scheduling solutions than commonly used heuristics, especially for large problems.
How projection-pursuit learning works in high dimensions
Ying Zhao, Christopher G. Atkeson
This paper addresses an important question in machine learning: What kinds of network architectures work better on what kinds of problems? A projection pursuit learning network has a very similar structure to a one hidden layer sigmoidal neural network. A general method based on a continuous version of projection pursuit regression is developed to show that projection pursuit regression works better on angular smooth functions than on Laplacian smooth functions. There exists a ridge function approximation scheme to avoid the curse of dimensionality for approximating some class of underlying functions.
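For reference, the standard projection pursuit regression form underlying such comparisons approximates the target by a sum of ridge functions of one-dimensional projections; the one-hidden-layer sigmoidal network is the special case in which each ridge function is a scaled, shifted sigmoid:

```latex
% Ridge-function (projection pursuit) approximation of a target f:
f(\mathbf{x}) \;\approx\; \sum_{j=1}^{m} g_j\!\left(\mathbf{a}_j^{\top}\mathbf{x}\right),
\qquad \mathbf{a}_j \in \mathbb{R}^{d};
% the one-hidden-layer sigmoidal network is the special case
g_j(t) = w_j\,\sigma(t + b_j).
```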
Self-Organization (Poster Session)
Attractor neural networks with global and local dilution of weights
Colin Campbell
We consider the generalization ability of dilute (partially connected) attractor neural networks. The generalization probability is considered for two types of dilution: networks in which the dilution is fixed before learning (local dilution) and networks in which the connectivity is decided during learning so as to optimize the storage capacity (global dilution).
Effects of a dynamic word network on information retrieval
Toshiaki Iwadera, Haruo Kimoto
This paper describes a method of learning a user's field of interest and the effects of applying this method to information retrieval. The method uses a dynamic word network (DWN) within the framework of an associated information retrieval approach. The associated information retrieval approach aims at easily and precisely retrieving the information that a user needs from a database. To do this, the information retrieval system must understand what the user intends to retrieve, that is, the user's interest. An associated information retrieval system (AIRS) that incorporates this approach is now being developed. AIRS learns the user's interest from sample documents and represents the user model as a DWN. A DWN consists of nodes and links. Each node corresponds to a term which AIRS can use for retrieval, and each link corresponds to the relationship between two terms. Each node also has a node weight. To evaluate DWN performance, we retrieved information using AIRS and compared the output with that of conventional methods. The results show how the DWN improves the precision of information retrieval.
Neural nets in information retrieval: a case study of the 1987 Pravda
Jan C. Scholtes
This paper presents an implemented neural method for free-text information filtering. A specific interest (or `query') is taught to a Kohonen feature map. By using this network as a neural filter on a dynamic free-text database, only associated subjects are selected from the database. The method is compared with some classical statistical information-retrieval algorithms. Various simulations show that the neural net indeed converges toward a proper representation of the query. The algorithm appears to scale well (linear complexity in time and space), resulting in high speed, modest memory requirements, and easy maintainability. By combining research results from connectionist natural language processing (NLP) and information retrieval (IR), a better understanding of neural nets in NLP, a clearer view of the relation between neural nets and statistical pattern recognition, and an increased information retrieval quality are obtained.
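The paper's text encoding, map size, and filtering threshold are not specified in the abstract; as a minimal sketch under those assumptions, a one-dimensional Kohonen map trained on generic term-frequency vectors for the query, and a nearest-prototype filtering test, might look like this:

```python
import numpy as np

# Minimal sketch of a Kohonen feature map used as a query filter; the document
# encoding, map size, and threshold below are placeholder assumptions.

def train_map(query_docs, grid=10, n_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.random((grid, query_docs.shape[1]))          # 1-D map of prototypes
    for t in range(n_iter):
        x = query_docs[rng.integers(len(query_docs))]
        winner = int(np.argmin(np.linalg.norm(W - x, axis=1)))
        lr = 0.5 * (1.0 - t / n_iter)                    # decaying learning rate
        radius = max(1, int(grid / 2 * (1.0 - t / n_iter)))
        for i in range(grid):
            if abs(i - winner) <= radius:                # neighborhood update
                W[i] += lr * (x - W[i])
    return W

def passes_filter(doc_vec, W, threshold=0.5):
    """Accept a document if it lies close to the learned query representation."""
    return float(np.min(np.linalg.norm(W - doc_vec, axis=1))) < threshold
```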
Topological feature map and automatic feature selection
A method for automatic feature selection is described. The method is based on a suitable transform of an image and an estimated histogram of magnitudes of the transformed image. The estimation is done by a self-organizing process. The self-organizing process creates a one- dimensional topological feature map that is used as a feature vector. The method is demonstrated on four textured images.
Novel model of linear associative memory
Ke Chen
A novel model of associative memory with biorthogonal properties, which can be viewed as an improved version of T. Kohonen's linear model of associative memory, is presented in this paper. An iterative algorithm is developed that allows the model to be employed directly without any limiting condition. Several characteristics of the model that closely resemble biological phenomena are also discussed. It is shown that the optimal value of an associative memory can always be obtained in our model. Some concluding remarks on this model are also given.
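For context, a sketch of the classical Kohonen linear associative memory that the paper improves on: key vectors are mapped to recollections through a pseudoinverse, and perfect recall requires linearly independent keys, which is the kind of limiting condition the proposed iterative algorithm is intended to remove. The dimensions below are arbitrary.

```python
import numpy as np

# Classical optimal linear associative mapping (Kohonen): M = Y X^+,
# where the columns of X are keys and the columns of Y are recollections.

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))        # five 8-dimensional key vectors
Y = rng.standard_normal((3, 5))        # associated 3-dimensional recollections
M = Y @ np.linalg.pinv(X)              # memory matrix via the pseudoinverse

recalled = M @ X[:, 0]                 # recall from the first key
print(np.allclose(recalled, Y[:, 0]))  # True when the keys are linearly independent
```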
Polarization encoding method for performing optical bipolar associative memory
Dazeng Feng, Huanqin Zhao, Shao-Feng Xia, et al.
An architecture for performing outer-product Hopfield-model associative memory (AM) using a polarization encoding method is proposed. It can realize an AM with bipolar input vectors and a bipolar interconnection matrix. In the architecture, the multiplication of bipolar data is realized by rotating the polarization axis of the light passing through. The implementation uses only one optoelectronic detector and does not require any electrical differential amplifiers, so it has high potential to be realized all-optically. Computer simulations and initial experimental results are given to demonstrate the proposed method.
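The optics themselves are not modeled here, but the outer-product Hopfield associative memory the architecture implements can be summarized in a few lines: bipolar patterns, an outer-product interconnection matrix, and sign-thresholded recall.

```python
import numpy as np

# Outer-product Hopfield associative memory with bipolar (+1/-1) patterns;
# a corrupted probe is driven back to the stored pattern by thresholded recall.

patterns = np.array([[ 1, -1,  1,  1, -1, -1,  1, -1],
                     [-1, -1,  1, -1,  1,  1, -1,  1]])
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0.0)                       # no self-connections

probe = patterns[0].copy()
probe[0] *= -1                                 # corrupt one bit of a stored pattern
for _ in range(5):                             # synchronous recall iterations
    probe = np.where(W @ probe >= 0, 1, -1)
print(np.array_equal(probe, patterns[0]))      # stored pattern is recovered
```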
Pattern classifier: an alternative method of unsupervised learning
Atilla Ekrem Gunhan
In the present work, an alternative multi-layer unsupervised neural network model that may approximate certain neurophysiological features of natural neural systems has been studied. The network consists of two parts. The first part acts as a short-term memory, a temporary store for each pattern. The task of this part of the network is to preprocess incoming patterns without memorizing them, in other words, to reduce the linear dependency among patterns by determining their relevant representations. This preprocessing ability is obtained through a dynamic lateral inhibition mechanism on the hidden layer. These representations are the input patterns for the next part of the network. The second part of the network may be regarded as a long-term memory, which classifies and memorizes the incoming pattern information coming from the hidden layer. As long as the hidden layer supplies preprocessed pattern information, the final classification and memorization process is straightforward.
Neighborhoods and trajectories in Kohonen maps
Alexander Grunewald
The Kohonen map is a basic paradigm of unsupervised learning. Quite a few descriptions exist of the possibility of extending other paradigms and of characterizing their output behavior, for example, the functions that can be learned and the trajectories in the output space. The main parameters of Kohonen maps are the underlying topology and the metric used. In this paper the concepts of nearest neighbor, neighbor, neighborhood, and underlying topology are formalized in a set-theoretic manner and thus extended. Similarly, the concept of metric is enhanced by the introduction of similarity measures. A theorem on continuity of the output is proved for such measures.
Architectures
Mathematical model of neural learning
Momaio Xiong, Ping Wang
A generalized, unified mathematical model of neural learning is proposed. A learning potential function is defined, and a broad class of problems related to neural learning is examined. Differential inclusions for finding the minimum of the learning potential functions are derived. A general convergence theorem for optimal solutions is proved, and its applications to supervised learning, unsupervised learning, and Hopfield neural networks are investigated.
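The paper's specific potential is not reproduced in the abstract; the generic form such a differential inclusion takes, for a possibly nonsmooth learning potential E over the weights w with subdifferential ∂E, is

```latex
\dot{\mathbf{w}}(t) \;\in\; -\,\partial E\bigl(\mathbf{w}(t)\bigr),
\qquad \mathbf{w}(0) = \mathbf{w}_0 ,
```

so that trajectories descend E and, under suitable conditions, converge to its minimizers.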
Network Analysis
Self-growing neural network architecture using crisp and fuzzy entropy
Krzysztof J. Cios
The paper briefly describes the self-growing neural network algorithm, CID3, which makes decision trees equivalent to hidden layers of a neural network. The algorithm generates a feedforward architecture using crisp and fuzzy entropy measures. The results for a real-life recognition problem of distinguishing defects in a glass ribbon, and for a benchmark problem of telling two spirals apart are shown and discussed.
Self-organizing integrated segmentation and recognition neural network
James D. Keeler, David E. Rumelhart
We present a neural network algorithm that simultaneously performs segmentation and recognition of input patterns and self-organizes to detect input pattern locations and pattern boundaries. We outline the algorithm and demonstrate this neural network architecture and algorithm on character recognition using the NIST database, and we report the results herein. The resulting system simultaneously segments and recognizes touching characters, overlapping characters, broken characters, and noisy images with high accuracy. We also detail some characteristics of the algorithm on an artificial database in the appendix.
Fuzzy Logic and Genetic Algorithms I
Incremental supervised learning: localized updates in nonlocalized networks
Wendy Foslien, Tariq Samad
We present a novel yet simple approach to incremental learning in neural networks: the problem of updating a mapping based on limited new data. The approach consists of forming a training set by appending to the new data additional training examples generated by exercising the network. This strategy enables the mapping to be updated in the neighborhood of the new data without causing distortions elsewhere in the input space. The approach can be used with any neural network model; it is particularly useful for the popular multilayer sigmoidal networks in which small parameter changes can have nonlocal consequences. Demonstrations and parametric explorations on a toy problem are described.
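A minimal sketch of the described strategy, under generic assumptions: before retraining on the limited new data, the current network is exercised on inputs sampled across its operating range and its present responses are appended as extra training examples, anchoring the mapping away from the new data. Here `old_net.predict` is a hypothetical interface standing in for whatever network model is used.

```python
import numpy as np

# Form an incremental training set: new data plus pseudo-examples generated by
# exercising the existing network, so retraining updates the mapping locally.

def build_incremental_training_set(old_net, new_X, new_y,
                                   input_low, input_high,
                                   n_generated=200, seed=0):
    rng = np.random.default_rng(seed)
    gen_X = rng.uniform(input_low, input_high,
                        size=(n_generated, new_X.shape[1]))
    gen_y = old_net.predict(gen_X)            # network's own current outputs
    X = np.vstack([new_X, gen_X])
    y = np.concatenate([np.asarray(new_y), np.asarray(gen_y)])
    return X, y

# The network is then retrained on (X, y): the appended pseudo-examples hold
# the existing mapping in place while the new data reshape it locally.
```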
Network Analysis
Dilution in small Hopfield neural networks: computer models
Victor M. Castillo, Roger Dodd
The capacity of the Hopfield content-addressable neural network subject to random dilution is investigated by numerical simulation. The sum-of-outer-products learning rule is used to generate the synaptic weight matrix for the storage of M random, binary patterns. Randomly selected synaptic connections are then severed while the memory is probed to determine whether the original patterns are still fixed. Other dilution methods are investigated, such as one that leaves a Hamiltonian cycle and one that does not allow isolation of nodes. In general, the critical dilution as a function of the loading ratio α = M/N takes a sigmoid shape. The critical dilution is also a function of the network size and of the sum of the effective Hamming distances between all of the fixed patterns.
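A minimal sketch of the kind of simulation described: store M random bipolar patterns with the sum-of-outer-products rule, sever a random fraction of connections, and check whether each stored pattern remains a fixed point. The network size, loading, and dilution values below are arbitrary, and the random severing here is only one of the dilution methods the paper compares.

```python
import numpy as np

# Random dilution of a Hopfield network trained with the sum-of-outer-products
# rule; a pattern is counted as retained if one synchronous update leaves it
# unchanged (i.e., it is still a fixed point).

def patterns_still_fixed(N=100, M=5, dilution=0.3, seed=0):
    rng = np.random.default_rng(seed)
    patterns = rng.choice([-1, 1], size=(M, N))
    W = (patterns.T @ patterns).astype(float)    # sum of outer products
    np.fill_diagonal(W, 0.0)
    cut = rng.random((N, N)) < dilution          # sever a fraction of connections
    W[cut] = 0.0
    return all(np.array_equal(np.where(W @ p >= 0, 1, -1), p) for p in patterns)

for d in (0.1, 0.5, 0.9):
    print(d, patterns_still_fixed(dilution=d))
```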