Proceedings Volume 1002

Intelligent Robots and Computer Vision VII

David P. Casasent
View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 27 March 1989
Contents: 1 Session, 80 Papers, 0 Presentations
Conference: 1988 Cambridge Symposium on Advances in Intelligent Robotics Systems
Volume Number: 1002

Table of Contents

All Papers
Rule-Based System For A String Code Pattern Recognition Processor
David Casasent, Sung-IL Chien
A 3-D distortion-invariant multi-class object identification problem is addressed. Our new, fast and robust string-code generation technique (using optical and digital methods) makes the rule-based system quite practical and attractive. Emphasis is given to our rule-based system and to initial data results. Excellent multi-class recognition and reasonable object distortions can be accommodated in this system. We achieved 80-90% correct recognition (PC) for 10 object classes and ±30° 3-D distortions and full 360° in-plane distortions.
Log-Polar Mapping As A Preprocessing Stage For An Image Tracking System
Joseph G. Bailey, Richard A. Messner
Researchers at NASA's Johnson Space Center are developing an image tracking algorithm which employs real-time log-polar mapping as a preprocessing stage. Camera output is processed by a programmable coordinate remapper and then provided as input to a hybrid optical correlation system. The positions of correlation peaks indicate tracking error, and therefore it is important that correlation peaks be sharp to make accurate measurements of position possible. This paper presents current research, based on computer simulations, to determine the relationship between the parameters of the log-polar mapping and the sharpness of the resulting correlation peaks. The goals of this research are to formulate a rule for choosing the optimal mapping parameters and to determine which filter implementations yield the sharpest correlation peaks when tracking errors exist. Prior knowledge of the target image is assumed.
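The mapping stage itself is simple to sketch. Below is a minimal nearest-neighbour log-polar resampler in Python; the ring/wedge parameter names are illustrative stand-ins for the remapper's actual mapping parameters, not the values used at Johnson Space Center:

```python
import math

def log_polar_map(img, cx, cy, n_rings=16, n_wedges=32, r_min=1.0):
    """Nearest-neighbour log-polar resampling of a 2-D grayscale image.

    Ring i sits at radius r_min * (r_max/r_min)**(i/(n_rings-1)), so radial
    samples are spaced exponentially; wedges sample angle uniformly.
    """
    h, w = len(img), len(img[0])
    r_max = min(cx, cy, w - 1 - cx, h - 1 - cy)  # largest inscribed radius
    out = [[0] * n_wedges for _ in range(n_rings)]
    for i in range(n_rings):
        r = r_min * (r_max / r_min) ** (i / (n_rings - 1))
        for j in range(n_wedges):
            theta = 2.0 * math.pi * j / n_wedges
            x = int(round(cx + r * math.cos(theta)))
            y = int(round(cy + r * math.sin(theta)))
            out[i][j] = img[y][x]
    return out
```

The payoff for correlation tracking is that rotation about the center becomes a circular shift along the wedge axis and scaling becomes a shift along the ring axis, both of which a correlator handles naturally.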
Object Recognition Based On Dempster-Shafer Reasoning
M. H. Hassan
This paper presents algorithms that identify objects in simple and complex images. The scene may contain objects that touch or overlap, giving rise to partial occlusion, objects in noisy environments, and objects from different domains, giving the potential for general-purpose, multi-domain object recognition. The algorithms are based on the idea of aggregating the total evidence for each object in the image utilizing Dempster-Shafer reasoning. The proposed approach consists of three processes: preprocessing, feature extraction, and understanding. As a step of the understanding process, the evidence relating each object in the image to corresponding models from the image domain is evaluated based on the Dempster-Shafer theory of evidence. For this purpose, mass-of-evidence functions are derived and implemented. The algorithms use either top-down or bottom-up control; the latter makes the approach attractive in applications where time is an important consideration, such as robot vision, part inspection, and military target recognition. Experimental results are presented for simple and complex images.
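The evidence-aggregation step rests on Dempster's rule of combination. A minimal sketch in Python, with mass functions given as dictionaries over subsets of the frame of discernment; the two-sensor frame {car, tank} in the test below is a hypothetical example, not one of the paper's models:

```python
def combine(m1, m2):
    """Dempster's rule: fuse two mass functions given as {frozenset: mass}."""
    fused = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                fused[inter] = fused.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb  # mass committed to contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("total conflict; sources cannot be combined")
    norm = 1.0 - conflict  # renormalise the non-conflicting mass
    return {s: v / norm for s, v in fused.items()}
```

Masses assigned to non-singleton sets (e.g. the whole frame) represent ignorance, which is what lets partial or ambiguous cues be fused gracefully.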
Detecting Man-Made Changes In Imagery
Mark J. Carlotto
A technique for detecting man-made objects using a fractal image modeling approach is described. The technique is based on comparing multiscale signatures computed within sliding windows over coincident regions in "before" and "after" images. The signatures are computed by successive morphological erosions and dilations of the image intensity surface. Similarity measures that are a function of the surface area, fractal dimension, and fractal dimension estimation error over a range of scales are developed for discriminating between natural and man-made changes. The algorithm is applied to digitized aerial photography and SPOT satellite imagery with very encouraging results. Implementation considerations for massively-parallel architectures are discussed.
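The paper computes its multiscale signatures with morphological erosions and dilations of the intensity surface; a simpler box-counting estimate, sketched below, conveys the same underlying idea of fitting a dimension to measurements taken over a range of scales:

```python
import math

def box_count_dimension(points, scales=(1, 2, 4, 8)):
    """Estimate the fractal dimension of a set of (x, y) points by box counting.

    Counts the occupied s*s boxes at each scale s, then fits the slope of
    log(count) against log(1/s) by least squares; the slope is the dimension.
    """
    xs, ys = [], []
    for s in scales:
        boxes = {(x // s, y // s) for x, y in points}
        xs.append(math.log(1.0 / s))
        ys.append(math.log(len(boxes)))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

A straight line comes out near dimension 1 and a filled region near 2; natural surfaces fall in between, and man-made changes tend to shift both the fitted dimension and the fit error, which is what the similarity measures exploit.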
Efficient Encoding Of Local Shape Features For 3-D Object Recognition
Paul G. Gottschalk, Trevor Mudge
To date, algorithms designed to recognize 3-D objects from intensity images have employed either global features or local features based on straight line segments. Global features limit recognition to objects that are completely visible and segmented from the background, conditions that are often unrealistic in machine vision domains. Local features based on line segments are somewhat more flexible, since partially visible objects may be recognized; however, the requirement that objects have sufficient straight lines to ensure robust recognition is also limiting. In this paper, we discuss a set of features, called scale-invariant critical point neighborhoods, or SICPNs, which are more generally applicable to 3-D object recognition. SICPNs efficiently encode the local shape of edge segments near points of high curvature. To within variations caused by image noise and discretization, SICPNs are invariant to image plane translations, rotations, and scaling. These invariance properties enhance the utility of these features in 3-D recognition algorithms. Furthermore, we show that SICPNs have many other desirable characteristics, including informativeness, ease of detection, and compact representation, and we empirically demonstrate that they are insensitive to noise. These characteristics are essential if a feature is to be useful for 3-D object recognition.
Effects Of Lateral Subtractive Inhibition Within The Context Of A Polar-Log Spatial Coordinate Mapping
Matthew G. Luniewicz, Richard A. Messner
A simulation implementing Lateral Subtractive Inhibition (LSI) in a 2-D coordinate space was developed, providing the means to determine the effects of varying the fundamental parameters of the operation. The strength of inhibition between receptors (sensor elements) as a function of distance is variable (the weighting function), as is the absolute size of the neighborhood of receptor interaction. Several different methods of performing the LSI operation were examined before a method realizable in a computer simulation was found, and the problems of implementing LSI in software are described in some detail. Application of LSI to a polar-log mapped coordinate system is considered in several respects: the operation can be applied before or after the mapping, with weights dependent on either linear or exponential distances between receptors, and the differences between these methods are examined. A set of experiments was devised to determine the usefulness of LSI for extracting edge and curvature information.
Automated Generation Of Concatenated Arcs For Curve Representation
Sing T. Bow, Tsung-sheng Chen, S. Honnenahalli
Effective representation of curves is an important aspect of pattern recognition. Extensive research along this line has represented curves with piecewise polynomial functions of degree greater than one; the spline is a good example and is effective, but its mathematical representation remains cumbersome. In this paper, an algorithm is designed to automatically generate a concise and rather accurate representation of curves in terms of concatenated arcs. The major idea is to efficiently and effectively detect the appropriate break points on the curve for the concatenated sections. The effectiveness of the algorithm has been evaluated through experiments on a large number of shapes, with very satisfactory results: the descriptions require roughly an order of magnitude fewer segments than linear approximation, and curves reconstructed from them match the originals closely, even for very complex curves. Experiments conducted on a VAX 11/785 and on our new PC-based image processing system show that the algorithm is computationally very efficient. The system is useful in the archival and retrieval of graphic information, especially for the automatic handling of large numbers of documents containing text and graphics.
Piece-Wise Linear Approximation Of An Object Boundary From Freeman Chain Code
Soumitra Sengupta, Paul M. Lynch
An algorithm is described to generate a polygonal approximation to an object boundary from the Freeman chain code using digital filters. In computer vision and pattern recognition, concatenation of digital straight lines is a simple and compact representation of the boundary of well behaved planar regions. Digital straight lines have often been used as features to track and recognize shapes and objects. To abstract digital straight lines, the chain code sequence is used as the input to two digital filters with different response characteristics. The difference between the outputs of the filters is used to detect corners; noisy line segments result in reduced corner detection accuracy. The algorithm also provides an estimate of the angle between neighboring lines and a basis for estimating the length of the line segments. Moreover, an extension of this algorithm may be used to detect quadratic segments in an object boundary.
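The chain-code representation and corner detection can be sketched directly. The fragment below decodes a Freeman chain code into boundary points and flags corners by comparing the mean direction before and after each position; this simple averaging is a crude stand-in for the paper's pair of digital filters, and the window and threshold values are made up for illustration:

```python
# 8-connected Freeman directions: 0 = east, numbered counter-clockwise
DIRS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def decode_chain(start, code):
    """Turn a Freeman chain code into the list of boundary points it traces."""
    pts = [start]
    x, y = start
    for c in code:
        dx, dy = DIRS[c]
        x, y = x + dx, y + dy
        pts.append((x, y))
    return pts

def corners(code, window=3, thresh=2):
    """Flag chain positions where the smoothed direction changes sharply."""
    marks = []
    for i in range(window, len(code) - window):
        # averaging raw codes is only safe away from the 0/7 wrap;
        # a real implementation would filter unwrapped direction values
        before = sum(code[i - window:i]) / window
        after = sum(code[i:i + window]) / window
        d = abs(after - before)
        d = min(d, 8 - d)  # shortest difference on the 8-direction circle
        if d >= thresh:
            marks.append(i)
    return marks
```

Between detected corners the chain is summarized by a straight segment, giving the polygonal approximation.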
Detection Of Oriented Line Segments Using Discrete Cosine Transform
H. S. Hou, M. J. Vogel
The widespread use and increasing sophistication of computer vision systems has generated a great deal of interest in image processing techniques capable of tracking moving targets at high speed. The Hough transform has been commonly used as a means of detecting target trajectories. Nevertheless, the computational load for performing a Hough transform is overwhelming, and the result is usually noisy. In a recent study, we discovered that the discrete cosine transform (DCT) behaves like a directional filter and, as such, can be used to detect and locate oriented line segments. This paper describes the directional filter property of the DCT and a fast computing algorithm for detecting oriented lines. The DCT has been used in image data compression for many years; many fast DCT algorithms exist, and VLSI chips implementing them have been designed. In this paper the directional filter property of the DCT is first presented. The one-dimensional DCT, viewed as a filter, splits the filtered output into two halves, one corresponding to the low-frequency band of the input and the other to the high-frequency band. In the two-dimensional case, the DCT behaves like a quadrature filter without aliasing. Based on this quadrature filter property, a new method for detecting oriented line segments has emerged. We next present the fast processing algorithm for digital implementation and conclude with some simulation results.
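The one-dimensional transform underlying the scheme is easy to state directly. Below is an unnormalised O(N²) type-II DCT; the fast algorithms the paper relies on compute the same coefficients, and the frequency-band behaviour it describes shows up in how slowly varying inputs concentrate their energy in the low-index coefficients:

```python
import math

def dct_ii(x):
    """Direct (O(N^2)) unnormalised type-II DCT of a real sequence."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i in range(n))
            for k in range(n)]
```

For a constant input all the energy lands in the DC coefficient, the extreme case of the low-frequency half of the split.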
Classification Of Partial Shapes Using String-To-String Matching
Hong-Chih Liu, M. D. Srinath
In this paper, we present an algorithm that recognizes a partial shape without regard to its size, rotation, or location. The algorithm uses the curvature function obtained from the digital representation of the shape. The curvature function is represented by a string, formed by slicing it with horizontal lines. Using as primitives the sign of the curvature function's slope within each pair of such lines, a symbol string is obtained which describes the relative amplitude of the peaks and valleys in the waveform and is invariant to the size and location of the object within the scene. Since the curvature function is periodic, it can be made rotation invariant by suitably choosing the start point of the string. The resulting strings are matched using a standard dissimilarity measure: the number of operations (substitutions, deletions, and insertions) needed to transform one string into another. To obtain rotation invariance, we compute the dissimilarity measure with every character of the test string tried as the start point. The algorithm has been successfully tested on several partial shapes, using two sets of data: the first consisted of 4 classes of various types of aircraft, while the second consisted of the shapes of different lakes. The algorithm works reasonably well even in the presence of a moderate amount of noise.
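The matching step is the classic edit (Levenshtein) distance, minimized over all cyclic start points of the test string. A compact sketch, with hypothetical example strings in the usage below:

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum substitutions, deletions, insertions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def cyclic_distance(test, ref):
    """Rotation invariance: try every cyclic start point of the test string."""
    return min(edit_distance(test[i:] + test[:i], ref)
               for i in range(len(test)))
```

Trying every start point multiplies the cost by the string length, which is the price paid here for rotation invariance.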
Circle Parameter Estimation From Partial Arcs
Arun Prakash
In machine vision or image processing applications, a common problem is to recognize objects that are only partially visible. A more tractable subset of this problem occurs when an object with known shape is only partially visible and we need to accurately compute its position and orientation. For example, a robot (equipped with vision capabilities) may be required to pick up objects that are randomly placed in its field of view. Knowing the position and orientation of such an object when it is fully or partially in the field of view may be important for it to perform this task.
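One standard way to recover centre and radius from points on a partial arc is an algebraic least-squares fit; the sketch below is a generic version of that idea, not necessarily the estimator developed in the paper:

```python
import math

def fit_circle(pts):
    """Algebraic (Kasa-style) least-squares circle fit to arc points.

    Solves x^2 + y^2 + D*x + E*y + F = 0 for (D, E, F) in the least-squares
    sense, then reads off the centre and radius.
    """
    # normal equations A^T A p = A^T b, rows (x, y, 1), b = -(x^2 + y^2)
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for x, y in pts:
        row = (x, y, 1.0)
        rhs = -(x * x + y * y)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atb[i] += row[i] * rhs
    # tiny Gauss-Jordan solve with partial pivoting
    m = [ata[i] + [atb[i]] for i in range(3)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(3):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    d, e, f = (m[i][3] / m[i][i] for i in range(3))
    cx, cy = -d / 2.0, -e / 2.0
    return cx, cy, math.sqrt(cx * cx + cy * cy - f)
```

With exact arc samples the fit recovers the circle exactly; with noisy edge points it degrades gracefully, and the shorter the visible arc the more sensitive the estimate becomes.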
An Efficient Grid-Based Representation Of Arbitrary Object Boundaries
Chang Y. Choo
We present an efficient scheme for representing irregular object boundaries which belongs to the grid-based chain coding family. The scheme, called polycurve codes, extends the chain coding family (e.g., chain codes and generalized chain codes) by employing predefined circular-arc segments as boundary approximators in addition to straight-line segments. Each circular-arc segment in polycurve codes is predefined around the associated line segment and labeled with an integer. Polycurve codes enable direct extraction and labeling of high-level line and arc segments from arbitrary boundaries. Once the object boundaries are encoded by polycurve codes, feature calculation and shape analysis may be done solely through a look-up table indexed by the integer labels. Experimental results show that polycurve codes improve performance, such as compactness and encoding time, over the existing chain coding family.
The Engineering Decisionmaking Process
George A. Hazelrigg
We think of modern civilization as a technological society, fed by scientific advance and engineering implementation resulting in new products and services. The role played by engineers in this process is every bit as important as that played by scientists, yet the public associates technology with scientists, not engineers. When an engineering advance is made, such as the recent fabrication of an 80-micron electrostatic motor, the news media announce the event by saying, "Scientists at the University of California at Berkeley have..." — this despite the fact that they were engineers. Even engineers seem to like being referred to as scientists. The news media frequently refer to "scientific achievements" and "engineering failures." I think that engineers have lost, or perhaps never found, their identity. Finding this identity is a crucial step in the evolution of the engineering profession, and, I believe, it could lead to advances in the art itself.
Comparison Of Object-Oriented And Structured Programming For Image Processing Applications
Lewis J. Pinson
The object-oriented paradigm offers a new approach to the development of software for image processing applications. The fundamental concept of the object-oriented paradigm is that problem solutions are implemented by sending messages to objects. This requires that the objects in a solution be defined along with messages to which each object will respond. This is in contrast to structured programming wherein one defines data structures and sends those data structures to a procedure as parameters.
Can Scale Space Filtering Enhance Fractal Analysis?
Manfred Rueff
Fractal dimensions are quantities which have been shown to be useful in the classification and segmentation of textures with scaling behaviour. Problems arising in the numerical determination of fractal dimensions are briefly mentioned. Scale space filtering techniques are suggested to overcome some of these problems, in particular those associated with detecting the limited scaling regions of natural textures.
Image Compression In Orthogonal Spline Space
H. S. Hou
This paper presents a theoretical analysis and computing technique for performing image data compression in orthogonal spline space. First, we define the orthogonal spline basis functions, which are derived from the subdivision of the discrete cosine transform; the number of subdivided blocks is an integer power of 2. In this paper, we present the detailed analysis of the bi-orthogonal and the quad-orthogonal spline spaces. Fast image compression and decompression algorithms in these spaces are given and simulated with examples. Based on the simulation results, one can conclude that this new image compression method is superior to the ordinary discrete cosine transform; i.e., it can retain more high-frequency details with less artifact and noise. Compared to the subband coding technique, this new method is still favorable both in terms of image detail and buffer size. In short, this new compression method reproduces better image quality for a prescribed data compression ratio in the reconstructed image.
A Symmetry-Insensitive Edge Enhancement Filter Incorporating Local Structure
Peter H. Gregson, Sherwin T. Nugent, Max S. Cynader
A new non-linear edge enhancement filter is proposed which is insensitive to edge profile symmetry, but is sensitive to the local spatial coherence of the image. Because the filter is insensitive to profile symmetry, the response combination problem is avoided. The filter is suitable for enhancing edges in natural scenes, in which variations in illumination, surface reflectance and surface orientation result in a wide variety of edge profile symmetries. In most applications, filter response is then compared to a threshold to detect edges. The magnitude and orientation of the extremal directional second derivative of intensity at each pixel are first determined with the aid of the Krumbein transform. A histogram is then formed of the cumulative response over a small neighbourhood about each pixel as a function of orientation. One or more peaks in the histogram are found, and a representative magnitude and orientation computed. If the width of the histogram peak is sufficiently narrow and its magnitude is large enough, the pixel is assumed to lie on or near an edge. It is given a magnitude and orientation representative of the histogram peak. Peak width and magnitude are shown to be dependent on the presence of sufficient local evidence for an edge, thereby incorporating local structure. By accepting multiple histogram peaks, the algorithm is made to perform well near edge corners and intersections.
Image Segmentation Using Background Estimation
Arturo A. Rodriguez, O. Robert Mitchell
Segmentation algorithms that do not require preselected thresholds and are rapid and automatic for various applications are introduced. The approach is to track how the background graytone distribution varies throughout the image without a priori knowledge. Rectangular image regions are sampled to track background variations. Criteria based on statistical theory are used to determine the homogeneity of regions and to distinguish between background-homogeneous and object-homogeneous regions. The criteria include upper and lower bounds to account for practical situations which arise when the underlying assumptions become invalid. Segmentation is focused on non-homogeneous regions. The background graytone distribution throughout the image is estimated from regions where it is measurable. Knowledge of the local background distribution throughout the entire image is then used to preserve the local brightness relationship of object pixels to the background. Rather than simply mapping the graytone image into an object-background binary image, more information is retained by determining additional thresholds and mapping pixels into object brightness relative to background and into uncertainty. Image regions made up of uncertainty labelled pixels assist in identifying image regions that require further processing.
A Class Of Iterative Thresholding Algorithms For Real-Time Image Segmentation
M. H. Hassan
Thresholding algorithms are developed for segmenting gray-level images under nonuniform illumination. The algorithms are based on learning models generated from recursive digital filters which yield continuously varying threshold tracking functions. A real-time region growing algorithm, which locates the objects in the image while thresholding, is developed and implemented. The algorithms work in a raster-scan format, making them attractive for real-time image segmentation in situations requiring fast data throughput, such as robot vision and character recognition.
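The idea of a threshold tracked by a recursive filter in raster-scan order can be sketched minimally. The single first-order IIR background estimate, and the alpha and offset values, are illustrative assumptions, not the paper's learning models:

```python
def raster_threshold(img, alpha=0.05, offset=10):
    """Raster-scan segmentation with a recursive (first-order IIR) threshold.

    The running background estimate m follows each pixel with gain alpha,
    so the threshold m + offset tracks slow illumination drift; pixels far
    above the estimate are labelled object (1) and excluded from the update.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    m = float(img[0][0])  # seed the background estimate at the first pixel
    for y in range(h):
        for x in range(w):
            v = img[y][x]
            if v > m + offset:
                out[y][x] = 1          # object pixel: do not pollute estimate
            else:
                m += alpha * (v - m)   # background pixel: update the filter
    return out
```

Because the label decision and filter update both happen within the single scan, the scheme needs no second pass, which is what makes raster-scan thresholding attractive for fast-throughput pipelines.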
Geometry Guided Segmentation
Stanley M. Dunn, Tajen Liang
Our overall goal is to develop an image understanding system for automatically interpreting dental radiographs. This paper describes the module that integrates the intrinsic image data to form the region adjacency graph that represents the image. The specific problem is to develop a robust method for segmenting the image into small regions that do not overlap anatomical boundaries. Classical algorithms for finding homogeneous regions (i.e., two-class segmentation or connected components) will not always yield correct results, since blurred edges can cause adjacent anatomical regions to be labeled as one region. This defect is a problem in this and other applications where an object count is necessary. Our solution is to guide the segmentation by intrinsic properties of the constituent objects. The module takes a set of intrinsic images as arguments. A connected-components-like algorithm is performed, but the connectivity relation is not 4- or 8-neighbor connectivity in binary images; the connectivity is defined in terms of the intrinsic image data. We describe both the classical and the modified segmentation procedures and present experiments using both algorithms. Our experiments show that for dental radiographs, segmentation using gray-level data in conjunction with the edges of tooth surfaces gives robust and reliable results.
Moment Invariants Of Perspective Transformation
Firooz A. Sadjadi
One of the fundamental problems in pattern recognition, computer vision, and scene analysis is the recognition of objects independent of size, translation, rotation, and perspective transformation. Perspective transformations are important because any lens system induces such a transformation. Extraction of attributes that are invariant under such transformations is important because such attributes capture the essential, nonchanging, and discriminatory natures of the physical objects. Moreover, using such invariant attributes significantly reduces the computational complexity and storage requirements of recognition systems. In this paper we derive a unique set of moment invariants of the perspective transformation. We also provide experimental results for a set of geometrical 3D objects, verifying the invariance and uniqueness of the derived moments under several different, arbitrary perspective transformations.
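The flavour of moment-invariant features is easiest to see in the classical similarity-group case (translation, rotation, scale). The first Hu invariant sketched below is NOT one of the perspective invariants derived in this paper; it is only the simplest illustration of how normalised central moments yield attributes that survive a transformation:

```python
def hu_phi1(img):
    """First Hu invariant phi1 = eta20 + eta02 of a 2-D intensity array.

    Central moments (about the centroid) give translation invariance,
    normalisation by mu00^2 gives scale invariance, and the sum
    mu20 + mu02 is unchanged by rotation.
    """
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            m00 += v
            m10 += x * v
            m01 += y * v
    xc, yc = m10 / m00, m01 / m00
    mu20 = mu02 = 0.0
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            mu20 += (x - xc) ** 2 * v
            mu02 += (y - yc) ** 2 * v
    return (mu20 + mu02) / m00 ** 2
```

Under a 90-degree rotation mu20 and mu02 simply swap, so their sum is unchanged; the perspective case requires the more elaborate invariants the paper derives.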
Solving For The General Linear Transformation Relating 3-D Objects From The Minimum Moments
D. Cyganski, S. J. Kreda, J. A. Orr
This paper describes an extension of previous moment based techniques for the solution of the general linear transformation that describes the relationship between two 3D objects given either a 3D density map or a sufficient number of projections of the object. It is shown that values of the second and third order 3D moments are sufficient to solve this problem. The solution technique exploits the tensor nature of the moment set to enable solution of the object transformation matrix by direct (non-iterative) means. An implementation is also described that uses a fast moment method to generate the required data from object descriptions. Experimental results are shown.
Zernike Moment Based Rotation Invariant Features For Pattern Recognition
Alireza Khotanzad, Yaw Hua Hong
This paper addresses the problem of rotation invariant recognition of images. A new set of rotation invariant features is introduced. They are the magnitudes of a set of orthogonal complex moments of the image known as Zernike moments. A systematic reconstruction-based method for deciding the highest order of Zernike moments required in a classification problem is developed. The "quality" of the reconstructed image is examined through its comparison with the original one. More moments are included until the reconstructed image is close enough to the original picture. The orthogonality property of the Zernike moments, which simplifies the process of image reconstruction, makes the suggested feature selection approach practical. Features of each order are also weighted according to their contribution (their image representation ability) to the reconstruction process. This contribution is measured by comparing the difference between the original and the reconstructed image using these features with that obtained by using features of one less order. The application to a 26-class character data set yields 97% classification accuracy.
Coincidence Recognition Of Moving Objects In Arbitrary Orientation
Hsiao T. Chang, Xu F. Wang, Shen Lian, et al.
This paper explores the coincidence assembly recognition of two objects of any shape in any designated position within three-dimensional space. A mathematical model based on algebraic topology is presented. The objects are regarded as enveloped by 2-simplexes which are convex in shape. The coincidence recognition of the two objects can then be reduced to the assembly of mutually correspondent 2-simplexes on the two objects. An algorithm is proposed for coincidence assembly under automatic cooperative computer control, which eliminates the need for vision aid. The algorithm is straightforward and involves the least amount of computation yet proposed for such an operation. It has been simulated in Pascal on an IBM-PC microcomputer.
Noise Reduction And Defect Segmentation Of The Surface Picture Of Bearing Rollers In The Presence Of Oil Pollution
Y. Zhang, W. Xian, Z. Tu, et al.
In this paper, we deal with the problem of detecting and segmenting surface-appearance defects of bearing rollers with our specially designed inspection system. We propose two different methods to process the pictures obtained from rollers of different precision. Our algorithms are shown to be feasible on a general-purpose image processing hardware architecture, achieving high speed and showing promise for practical use.
A Review Of Color Vision And Imaging
Matthew Turk
As an introduction to the session on color vision and multisensor processing, we review the physics of color imaging as well as fundamentals of human color perception. Alternative color representations motivated by both pattern classification and visual perception are discussed, as are classes of color algorithms used in scene segmentation.
Learning Lightness Algorithms
Anya C. Hurlbert, Tomaso A. Poggio
Lightness algorithms, which recover surface reflectance from the image irradiance signal in individual color channels, provide one solution to the computational problem of color constancy. We compare three methods for constructing (or "learning") lightness algorithms from examples in a Mondrian world: optimal linear estimation, backpropagation (BP) on a two-layer network, and optimal polynomial estimation. In each example, the input data (image irradiance) is paired with the desired output (surface reflectance). Optimal linear estimation produces a lightness operator that is approximately equivalent to a center-surround, or bandpass, filter and which resembles a new lightness algorithm recently proposed by Land. This technique is based on the assumption that the operator that transforms input into output is linear, which is true for a certain class of early vision algorithms that may therefore be synthesized in a similar way from examples. Although the backpropagation net performs slightly better on new input data than the estimated linear operator, the optimal polynomial operator of order two performs marginally better than both.
Discounting Illuminants Beyond The Sensor Level
Ron Gershon, Allan D. Jepson
We propose a method, based on finite-dimensional linear models of reflectance and illumination, which allows the transformation of chromatic images into color constant images. This proves to be useful in applications where either there is no information on the illuminant present in the scene, or when such information is confounded by the existence of inter-reflections between objects. This method is aimed at computations taking place beyond the sensory level of vision systems, and may use inputs corrected by sensors. In contrast to previous work, we show that good results can be obtained using a 3-receptor system and some knowledge about the spectral properties of natural materials and illuminants. In the method developed, an estimate of the illuminant in the scene is computed, which allows the computation of color constant descriptors of the pixel values in the image. In addition, we show a method of computing the actual reflectances of the materials in the scene from the computed color descriptors.
Two Approaches To Colour Recognition In Robotics Based On A Reflectance Spectrum Analysis Method
Elzbieta Marszalec
In the recognition of colours in robotics, two types of task can be distinguished: the recognition of coloured objects from an a priori known group of objects, and the more universal task in which any coloured object can be recognized without any a priori information. The paper begins with a review of different methods of colour recognition applied in robotics, classifying each according to one of the two above-mentioned tasks. The method of colour recognition based on reflectance spectrum analysis is then described, and two approaches using this method for the realization of the two tasks are explained. For each task, a specially designed sensor system used for the practical realization of recognition is described.
A Multi-Sensor Robotics System For Object Recognition
Khosrow M. Hassibi, Kenneth A. Loparo, Francis L. Merat
An algorithm for object recognition based on the data from various contact and/or non-contact sensors is described. The sensors are primarily used for resolving the ambiguities which may be encountered in recognizing the object identity and its pose. The feature vector representing an object-state is partitioned into a finite number of feature subvectors, each containing the features extracted from a different sensory source. To perform the recognition task through integration of data from different sensors, an a priori cost value is assigned to each feature. These feature costs and the object models are used for deriving a decision tree based on a sequential pattern recognition approach. The decision tree guides the system during the recognition phase. General strategies for feature extraction from various sources are implemented as a finite state machine. A coordinator module supervises the coordination of manipulation and recognition processes and the execution of the system state changes required to successfully implement the recognition algorithm.
Temporal Pattern Recognition: A Network Architecture For Multi-Sensor Fusion
C. E. Priebe, D. J. Marchette
A self-organizing network architecture for the learning and recognition of temporal patterns is proposed. This multi-layered architecture has as its focal point a layer of multi-dimensional Gaussian classification nodes, and the learning scheme employed is based on standard statistical moving mean and moving covariance calculations. The nodes are implemented in the network architecture by using a Gaussian, rather than sigmoidal, transfer function acting on the input from numerous connections. Each connection is analogous to a separate dimension for the Gaussian function. The learning scheme is a one-pass method, eliminating the need for repetitive presentation of the teaching stimuli. The Gaussian classes developed are representative of the statistics of the teaching data and act as templates in classifying novel inputs. The input layer employs a time-based decay to develop a time-ordered representation of the input stimuli. This temporal pattern recognition architecture is used to perform multi-sensor fusion and scene analysis for ROBART II, an autonomous sentry robot employing heterogeneous and homogeneous binary (on / off) sensors. The system receives sensor packets from ROBART indicating which sensors are active. The packets from various sensors are integrated in the input layer. As time progresses these sensor outputs become ordered, allowing the system to recognize activities which are dependent, not only on the individual events which make up the activity, but also on the order in which these events occur and their relative spacing throughout time. Each Gaussian classification node, representing a learned activity as an ordered sequence of sensor outputs, calculates its activation value independently, based on the activity in the input layer. These Gaussian activation values are then used to determine which, if any, of the learned sequences are present and with what confidence. 
The classification system is capable of recognizing activities despite missing, extraneous or slightly out-of-order inputs. An important predictive quality is also present. This system can predict that an activity may be about to occur prior to receiving confirmation that all component events have occurred. Overall, the temporal pattern recognition system allows the robot to go beyond the alert / no alert stage based on a simple weighted count of the sensors firing. ROBART is now able to determine which activities are occurring, enabling it to intelligently act on this information.
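The one-pass learning scheme described above can be illustrated with a small sketch. This is not the authors' implementation: it assumes a diagonal covariance and hypothetical class names, and simply shows how a moving mean/variance update combined with a Gaussian (rather than sigmoidal) transfer function yields a template-like classification node.

```python
import math

class GaussianNode:
    """Sketch of a multi-dimensional Gaussian classification node.

    One-pass learning via running mean/variance updates; a diagonal
    covariance is an assumption made here for brevity.
    """

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.var = [0.0] * dim  # diagonal covariance estimate

    def learn(self, x):
        # Incremental (one-pass) update: no repetitive presentation
        # of the teaching stimuli is required.
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.var[i] += (delta * (xi - self.mean[i]) - self.var[i]) / self.n

    def activation(self, x):
        # Gaussian transfer function over all input connections; each
        # connection is a separate dimension of the Gaussian.
        d2 = sum((xi - m) ** 2 / max(v, 1e-9)
                 for xi, m, v in zip(x, self.mean, self.var))
        return math.exp(-0.5 * d2)
```

A novel input then activates each learned node independently, and the activation values give the confidence that a learned sequence is present.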
Optical Neural Net And Symbolic Substitution Production System
Elizabeth Botha, David Casasent
New optical neural net and symbolic substitution production system architectures are described. Several new learning aspects of these architectures are noted, and initial results obtained with a multiple object/part identification problem are presented as demonstrations of optical symbolic filters and production systems.
Neural Network Applications In Hand-Written Symbol Understanding
Kangsuk Lee
Several issues in neural networks are investigated in the context of hand-written symbol recognition and a new hierarchical network structure is proposed. There have been many successful demonstrations of hand-printed character or cursive word recognition using neural network algorithms with a limited data set. However, the required network size increases as the variations allowed in the training and test patterns increase. Two highly stylized data sets (American Bankers Association's E-13B font and ANSI standard OCR-A font) are compared with hand-printed block characters in terms of required network size, mean squared error per pattern at fixed training sweeps, and number of sweeps required to reach a certain confidence level of recognition. At each training stage, the connection strengths are stored and these stored connection strengths are loaded into the net later to test generalization against similar test patterns. When the test data is presented to the net, the performance is not very good, but the network adapts to the new data quite quickly when the training and test data are very regular and/or the number of different classes for which the network was trained is small. To enhance the generalization and adaptation performance, a new hierarchical structure is proposed where the training patterns are grouped into a cascade of three-class problems. This structure is verified by experiments.
Supersymmetry For Cognitive Science
Brian J. Flanagan
Machine vision may be understood as an attempt to replicate natural vision. The latter process is associated with neural networks. Light enters the eye and sets in motion processes which culminate in observed patterns of color. Light is, of course, an electromagnetic phenomenon. Our nerve cells communicate with each other via electrochemical means. To say that a process is electrochemical is to say that it is electromagnetic, involving the exchange of photons among electrons. It seems, therefore, that we ought to be able to understand vision in terms of the physical theory of electromagnetism. Historically, however, it has been held that such properties as color do not belong to the physical world. Color has long been considered to be a mental effect of physical stimuli. Nevertheless, it is generally understood that color is related to the energy, wavelength, and frequency of the photons which give rise to the "mental" impression of hue and intensity and so forth. Similar arguments and propositions can be made for all of the sensory modalities, but we will restrict our attention to vision for the time being. If, with Mach, we accept that colors are physical objects, we are obliged to seek a suitable place for them within the body of physical theory. Where should we locate them? Colors are given to us as simple entities, having no parts: We can point to an object that is blue, but we cannot say what blue is. Color is given to us as elemental. In a formal theory, we have a number of elements, rules for joining them, well-formed formulae, and methods of proof. It seems to make good sense to place color among the elements of a formal theory (T). If our mind/brains can be modelled by a formal theory, it follows logically that we should not be able to define our elements - i.e., if we could define our elements, they would not be elements.
Exploration And Search With Intentional Visual Actions
Deborah Strahman
Active vision is a term used to denote the control of the camera's geometric parameters to aid in the satisfaction of perceptual goals. Here it is combined with intentional visual actions generated by task oriented decisions. The focus of this paper is the use of intentional active vision to perform exploration and spatial search. These are common components of many perceptual tasks in which the viewer is dealing with a space and objects within that space that are at most partially known or modeled. Examples of such tasks are determining the location of the product code of an unknown part on an assembly line, finding an exit sign in a known building, or finding a lost tool in a cluttered workspace. As proposed here, exploration is a reactive process which has strategies, composed of visual actions and predictions, for dealing with the various spatial situations encountered. To recognize the current situation, temporary models of the locally visible surfaces are built. These enable the spatial reasoning for construction of the strategies. Though these surfaces are disposed of when no longer local, a coarse spatial memory avoids duplication of effort and aids in deciding when the task is complete. The visual actions are intended motions for which viewing direction and possibly other camera parameters are controlled.
Adaptive Moving Object Tracking Integrating Neural Networks And Intelligent Processing
James S. J. Lee, Dziem D. Nguyen, C. Lin
A real-time adaptive scheme is introduced to detect and track moving objects under noisy, dynamic conditions including moving sensors. This approach integrates the adaptiveness and incremental learning characteristics of neural networks with intelligent reasoning and process control. Spatiotemporal filtering is used to detect and analyze motion, exploiting the speed and accuracy of multiresolution processing. A neural network algorithm constitutes the basic computational structure for classification. A recognition and learning controller guides the on-line training of the network, and invokes pattern recognition to determine processing parameters dynamically and to verify detection results. A tracking controller acts as the central control unit, so that tracking goals direct the overall system. Performance is benchmarked against the Widrow-Hoff algorithm for target detection scenarios presented in diverse FLIR image sequences. Efficient algorithm design ensures that this recognition and control scheme, implemented in software and commercially available image processing hardware, meets the real-time requirements of tracking applications.
A Massively Parallel Approach To Object Recognition
Carl R. Feynman, Harry L. Voorhees, Lewis W. Tucker
This paper describes a model-based vision system for recognizing three-dimensional objects in two-dimensional images of natural scenes. The system is implemented entirely in parallel on a CM-2 Connection Machine System. It is novel in that it does not use a three-dimensional representation of the object. Instead, the representation of the object is constructed from a number of images of the object taken from known positions. By interpolating the appearance of the same feature in different views, it is possible to estimate the appearance of the feature from any camera position. By applying this interpolation process to many features, it is possible to reconstruct the appearance of the object from any angle and position. Features observed in the scene to be recognized can then be used to generate hypotheses as to the pose of the object. These hypotheses then compete to explain the observed features of the scene. A novel representation of features is used, which permits features of many types - lines, corners, curvature extrema and inflection points - to be treated identically by the correspondence, interpolation, and hypothesis generation processes.
Data Fusion And Image Segmentation Using Hierarchical Simulated Annealing On The Connection Machine™
B. P. Kjell, P. Y. Wang
We present a family of algorithms for low level data fusion and image segmentation using hierarchical layers of processor grids. Previously, image pyramids and simulated annealing have separately been used for these problems. We investigate combining these methods into pyramidal annealing algorithms.
Laser Radar Array Used For Improving Image Analysis Algorithms
Mark F. Cullen, Peter J. de Groot, Gregg M. Gallatin
In this paper we report on a system that incorporates laser ranging data with reflectance data for improved performance in an object recognition task. A new laser diode-based radar array developed at Perkin-Elmer provides simultaneous coherent ranging and velocimetry data for multiple points in an image scene. The device consists of simple optics and non-mechanical components and is compact and simple to use. While the hardware expansion to higher density laser diode arrays continues, we have designed parallel signal processing algorithms to process the raw data for image analysis applications. We discuss how this information is used to improve the performance of a model-based object recognition system. The results of this research have immediate application to robotics vision and space docking systems, and longer term application in terrain mapping and image scene analysis as the signal performance is improved.
Integrating Control Structures Into The Generalized Hough Transform
Joel A. Rosiene, Ian R. Greenshields
The paper discusses control strategies which might be used in an integrated fashion with the Generalized Hough Transform.
CC&D Analyst Associate Using The Connection Machine
R. Michael Hord
Camouflage, Cover and Deception (CC&D) techniques by military units include the use of trees for concealment, nets and tarpaulins over emplacements, background matching paints, decoys and a variety of other hiding methods. Image analysts seek to exploit reconnaissance pictures for opposition deployment and order of battle information. To facilitate rapid image exploitation in the presence of CC&D activity, a phase 1 softcopy image analyst workstation was developed on the Connection Machine (CM), a massively parallel supercomputer. This workstation consists of a Symbolics 3675 host for the CM, a high resolution color image display unit, a map projection unit, an ancillary computer running an expert system and a print station that issues a formatted exploitation report. The Symbolics/CM/display system performs image manipulation under operator command. The expert system, which in phase 2 will be integrated onto the Symbolics, functions as an analyst associate to improve the productivity of analysts with low to moderate skill levels under time pressure. The expert system addresses tactical situations using a specified ER (exploitation request), doctrine, terrain, weather and collateral reports to advise the operator regarding the most effective image manipulation algorithms to apply for enhancing the digital imagery.
Perceptual Grouping Of Curved Lines
John Dolan, Richard Weiss
Image curves often correspond to the bounding contours of objects as they appear in the image. As such, they provide important structural information which may be exploited in matching and recognition tasks. However, these curves often do not appear as coherent events in the image; they must, therefore, be (re)constructed prior to their effective use by higher-level processes. A system (currently being built) to accomplish such reconstruction of image curves is described herein. It exploits principles of perceptual organization such as proximity and good continuation to identify co-curving or curvilinear structure. Components of each such structure are replaced by a single curve, thus making their coherence explicit. The system is iterative, operating over a range of perceptual scales--fine to coarse--and yielding a hierarchy of alternative descriptions. Results are presented for the first iteration, showing the performance of the system at the finest perceptual scale and indicating the reasonableness of the paradigm for subsequent iterations.
Fuzzy Logic, Neural Networks And Computer Vision
Madan M. Gupta
The emulation of human-like vision on a computer is often the desired goal of robot vision and medical image processing. Human vision possesses some important attributes such as "perception" and "cognition". It is imperative that some aspects of these attributes are captured when emulating the human visual system. The processes of perception, mentation, and cognition imply that objects and images are not crisply perceived and, therefore, the more common forms of logic such as binary cannot be used. The recently developed calculus of fuzzy logic along with neuron-like computational units appear to be very powerful tools for the emulation of human-like vision fields on a computer. In this paper, we describe the connection between fuzzy logic and neural networks for the area of computer vision.
Fuzzy Segmentation Of Natural Scenes Using Fractal Geometry
James M. Keller, Thomas Downey
Segmentation of an image into meaningful regions is a crucial component in intelligent scene understanding. In images of natural scenes there is a high degree of variability and uncertainty in the features which represent the regions and objects. In previous papers, new features, based on fractal geometry, were introduced to describe natural textured regions. In this paper, those fractal features are utilized as descriptors in segmentation algorithms which produce fuzzy partitions of the image plane. In particular, segmentation schemes based on the fuzzy K-nearest-neighbors and split-and-merge are implemented to segment digital images.
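The fuzzy K-nearest-neighbors scheme mentioned above can be sketched briefly. The feature vectors and class labels below are hypothetical, and the inverse-distance weighting with exponent 2/(m-1) follows the common fuzzy-KNN formulation rather than the authors' exact implementation; the point is that each pixel receives a graded membership in every class, producing a fuzzy partition of the image plane.

```python
def fuzzy_knn(x, samples, k=3, m=2.0):
    """Fuzzy K-nearest-neighbor classification (illustrative sketch).

    samples: list of (feature_vector, class_label) pairs.
    Returns a dict mapping each class label to a membership grade in
    [0, 1]; the grades over the k neighbors sum to 1.
    """
    def dist2(a, b):
        # Squared Euclidean distance, floored to avoid division by zero.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) or 1e-12

    neighbors = sorted(samples, key=lambda s: dist2(x, s[0]))[:k]
    # Inverse-distance weights with fuzzifier m, as in fuzzy c-means.
    weights = [dist2(x, s[0]) ** (-1.0 / (m - 1)) for s in neighbors]
    total = sum(weights)
    memberships = {}
    for (_feat, label), w in zip(neighbors, weights):
        memberships[label] = memberships.get(label, 0.0) + w / total
    return memberships
```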
Propagation Of Uncertainty Using Neural Networks
Raghu J. Krishnapuram, Joonwhoan Lee
Uncertainty management (belief maintenance) is an important part of most knowledge-based computer vision systems. In this paper, we examine a methodology to aggregate and propagate uncertainties in a neural-network-like structure. Each node in the network represents a hypothesis. The inputs are the uncertainties associated with the knowledge sources that support the hypothesis, and the output is the aggregated uncertainty. The activation function is selected based on concepts from fuzzy set theory. The parameters of the activation function are chosen depending on the type of aggregation required. In addition to the traditional union and intersection aggregation connectives, we propose the use of a generalized mean connective to increase flexibility. Some attractive properties of the connectives are discussed and a training procedure for such networks is also proposed.
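The generalized mean connective proposed above can be sketched in a few lines. This is the standard weighted power mean, shown here only to illustrate the flexibility claim: as the exponent p varies, the same node interpolates between intersection-like (min) and union-like (max) aggregation of the input uncertainties.

```python
def generalized_mean(values, p, weights=None):
    """Generalized (weighted power) mean aggregation connective.

    For p -> -inf the result approaches min(values), behaving like an
    intersection; for p -> +inf it approaches max(values), behaving
    like a union; intermediate p gives compensatory aggregation.
    Weights are assumed to sum to 1; values must be positive.
    """
    if weights is None:
        weights = [1.0 / len(values)] * len(values)
    return sum(w * v ** p for w, v in zip(weights, values)) ** (1.0 / p)
```

The exponent p (and the weights) would be the trainable parameters of such an activation function.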
Utilizing Context In Computer Vision By Confidence Modification
Advait Mogre, Robert McLaren, James Keller
The analysis of images derived from a "real" scene often involves applying a rule-based structure leading to object detection, recognition and object relationships. A rule consists of a set of conditions whose satisfaction triggers the rule. These conditions are based on domain knowledge of image characteristics, inferred knowledge, and scene context. Firing a rule leads to conclusions with assigned confidences. Frequently, additional scene context knowledge can have a significant effect on the final conclusion but is excluded because its form varies from scene to scene, and its unavailability in the main rule base would prevent rule application. This paper utilizes context by altering confidences associated with conclusions of specific rules. Uncertainty computations for conclusions of rules to which the contextual information applies would be affected. For each rule, Rj, of the main rule structure, a context factor Cj with values over [-1,1] is defined. If Cj = 0, no relevant context information is present; if Cj > 0, there is a degree of support; if Cj < 0, a conflict with the conclusion is implied. If P(THEN,j) represents the certainty in the conclusion of rule Rj, then it is modified as P'(THEN,j) = f'[P(THEN,j), Cj], where f' is selected to satisfy conditions imposed on the confidence and context information.
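The abstract does not specify f', only the conditions it must satisfy. One illustrative choice that meets them, shown purely as an assumption, is linear interpolation toward full belief for supporting context and toward zero for conflicting context:

```python
def modify_confidence(p, c):
    """One hypothetical context-modification function f'(P, Cj).

    Required behavior: Cj = 0 leaves the confidence P unchanged,
    Cj > 0 raises it, Cj < 0 lowers it, and the result stays in [0, 1].
    The linear interpolation below is this sketch's assumption, not
    the authors' choice.
    """
    if c >= 0:
        return p + c * (1.0 - p)   # move toward full belief
    return p * (1.0 + c)           # move toward zero belief
```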
Perceptual Models For Computer Vision
Panos A. Ligomenides
Human perception of resemblance in spatio-temporal patterns y(x; X) is modeled interactively by procedural elastic templates, called "formal description schema - fdsk" models. The identification, representation and quantification of uncertainty-ambiguity in human holistic perception are problems to be resolved by the man-machine interactive modeling of the human faculties of discernment of generic percepts and perceptual organizations, and of assessment of degrees of conformity to these characteristic elastic constraints of a reference k-norm. Underconstrained and often indeterminate visual sensory patterns are, in turn, recognized in real-time by the procedural fdsk-models, which, progressively and accumulatively, assess conformity of sensory patterns to reference k-norms. In this paper we discuss the nature of the formal tolerance-models of conformity, we review our work on the interactive fdsk-modeling of human perception of resemblance for 1D patterns y(x; X), and we examine fuzzy-theoretic aspects of the probabilistic, possibilistic or belief measures of uncertainty in assessing and in predicting conformity to the elastic constraints of a k-norm, in the presence of incomplete or erroneous sensory data.
New Directions In Multi-Sensor Fusion
Ronald R. Yager
We discuss the Dempster-Shafer theory of evidence. We suggest that alternative aggregation procedures to Dempster's rule may be appropriate in certain environments. We provide an arithmetic type of aggregation. We also look at the question of making decisions in the face of knowledge in the form of Dempster-Shafer belief structures.
Cognitive Vision Fields And Percepts For Image Processing
Madan M. Gupta, George K. Knopf
In this paper, we introduce some new notions of perception and cognitive vision fields for the emulation of human vision. The attributes of an image (edges, colors, etc.) are assumed to have graded membership values distributed over the interval [0,1]. These graded attributes are responsible for the formation of cognitive fields. The information contained in these cognitive vision fields is extracted using numerous densely packed units called percepts. The "percept" is a neuron-like computational unit that responds to a group of neighborhood pixels rather than individual image pixels. Examples of cognitive vision fields for two gray-level images are presented.
A Fuzzy Approach To The Interpretation Of Robot Assembly Forces
Deepak Sood, Michael C. Repko, Michael C. Moed, et al.
Uncertainties encountered in part positioning and variations in part dimensions have motivated the use of sensor feedback control in robot assembly. This paper presents a fuzzy logic based system which interprets the forces and torques generated by a force/torque sensor mounted on a robotic wrist, and controls the execution of robotic assembly tasks. Techniques from the field of fuzzy decision making are applied to the force/torque feedback to monitor the mating of parts by a robot. An example is presented that addresses the problems of inserting a printed circuit board into a card cage. The fuzzy rules that are used to interpret the force/torque readings are discussed and the experimental results are presented.
Fast Implementation Of Standard And "Fuzzy" Binary Morphological Operations With Large, Arbitrary Structuring Elements
Frederick M. Waltz
Binary morphological processing operations have been shown to be useful in a range of industrial inspection applications. These operations have been available in two forms: As software-based operations on general-purpose algorithm development workstations, such as Bruce Batchelor's SUSIE and its many descendants, and as hardware-based operations on dedicated "morphology engines." Obstacles to wider use of these methods include:
• Limited familiarity with binary morphological processing techniques on the part of general users,
• The slow operation of software-based implementations, and
• The relatively high initial cost of fast special-purpose binary morphological processing hardware.
This paper describes another alternative: A near-real-time implementation of binary morphological processing as part of a very large set of operations on a moderately-priced general-purpose image processing and algorithm development workstation. The workstation is based on commercially-available image processing boards, and provides a high-level operator interface. With this system, the speed depends on the size of the structuring element, but migration to real-time implementations using the same hardware family is straightforward. This implementation allows arbitrary specification of the final structuring element, with no constraints as to symmetry, connectedness, concavities, holes, etc. This eliminates one of the obstacles to the use of complex structuring elements on some dedicated morphological processing machines -- the need to find a set of simple "primitives" which, when applied sequentially, will yield the desired structuring element. The nature of the implementation also allows for a new possibility: By a change in one register, operation is converted from a strict binary morphology into "fuzzy" morphology (not grey-level morphology), in which the user can specify a "percentage fit" of structuring element to image.
By this means, the structuring element can still be made to "fit" even in the presence of some "noise" pixels, without the need for separate noise-removal steps. This greatly increases the robustness and practicality of morphological methods in "real world" industrial inspection applications.
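The "percentage fit" idea can be sketched for erosion. This software sketch (the paper describes a register-based hardware implementation) assumes the structuring element is given as a list of offsets; with fit=1.0 it reduces to strict binary erosion, while a lower fit tolerates some noise pixels without separate noise-removal steps.

```python
def fuzzy_erode(image, se, fit=1.0):
    """Binary erosion with a 'percentage fit' threshold (sketch).

    image: 2-D list of 0/1 values.
    se: structuring element as a list of (dy, dx) offsets.
    fit: fraction of structuring-element pixels that must match;
         fit=1.0 is strict binary erosion.  Offsets that fall outside
         the image simply count as misses (simplistic border handling).
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            hits = sum(1 for dy, dx in se
                       if 0 <= y + dy < h and 0 <= x + dx < w
                       and image[y + dy][x + dx])
            if hits >= fit * len(se):
                out[y][x] = 1
    return out
```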
Hierarchical Multiprocessing Software For High Performance Robotics
Richard E. Smith
The improvements in computational performance provided by multiprocessor systems hold great potential for advanced robotics. The need for fast, tight control loops and the desire to avoid communications bottlenecks encourage the use of hierarchical processor architectures. In practice, however, hierarchical robot systems are often built as rigid, bottom-up systems with little flexibility in low-level control. This situation is driven by engineering issues concerning modularity and performance. Advanced techniques such as cooperative multi-arm control have special requirements for both high and low level control. The challenge is to achieve the desired flexibility at a software level without sacrificing performance or engineering features. The approach described here uses data flow graphs to represent the interaction of software modules that perform computation or control tasks. The modules are written in conventional high level programming languages and then exploited for different control tasks by providing different flow graphs. Compressed representations of the flow graphs are distributed among the available processors in the hierarchy to permit optimal routing of data among the modules. This approach is being applied to the development of a two armed robot that performs cooperative control.
Raps: A Flexible Robot Programming Environment With Simulation Capabilities
M. Interesse, A. Distante
Nowadays industries need ever-increasing flexibility from automated systems, to dynamically adapt production to market requirements. In this paper a new programming environment, in the application field of factory automated cells, is presented. It allows the user to describe, at object level, the task to be performed in the work-cell and to specify the geometrical structure of manipulation objects, robots, sensors and other devices, which is stored in the Geometrical Data Base in a suitable format. This information is used by the run-time system to build up the virtual cell representation against which task execution is checked. In this phase the user is supported by a symbolic debugger and a graphic simulator in verifying full task efficiency. The whole software, divided into three main systems running at different times on a VAXSTATION-3200, was implemented in VAX-Pascal. At present, the environment is customized to perform tasks by means of a PUMA-560 industrial arm.
An Environment For The Development Of Sensor-Based Robot Software
Bruce Mack, Raymond Allard, Mohammed M. Bayoumi
In present robot systems, much time and effort is expended in implementing, testing and evaluating sensor processing algorithms and control schemes. This tends to reduce creativity in prototype development and may discourage modernization as technology improves. Therefore, we have devised a "Robot Controller Test Station" (RCTS). RCTS is an environment for implementing, testing and comparing novel adaptive control and sensor processing algorithms. This environment integrates a dynamics simulator and a real robot into a distributed processing environment. Unlike other test stations, the RCTS design emphasizes flexibility, portability, ease of modification and ease of use. The software design of RCTS makes extensive use of standardized signal names, state tables and clearly bounded control and communication blocks. Therefore, new interfaces to hardware and processing routines can be easily integrated into RCTS. The robot control problem is modularized into three levels with control processing separated from sensor processing. This six module structure and the prioritized communication scheme were chosen to reduce the response time of the robot to sensor data. This paper discusses the implementation of the design of RCTS and its advantages for integrating sensors into the distributed robot control environment.
Applications Of The Provision Language In Robot Vision
B. G. Batchelor, I. P. Harris, F. M. Waltz, et al.
ProVision is a new language, based on Prolog, which incorporates facilities for both image processing and the control of electro-mechanical devices. A sample ProVision "program" is presented which illustrates the use of the language for identifying isolated objects (table cutlery). In addition, the application of the ProVision language to a number of different tasks in robot vision is described. These include stacking blocks, packing 2-dimensional shapes, recognition of rigid, articulated and overlapping components, and the analysis of images of overlapping transparent plates. These are all tasks which require the integration of machine vision and decision making.
A PROLOG Simulator For Studying Visual Learning
David M. W. Powers
PROLOG has proven useful as a language for research in Machine Learning, Natural Language and, more recently, as a control language for image processing and visual control of robots. A simulator has been written in PROLOG which allows simulated objects to be manipulated using the same image processing operations as are used in image processing systems. VISISIM allows examination of appropriate heuristics for learning to distinguish significant features with a level of control and a speed beyond real-time real-vision systems. It is designed such that once an acceptable performance has been obtained with the simulator, it may be replaced with another module (e.g. PROVISION) for evaluation in real time on real data. The simulator uses a convexity-based representation which allows explicit control over detail and noise, and parameterization of the image processing operators. VISISIM provides for learnt sequences to be available to the learning system as macros, whilst the learning program itself maintains heuristics about the utility of operators at the level of problem domain and context. VISISIM has been used to simulate a playing card recognition problem and find satisfactory variations of a hand crafted solution used with AUTOVIEW. This, however, was achieved using overly restrictive and unrealistic heuristics. Therefore it is proposed that a Knowledge Engineering approach be taken to the development of more realistic heuristics.
Basic Concepts For The Definition Of A Telerobot Programming Language
Bertrand Tondu
An approach is proposed for defining a telerobot programming language. Such a language is based on the possibility of combining the programming of automatic operations and teleoperated operations. Furthermore, a notion of manual takeover is considered for facing the high potential occurrence of incidents in the unstructured environment of the telerobot. General principles of this language and its programming and execution concepts are developed.
The APx Accelerator: A High Performance, Low Cost And Compact Parallel Processor Ideal For Image Processing
E. Abreu, D. Jenkins, M. Hervin, et al.
The APx Accelerator is an SIMD Parallel Processor system designed to provide very high computing power in a PC/AT environment. The APx is an expandable system and provides from 64 to 256 16-bit processors which deliver peak instruction rates from 800 to 3200 MIPS. The individual processors in the APx Accelerator are powerful and versatile 16-bit RISC processors. In addition, pairs of 16-bit processors can be configured to operate in 32-bit mode under software control. IEEE format single precision floating point operations are supported in 32-bit mode with peak ratings from 40 to 160 MFLOPS. This paper deals with the synergy of a set of architectural and implementation features that work together in the APx Accelerator to achieve high sustained system performance for a significant set of compute-intensive functions. These features have evolved to meet system-level needs, and include VLSI integration, memory bandwidth, concurrency of operations, inter-processor communications, processor selection mechanisms, and I/O bandwidth.
Intelligent Automation Using Object-Oriented, Rule-Based And Vision-Aided Control
Kirt Pulaski, P. S. Yeh, C. Ramirez, et al.
RISS (Robotic Intelligent Safety System) is a software tool for custom-building intelligent and safe execution monitoring/control systems for automatic processes. The architecture of RISS is object-oriented; its diagnostic and control knowledge is rule-based. RISS was specifically applied to intelligently control a large gantry robot operating in the vicinity of people and mobile tool fixtures. The application was vision-aided, employing two CCD video cameras to monitor the robot work space and avoid obstacles.
Space And Time Requirements For Two Image Data Structures
G. A. Baraghimian, A. Klinger
We present research on tree structures using hexagonally and quaternarily organized imagery. The purpose is to determine and compare benefits from applying these pyramidal representations. Our motive is to develop new approaches to image computing. We compare space and time requirements of two hierarchical data structures. The basic image representation is by the septree and the quadtree data pyramids. A septree is a seven-descendant tree; values stored are found by decomposing a roughly-hexagonal planar region into its central hexagon and its six uniformly-adjacent neighbors. A quadtree is a four-descendant tree; its values are similarly obtained using the more common rectangular decomposition of a planar image. Today's technology (i.e., CCD arrays; VLSI chips) enables both the hexagonal and quartering tessellations to co-exist; likewise, lower-cost hardware trends encourage innovative computer systems for image analysis. Both image data structures presented here for static two-dimensional scenes can be extended to three-dimensional analogies. These can be used in computer vision models and in time-sequences of images for robots.
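The space-requirement comparison for the quadtree case can be illustrated with a small sketch (the septree case is analogous with seven descendants per node). A uniform block collapses to a single leaf, so node count, and hence storage, depends on image coherence rather than on the raw pixel count; the recursive layout below is a textbook quadtree, not the authors' specific encoding.

```python
def quadtree_nodes(image, y=0, x=0, size=None):
    """Count the nodes of a quadtree built over a 2^n x 2^n binary image.

    A block of uniform value is stored as one leaf; a mixed block
    spawns four descendant quadrants.
    """
    if size is None:
        size = len(image)
    vals = {image[y + i][x + j] for i in range(size) for j in range(size)}
    if len(vals) == 1 or size == 1:
        return 1  # uniform block (or single pixel): one leaf
    half = size // 2
    # One internal node plus its four quadrant subtrees.
    return 1 + sum(quadtree_nodes(image, y + dy, x + dx, half)
                   for dy in (0, half) for dx in (0, half))
```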
Reliable Communication Protocol Between Intelligent Robots
N. Zhang, K. H. Liu, W. L. Edwards, et al.
The use of a payload pointer within the synchronous transmission frame can alleviate the need for large high-speed slip buffers at multiplexers and high-order cross-connects. The payload pointer protocol is introduced first, and its performance is then analyzed. Analysis of the MTBPF (Mean Time Between Pointer Failure) for the payload pointer demonstrates that a rather elegant symbolic formula can be approximated. More specifically, by adding a one-bit error-correcting capability to the New Data Flag enable, we found the MTBPF to be almost 1.4 times better. With the addition of a one-bit error-correcting capability in the pointer value, we can achieve at least a 0.14×(BER)⁻¹-fold improvement if the bit error rate is ≤10⁻⁶. A two-bit error-correcting capability in the pointer value results in at most an 8.5-fold improvement over the one-bit case. Based on the performance analysis and cost-effectiveness considerations, the payload pointer protocol with dual one-bit error correction is very attractive and hence suitable as a communication protocol between intelligent robots.
A Fast Recursive Two Dimensional Cosine Transform
Chingwo Ma
This paper presents a recursive radix-2×2 fast algorithm for computing the two-dimensional discrete cosine transform (2D-DCT). The algorithm allows the generation of the next higher-order 2D-DCT from four identical lower-order 2D-DCTs, with a structure similar to that of the two-dimensional fast Fourier transform (2D-FFT). As a result, implementing this recursive 2D-DCT requires fewer multipliers and adders than other 2D-DCT algorithms.
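The radix-2×2 recursion itself is not reproduced here, but the separability it builds on is easy to illustrate. The NumPy sketch below (our own, not the paper's algorithm) computes a 2D-DCT by applying the orthonormal 1-D DCT-II matrix to the rows and then the columns, and verifies that the orthonormal transform preserves signal energy.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II matrix C: the 1-D transform of x is C @ x.
    k = np.arange(n)[:, None]          # frequency index
    i = np.arange(n)[None, :]          # sample index
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)         # DC row has its own scaling
    return C

def dct2(X):
    # Separable 2-D DCT (square input assumed): transform the rows,
    # then the columns.
    C = dct_matrix(X.shape[0])
    return C @ X @ C.T

X = np.arange(16.0).reshape(4, 4)
Y = dct2(X)
```

The fast algorithm exploits exactly this structure, replacing each matrix product with a recursive butterfly decomposition.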
Locating Known Objects In 3-D From A Single Perspective View
William J. Wolfe, Cheryl Weber-Sklair, Donald Mathis, et al.
Determining the 3-D location of an object from image-derived features, such as edges and vertices, has been a central problem for the computer vision industry since its inception. This paper reports on the use of four coplanar points (in particular, a rectangle) and three points for determining 3-D object position from a single perspective view. The four-point algorithm of Hung and Yeh is compared to the four-point algorithm of Haralick. Both methods uniquely solve the inverse perspective problem, but in different ways. The use of three points has proven to be more difficult, mainly because of multiple solutions to the inverse perspective problem, as pointed out by Fischler and Bolles. This paper also presents computer simulation results that demonstrate the spatial constraints associated with these multiple solutions. These results provide the basis for discarding spurious solutions when some prior knowledge of configuration is available. Finally, the use of vertex-pairs introduced by Thompson and Mundy is analyzed and compared to the other methods.
Three-Dimensional Object Recognition Using Range Data
Michael Penna, Su-Shing Chen
For the past two decades, most computer vision research has been concerned with digitized gray scale intensity images as sensor data. Digitized intensity images do not, however, contain explicit information about depth or range. More recently, digitized range data from both active and passive sensors has been used for object recognition and image understanding. In this paper we present a 3-dimensional range data recognition scheme. This scheme is based on the association of a set of geometric features to a surface which approximates the surface of an unknown object. A key feature of this system is its efficient use of dense range data.
3-D Object Recognition For Industrial Parts Identification
Chung Lin Huang
A new method for automatic recognition of industrial parts is presented. Given images taken from different viewpoints of a designated 3-D object, a system is developed to interpret them as the same object. This study mainly involves the generation of a unique model of the viewed object from its input image, followed by verification between the stored model and the generated model.
Pose Determination Of A Satellite Grapple Fixture Using A Wrist-Mounted Laser Range Finder
C. Merritt, C. Archibald, T. Ng
A laser range finder mounted on a robot wrist has been used to determine the position and orientation (pose) of a standard satellite grapple fixture. This sensor provides single profiles of range data. Non-adjacent parallel scanlines are acquired from a selected viewpoint, giving data which is sparse along the y-axis of the scanner, while being dense along the x-axis. An application-dependent algorithm for recognition of the grapple fixture components and its pose determination has been implemented. The method takes advantage of the sparse nature of the data, thereby requiring a minimal amount of processing.
Segmentation Of Range Images Using Simple Differential-Geometric Features
Raghu J. Krishnapuram, Anasua Munshi
Although range images are similar to intensity images in some respects, there are some important differences. For example, traditional edge operators used in intensity images can detect occluding (jump) boundaries in range images easily, but they fail to detect most edges within objects. Thus, the features one selects for segmentation of range images have to be somewhat different from those selected for intensity images. In this paper, we introduce a scalar feature based on surface normal information for segmenting range images. This feature is attractive since it converts vector information into scalar information and since it can be computed very easily using existing intensity-image edge operators. Initial results obtained with ERIM range images of blocks are presented.
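The idea of converting the normal field into a scalar image can be sketched as follows (our own illustration, not necessarily the authors' exact feature): take the z-component of the unit surface normal of the range image and run an ordinary intensity-style edge operator (here a simple gradient magnitude) on it. A crease ("roof") edge with no depth jump, invisible to a jump detector, then responds clearly.

```python
import numpy as np

def normal_edge_feature(z):
    # The unit surface normal of z(x, y) is (-zx, -zy, 1) normalized;
    # its z-component nz is a scalar image to which a standard
    # intensity-image edge operator can be applied.
    zy, zx = np.gradient(z.astype(float))
    nz = 1.0 / np.sqrt(zx**2 + zy**2 + 1.0)
    gy, gx = np.gradient(nz)
    return np.hypot(gx, gy)

# Two planes meeting in a roof edge: depth is continuous across the
# crease, but the surface normal changes abruptly there.
x = np.arange(16)
z = np.minimum(x, 15 - x)[None, :] * np.ones((16, 1))
f = normal_edge_feature(z)
```

On the planar flanks the feature is zero; only the crease columns respond, which is the behavior the abstract claims for within-object edges.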
Range Image Analysis Via Quadtree And Pyramid Structure Based On Surface Curvature
Hyun S. Yang
This paper presents a method that expedites range image analysis, such as segmentation, classification, and locating regions of interest, while maintaining accuracy, thereby making 3-D vision techniques based on range images more useful for manipulating robots accurately in real time. The proposed method incorporates quadtree and pyramid structures in order to analyze range images quickly. To make range image analysis independent of the viewing direction, surface curvatures, which are visible-invariant surface characteristics, are exploited. Specifically, we discuss the following topics: (1) problems of using surface curvatures for range image analysis in the presence of noise; (2) generation of the range image pyramid; (3) reliable range image segmentation and classification via split-and-merge using the planarity test and the surface curvatures; (4) incorporation of the quadtree and the pyramid structure to speed up the projection process.
3D Object Recognition By Scale Space Feature Tracking And Subtemplate Matching
H. T. Tsui, K. C. Chu
A method to recognize 3D objects by detecting features at multiple scales and subtemplate matching is proposed. Since surface reconstruction is not required, a great saving in computing time is possible. Depth map data of an object is first smoothed by Gaussian filtering at the coarsest scale, and the Gaussian curvature at each point is computed. Extremal points are determined, and an extremal point region (EPR) associated with each extremal point is defined. A spherical window, which is invariant with rotation in 3-space, is used to extract a surface patch around each extremal point for subtemplate matching. Processing and subtemplate matching are repeated at the next finer scale to resolve ambiguities. Since the EPRs at different scales form an organized tree, computing effort is saved at this step by applying 2D scale-space tracking to limit the search regions for extremal points. If necessary, this step may be repeated at still finer scales until the effect of noise becomes significant or the finest resolution is reached. The method is suitable for recognition of irregularly shaped objects, and early experiments using real data are very encouraging.
Engineering Approach To Building Complete, Intelligent Beings
Rodney A. Brooks
Rather than tackle isolated aspects of human-level intelligence, the mobile robot group at MIT has been working bottom-up, trying to build complete insect-level intelligent systems for mobile robots. The robots are situated in ordinary people-populated office and laboratory areas and must go about their business in an unstructured, dynamically changing environment. Traditional AI techniques make such unrealistic assumptions about the perceptual and actuation systems that they are of little use for such an endeavour. We have developed a different approach, based on task-achieving behaviors, rather than information-processing components, as the fundamental unit of decomposition of a complete intelligent system. We have built a series of complete creatures (Allen, Herbert, Tom and Jerry, Genghis, and now Seymour, under construction) which exist in and interact with the world.
Image Features As Virtual Beacons For Local Navigation
Antonie J. Engel
A technique for dynamic position correction using image features as virtual beacons is described. An algorithm which acquires new features, computes robot position correction vectors from tracked features, and maintains feature reliability statistics is detailed. The algorithm minimizes the use of matching to reduce computational expense and increase robustness. The principal inputs to the algorithm are the relative bearings observed between feature pairs. Unlike stereo-vision techniques, it does not compute explicit feature range estimates. Unlike the bulk of vision-based navigation methods, an accurate position estimate results from the integration of a large number of correction vectors derived from the low-level analysis of many images. A control architecture for an autonomous mobile robot which makes use of this positioning technique is discussed. The general navigation problem of positioning, model building, path finding, and path execution is decomposed into local and global navigation. Local navigation is independent of high-level representations; it is concerned with the immediately perceivable environment and deals with the bulk of the real-time constraints. Methods for coupling local and global navigation are explored. Simulation results showing the behavior of such a control system are presented. The motivation behind this research is the belief that a substantial subset of the navigation problem can be solved using only information obtained during early vision processing. This technique is expected to be more computationally tractable than methods based on optical flow field determination and more accurate than landmark-based navigation methods.
Evaluation Functions For Assembly Sequence Planning
Arthur C. Sanderson, Luiz S. Homem de Mello
Planning robotic manipulation operations is fundamental to the implementation of systems for assembly, disassembly, and repair of space-based equipment. Both teleoperated and fully-autonomous modes of operation will require the representation of feasible operation sequences which accomplish specified tasks. We have developed the AND/OR graph representation of assembly sequences and shown its completeness and correctness for assembly plan representation. The AND/OR graph partitions the state representation in a manner which facilitates local planning decisions. The use of alternative plan representations is coupled to the need for evaluation functions which guide the choice of desirable actions at any given state. In assembly planning, entropy evaluation functions provide a tool for assessment of the complexity of the current state in terms of the available degrees of freedom of motion of parts. Reducing the degrees of freedom of motion of the parts through the introduction of constraints corresponds to a decrease in entropy measures of the assembly. Assembly steps which couple the accuracy of the positioning device to the constraint clearances of the parts are selected using this measure. The entropy evaluation measure may be computed directly from attributes incorporated into a relational description of the parts geometry. The entropy measures must be coupled with other measures of manipulation complexity, availability of tools and fixtures, and speed and cost requirements for the development of general planning systems.
Reasoning About Grasping From Task Descriptions
Huan Liu, Thea Iberall, George A. Berkey
The advent of multiple-degree-of-freedom, dextrous robot hands has made robot hand control more complicated. Besides the existing problem of finding a suitable grasping position and approach orientation, it is now necessary to decide the appropriate hand shape to use for a given task. In order to deal with this additional complexity, we focus on how to represent prehensile tasks for mapping task descriptions into suitable hand shapes, positions and orientations. A generic robot hand control system, GeSAM, is being implemented to refine task descriptions into suitable dextrous robot hand shapes using Knowledge Craft on a TI Lisp machine.
Probabilistic Methods For Robot Motion Determination
J. Balaram
Robot motion is typically determined by searching a graph of a deterministic representation of the free configuration space available to the robot. This approach is successful only if the environment is static, the problem degrees-of-freedom are low, long planning times are acceptable, and high-speed computing resources are available. In this paper an alternative approach to motion modeling is presented. The approach has its roots in a path planner developed for a multiple-robot-arm system operating in a complex task environment characterized by frequent environmental changes due to teleoperation and strict requirements on system response times. The probability of successful motion transit of the robot through a region of space is obtained from a geometric model, and captures the conditioning effects on the motion of the presence of objects in the region and of the arm kinematics. This information is used to guide an on-line search and results, in most cases, in successful path determination within reasonably short search times. Relationships to diffusion flows and stochastic geometry are explored, as are the possibilities of using sensor data directly in the model.
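A toy version of the underlying idea (our illustration, not the paper's planner) can be written as a dynamic program over a grid of per-region transit-success probabilities: working in log-probabilities, the search prefers the route whose product of transit probabilities is highest, steering around regions conditioned by obstacles.

```python
import numpy as np

def best_transit(p):
    # Dynamic programming over log-probabilities: among monotone
    # right/down paths from p[0, 0] to p[-1, -1], find the one that
    # maximizes the product of per-cell transit probabilities.
    n, m = p.shape
    logp = np.log(p)
    score = np.full((n, m), -np.inf)
    score[0, 0] = logp[0, 0]
    for i in range(n):
        for j in range(m):
            for pi, pj in ((i - 1, j), (i, j - 1)):
                if pi >= 0 and pj >= 0:
                    score[i, j] = max(score[i, j],
                                      score[pi, pj] + logp[i, j])
    return np.exp(score[-1, -1])       # success probability of best path

p = np.full((4, 4), 0.9)               # mostly free space
p[1:3, 1:3] = 0.1                      # region conditioned by an obstacle
prob = best_transit(p)
```

The best path skirts the low-probability block, giving 0.9 over each of its seven cells; a full planner would of course search a richer motion graph and update the probabilities from sensor data.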
Neural Controller For Adaptive Sensory-Motor Coordination
Michael Kuperstein, Jorge Rubinstein
We present a theory and prototype of a neural controller called INFANT that learns sensory-motor coordination from its own experience. INFANT adapts to unforeseen changes in the geometry of the physical motor system and to the location, orientation, shape and size of objects. It can learn to accurately grasp an elongated object without any information about the geometry of the physical sensory-motor system. This new neural controller relies on the self-consistency between sensory and motor signals to achieve unsupervised learning. It is designed to be generalized for coordinating any number of sensory inputs with limbs of any number of joints. INFANT is implemented with an image processor, stereo cameras and a five-degree-of-freedom robot arm. Its average grasping accuracy after learning is 3% of the arm's length in position and 6 degrees in orientation.
A Procedural Approach To View Independent Three Dimensional Object Recognition And Pose Determination
Paul F. Hemler, Wesley E. Snyder
This paper presents a new approach to view independent object recognition where both the image and the model are represented conceptually as a graph structure. The model graph is implemented as a procedure, and is therefore referred to as a Procedural Model. Binary relations between nodes in the object graph are implemented implicitly by the calling relations between procedures. A collection of such procedures and the calling relations between them provides a full depth-first search of the tree of possible model node to image node bindings. A complete image analysis system was developed and used to demonstrate the functionality and usefulness of the Procedural Model approach to view independent object recognition. Although arbitrarily complex search trees can be implemented in this fashion, the syntax of the individual model procedures is shown to be so similar, for objects composed of analytic surfaces, that such models can be built automatically. An interactive program has been written which can be used to construct a Procedural Model by prompting only for object-specific information.
Practical Demonstration Of A Learning Control System For A Five-Axis Industrial Robot
Robert P. Hewes, W. Thomas Miller III
The overall complexity of many robotic control problems, and the ideal of a truly general robotic control system, have led to much discussion of the use of neural networks in robot control. This paper discusses a learning control technique which uses an extension of the CMAC network developed by Albus, and presents the results of real time control experiments which involved learning the dynamics of a 5 axis industrial robot (General Electric P-5) during high speed movements. During each control cycle, a training scheme was used to adjust the weights in the network in order to form an approximate dynamic model of the robot in appropriate regions of the control space. Simultaneously, the network was used during each control cycle to predict the actuator drives required to follow a desired trajectory, and these drives were used as feedforward terms in parallel to a fixed gain linear feedback controller. Trajectory tracking errors were found to converge to low values within a few training trials, and to be relatively insensitive to the choice of feedback control system gains.
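A minimal CMAC of the kind Albus introduced can be sketched in a few lines (an illustrative 1-D version of our own, not the authors' robot controller): overlapping offset tilings quantize the input, each tiling activates one weight, and training distributes the output error equally over the active weights.

```python
import numpy as np

class CMAC:
    # Minimal 1-D CMAC sketch. `na` offset tilings cover the input
    # range; each tiling activates exactly one weight, and the output
    # is the sum of active weights. Neighboring inputs share most of
    # their active weights, which gives local generalization.
    def __init__(self, n_tiles=64, na=8, lo=0.0, hi=1.0):
        self.na, self.lo = na, lo
        self.res = (hi - lo) / n_tiles
        self.w = np.zeros((na, n_tiles // na + 1))

    def _active(self, x):
        q = int((x - self.lo) / self.res)          # quantized input
        return [(a, (q + a) // self.na) for a in range(self.na)]

    def predict(self, x):
        return sum(self.w[a, i] for a, i in self._active(x))

    def train(self, x, target, beta=0.5):
        # Albus' rule: share the correction equally among active weights.
        err = target - self.predict(x)
        for a, i in self._active(x):
            self.w[a, i] += beta * err / self.na

net = CMAC()
xs = np.linspace(0.0, 1.0, 200, endpoint=False)
for _ in range(20):                                # a few training sweeps
    for x in xs:
        net.train(x, np.sin(2 * np.pi * x))
errs = [abs(net.predict(x) - np.sin(2 * np.pi * x)) for x in xs]
```

In the paper's setting the table is multidimensional (joint positions, velocities, accelerations) and the learned output serves as a feedforward term beside a fixed-gain feedback controller.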
Real-Time Neuromorphic Algorithms For Inverse Kinematics Of Redundant Manipulators
Jacob Barhen, Sandeep Gulati, Michail Zak
We present an efficient neuromorphic formulation to accurately solve the inverse kinematics problem for redundant manipulators. Our approach involves a dynamical learning procedure based on a novel formalism in neural network theory: the concept of "terminal" attractors. Topographically mapped terminal attractors are used to define a neural network whose synaptic elements can rapidly capture the inverse kinematics transformations and subsequently generalize to compute the joint-space coordinates required to achieve arbitrary end-effector configurations. Unlike prior neuromorphic implementations, this technique can also systematically exploit redundancy to optimize kinematic criteria, e.g. torque optimization. Simulations on 3-DOF and 7-DOF redundant manipulators are used to validate our theoretical framework and illustrate its computational efficacy.
Visual Feedback For Robotic Manipulations Under Arbitrary Loading
Luis R. Lopez
An iterative series of linear and non-linear transformations can be utilized to map a user-supplied position command to a set of self-adapting servo commands. Visual information from the manipulator can be mapped to a current position, which is then used with the operator's position command to carry out an actuator command mapping. The newly generated actuator commands change the manipulator position. Visual information about the new position completes a feedback loop that elicits an iterative chain of transformations of the visual information into control commands. The iterative transformations continue until the manipulator is within the desired position tolerance. This scheme dynamically adapts to arbitrary loads or changes in dynamical parameters. Such a series of transformations is precisely the kind of processing that neural networks perform, and such architectures are capable of learning these transformations from exemplary inputs. A neural network system is presented that accomplishes the learning and execution of this iterative control scheme. Learning system design issues that arise in these systems are also discussed.
Experimental Evaluation Of Two Robust Control Algorithms On An Industrial Manipulator
Moshe Cohen, Laeeque K. Daneshmend
Industrial manipulators are typically controlled by proportional-derivative algorithms implemented at the individual joints. Recent theoretical results have shown that robust controllers which take into account the system dynamics can provide more accurate trajectory tracking and greater robustness to unmodeled disturbances. This research implements two such controllers - a sliding mode algorithm, and an acceleration feedback algorithm - on an industrial manipulator and evaluates their performance in comparison with a proportional-derivative algorithm implemented in the same manner. The objective of robust control designs is to make the system insensitive to disturbances and modelling uncertainties. The manipulator equations of motion are non-linear and cross-coupled, and in practice the joint actuators experience significant frictional disturbances due to high gear ratios. The robust algorithms implemented do not rely on an accurate model of the dynamics, and are computationally efficient in comparison to model-based (feedforward) controllers. Experimental results are presented for the sliding-mode and acceleration feedback controllers, and these are compared with a conventional proportional-derivative controller under the same operating conditions. Tracking of position trajectories is shown to improve markedly when a particular sliding-mode scheme is used.
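The qualitative finding, that a sliding-mode law tolerates unmodeled friction better than joint PD control, can be illustrated on a toy 1-DOF plant (our sketch with made-up gains, not the paper's experiment). The sliding-mode controller drives s = de + λe to zero with a switching term, smoothed here by tanh, whose gain exceeds the Coulomb friction bound.

```python
import numpy as np

def simulate(controller, T=10.0, dt=1e-3):
    # Euler simulation of a unit mass tracking q_d(t) = sin t under
    # unmodeled Coulomb friction; returns the peak |tracking error|
    # over the second half of the run (after transients).
    q, dq, peak = 0.0, 0.0, 0.0
    for k in range(int(T / dt)):
        t = k * dt
        qd, dqd, ddqd = np.sin(t), np.cos(t), -np.sin(t)
        e, de = q - qd, dq - dqd
        ddq = controller(e, de, ddqd) - 2.0 * np.sign(dq)  # friction
        dq += ddq * dt
        q += dq * dt
        if t > T / 2:
            peak = max(peak, abs(e))
    return peak

# Joint PD: u = -kp*e - kd*de (illustrative gains).
pd = lambda e, de, ddqd: -25.0 * e - 10.0 * de
# Sliding mode: feedforward plus switching on s = de + 5e, with
# switching gain 3 > friction bound 2 and boundary layer 0.05.
smc = lambda e, de, ddqd: ddqd - 5.0 * de - 3.0 * np.tanh((de + 5.0 * e) / 0.05)

pd_err, smc_err = simulate(pd), simulate(smc)
```

The PD loop carries a persistent error proportional to the friction and acceleration demand, while the sliding-mode loop confines the error to a small boundary layer, mirroring the tracking improvement reported above.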