Proceedings Volume 1608

Intelligent Robots and Computer Vision X: Neural, Biological, and 3-D Methods

David P. Casasent
View the digital version of this volume at SPIE Digital Library.

Volume Details

Date Published: 1 March 1992
Contents: 8 Sessions, 52 Papers, 0 Presentations
Conference: Robotics '91 (1991)
Volume Number: 1608

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Reconstruction, Description, and Modeling of 3-D Surfaces I
  • Reconstruction, Description, and Modeling of 3-D Surfaces II
  • Three-Dimensional Scene Perception I
  • Three-Dimensional Scene Perception II
  • Neural Nets for Computer Vision and Intelligent Robots I
  • Neural Nets for Computer Vision and Intelligent Robots II
  • Neuromorphology of Biological Vision: A Basis for Machine Vision I
  • Neuromorphology of Biological Vision: A Basis for Machine Vision II
Reconstruction, Description, and Modeling of 3-D Surfaces I
Nonreconstruction approach for road following
Daniel Raviv, Martin Herman
This paper presents a new approach for vision-based autonomous road following. By building on a recently developed optical-flow-based theory, we suggest that motion commands can be generated directly from a visual feature, or cue, consisting of the projection into the image of the tangent point on the edge of the road, along with the optical flow of this point. Using this visual cue, there is no need to reconstruct the 3-D scene, and the related computations are relatively simple. We suggest some vision-based partial control algorithms for both circular and non-circular roads.
Applying geometric sensor and scene models for range image understanding
Ellen L. Walker
For a robot to operate intelligently under sensor control in the real world, it must interpret its sensory inputs to create a model of its environment. Because the data are generally incomplete and error-prone, additional knowledge must be applied to derive useful interpretations. One source of such knowledge is the sensors themselves: the ideal data returned from a sensor is a function of the environment and of the pose and other parameters of the sensor (such as the focal length of a camera), while the actual result is some corruption of the ideal result by noise or other error processes. In this work, the functions 'computed' by a range sensor are explicitly represented as geometric objects and relationships in the 3D FORM geometric reasoning system. Models for the output of independent edge segmentation and surface segmentation are described and unified into an overall range-object model. In addition, visibility and projection relationships are used to predict the visibility of hypothesized object parts and to refine the estimates of the cameras' positions. The resulting object description can be specialized, completed, and matched with other objects using existing 3D FORM capabilities.
Estimation of motion parameters using binocular camera configurations
Raghavan Sudhakar, Hanqi Zhuang, Padma Haliyur, et al.
This paper is concerned with the estimation of the rotational and translational motion parameters of planar and nonplanar object surfaces viewed with a binocular camera configuration. Possible applications of this method include autonomous guidance of a moving platform (AGV) via imaging, and segmentation of moving objects using information about motion and structure. Assuming that the brightness of a moving patch is invariant to small motion, a pair of equations involving the object depth and the spatio-temporal derivatives of the left and right images is developed. The depth map is estimated by matching and triangulating contours in the stereo images and interpolating over internal regions. Considering all the image points, a system of linear equations in the unknown motion parameters is obtained and solved using the singular value decomposition technique. The algorithm was tested on simulated as well as real images. Accurate results were obtained for the simulated images; for the real images, the errors obtained with different levels of image spatial sampling were within reasonable limits.
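As a sketch of the final estimation step described above, the fragment below solves a stacked linear system A m = b for the six motion parameters via the SVD; the construction of A and b from image derivatives and depth is assumed to have been done already, and all names are illustrative rather than the authors' own.

```python
import numpy as np

def solve_motion_parameters(A, b):
    """Least-squares solution of the stacked brightness-constancy
    equations A @ m = b, where m holds the 3 rotational and 3
    translational motion parameters (one row of A per image point)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Suppress near-zero singular values for numerical stability.
    s_inv = np.where(s > 1e-10 * s.max(), 1.0 / s, 0.0)
    return Vt.T @ (s_inv * (U.T @ b))
```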
Fusion-based depth estimation from a sequence of monocular images
Jen-yu Shieh, Hanqi Zhuang, Raghavan Sudhakar
This paper reports the development of a general depth estimation system that works directly on an image sequence. We combine the direct depth estimation method with the optical-flow-based method. More specifically, the optical flow on or near moving edges is computed using a correlation technique. The optical flow information is then fused with gradient information to estimate depth not only on moving edges but also in internal regions. The depth estimation problem is formulated as a discrete Kalman filter problem and is solved in three stages. In the prediction stage, the depth map estimated for the current frame, together with knowledge of the camera motion, is used to predict the depth variance at each pixel in the next frame. In the estimation stage, a vector version of the Kalman filter formulation is adopted and then simplified under the assumption of a diagonalized error covariance. The resulting estimation algorithm takes into account the information from neighboring pixels and is therefore much more robust than a scalar Kalman filter implementation. In the smoothing stage, morphological filtering is applied to the estimated depth map to reduce the measurement noise and fill in untrustworthy areas based on the error covariance information. Simulation results illustrate the effectiveness of the presented method.
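A minimal sketch of one per-pixel predict/correct cycle under the diagonalized-covariance assumption mentioned above; the inverse-variance fusion shown is the standard scalar Kalman update, not the paper's exact equations, and all names are illustrative.

```python
def kalman_depth_update(d_pred, var_pred, d_meas, var_meas):
    """One scalar Kalman correction for the depth at a single pixel.

    d_pred, var_pred : depth and variance predicted from the previous
                       frame and the known camera motion
    d_meas, var_meas : depth measurement fused from optical flow and
                       gradient information, with its variance
    """
    gain = var_pred / (var_pred + var_meas)      # Kalman gain
    d_est = d_pred + gain * (d_meas - d_pred)    # corrected depth
    var_est = (1.0 - gain) * var_pred            # reduced uncertainty
    return d_est, var_est
```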
New, fast, and accurate reconstruction method of 3-D planes
Patrick J. Bonnin, Bertrand Zavidovique
This paper describes a 3-D reconstruction method for polyhedral objects (planar faces) that uses planarity constraints. Although such constraints are not often used in the literature because of their global nature, we propose a new global/local implementation of them, detailed in the context of our edge-region cooperative segmentation. Conceptually, the advantages of this method are its simplicity and a precise knowledge of its uncertainties; at the operational level, they are its speed and the precision of the resulting volumetric reconstruction.
Reconstruction, Description, and Modeling of 3-D Surfaces II
Scene description: interactive computation of stability with friction
Jeff L. DeCurtins, Prasanna G. Mulgaonkar
This paper describes a technique for hypothesizing the shape of hidden portions of unknown objects within a pile of such objects, using a dense range image of the pile. The technique employs symmetry, stability, viewpoint independence, and object impenetrability to hypothesize the unknown shape and dimension of each visible object. The process constructs alternative hypotheses, which differ in the way the visible portions of objects are extended into the occluded regions within the scene. To ensure that each interpretation is consistent with the observed range data, the known geometry of the range sensor is used in forming the hypotheses. The final result is one or more hypothesized object configurations, each of which is consistent with both the sensed range data and the physical constraints between objects in contact. For each resulting hypothesis, a free-body analysis is performed to determine if the hypothesized configuration is stable. The hypothesis with the highest stability rating is chosen as the most likely correct interpretation.
Three-dimensional object representation based on the largest convex patches method
Stoyanka D. Zlateva, Lucia M. Vaina
We introduce a representation of the three-dimensional (3-D) shape of objects that describes shape through configurations of interrelated parts and accounts for their surface and volumetric properties as well. The decomposition into constituent parts is obtained by a previously developed surface-based method that uses the largest locally convex patches (LCP) and the largest nonconvex patches to characterize and define part boundaries. This characterization of the part boundary provides the basis for assigning to each part a simple volumetric primitive that preserves its surface type. We propose a heuristic for choosing the volumetric description, motivated by a theorem from differential geometry that classifies surface points (elliptic, hyperbolic, parabolic, planar) through the type of the paraboloid that approximates the surface in a finite neighborhood. This provides a natural way of relating surface properties of the boundary-based decomposition to volumetric properties of the primitive-based part description. A further decomposition into subparts and the computation of associated features on an 'if needed' basis allow us to account for structural details, which appear to be closely related to the use of the object in actions. We present examples of the shape descriptions of various machine tools.
Labeling scheme for surface detection in 3-D images
Prakash Adiseshan, Tracy L. Faber
This work addresses the problem of finding surfaces in 3-D images, which involves both segmentation and parameterization. The solution method we propose is designed to answer the following questions: (1) Is a point (x,y,z) in 3-space a surface point? (2) If so, what is the surface normal at that point? (3) What are the neighboring voxels in the u and v directions on the surface? We use a labeling scheme to define a surface element at a point in the (u, v) directions. By restricting the neighborhood volume at a point, we obtain a small set of labels in the u and v directions. A cost functional is set up in terms of the surface normal and labels at a point, following the calculus-of-variations methodology. The estimate of the surface normal at a point depends on the normals at neighboring points and on the label connectivity criteria defined; this also facilitates the handling of surfaces that are not smoothly varying. The final solution is the set of normals and labels at every point that minimizes the cost functional. Finally, we apply our method to find surfaces in 3-D synthetic and medical image data.
Building a surface model of an object using multiple range views
Marc-Andre A. Soucy, Denis Laurendeau
This paper presents a two-step technique for building a non-redundant surface model of an object using information provided by N registered range views. In the first processing step, the redundancy between views (i.e., 3-D information belonging to more than one view) is detected. Range views are considered as sets of 3-D points, and the redundancy between the views is modeled by the Venn diagram of the N sets. Using the Venn diagram, the second processing step yields a non-redundant surface model of the object: a mesh of 3-D triangles defined in an object-centered reference frame describes the surface information present in the N range views. The proposed view integration technique is independent of the number of 3-D views and does not impose orthogonality between them.
Trinocular vision: a 3-D solution
Flavio Bortolozzi, Bernard Dubuisson
This paper presents a solution to the following problems: matching between trinocular images, full reconstruction of 3-D segments without the length constraint, and elimination of false triplets. In the first step, the cameras are calibrated; this provides high-precision perspective transformation matrices and defines the geometry of the system. Our approach to matching is a geometry-based one, and it is used to support the perception of a mobile robot.
Trinocular correspondence for particles and streaks
Joseph K. Kearney
This paper examines the problem of matching corresponding object points in three views of a scene. The matching problem is a critical step in the recovery of three-dimensional position by stereo image processing and has important applications in the analysis of particle distributions and fluid flows. We introduce an algorithm that iteratively searches for all possible pairwise matches using trinocular consistency constraints. The algorithm is shown to be equivalent to a search for edges that belong to all perfect matchings in a bipartite graph that links consistent matches between two views; three such problems are solved in parallel. An attractive property of the matching algorithm is its graceful degradation in response to image distortion and modeling error. Although measurement errors may prevent successful matching, wrong matches can almost always be avoided if the error in the image position of a particle can be bounded. Thus, noise can cause a loss of acuity but should not introduce the gross misinformation that could result from incorrect matching. Experiments with synthetic particle images demonstrate that this approach can lead to significantly greater numbers of matches than algorithms that match in a single direction.
Regular curved object's CSG-rep reconstruction from a single 2-D line drawing
Weidong Wang
In the 1989 SPIE conference on Intelligent Robots and Computer Vision, we presented our approach and algorithm for automatically identifying the primitives involved in constructing a 3-D polyhedral solid. We have since extended our research to introduce more complex primitives. In this paper, we present an algorithm to reconstruct, from a single 2-D line drawing, the CSG-rep of a regular curved object generated from a solid modeler with primitive blocks and cylinders. We demonstrate the difficulties that arise from the introduction of a curved object, the cylinder. We assume that the line drawing of a 3-D solid object comes from an orthographic projection of the object from a general viewpoint, that it is perfect, and that it has been labeled with some labeling scheme. We identify the primitives (blocks and cylinders) one by one, find their local-to-world coordinate transformations, and determine the regularized set operations for the CSG representation.
Algorithmic analysis of 3-D depth information for dynamic visual images
Dezong Wang, Deng-feng Sheng
The world is in motion, and human visual images are dynamic. From a pair of dynamic images, the exterior orientation elements can be obtained; this is absolute orientation. On that basis, and from the two images in image-frame coordinates, the points of the body in world coordinates can be found by space intersection, so that 3-D depth information is acquired. This paper describes a method for obtaining the exterior orientation elements and two ways of acquiring 3-D depth information.
Computer vision: reconstructing 3-D model from 2-D images
Zheng Tan, Long Gong
A comprehensive algorithm for the reconstruction of a three-dimensional model from a pair of two-dimensional images is presented in this paper. Dynamic programming matching based on the 'bridging method' is guided by the results of feature-based matching. In the feature-based matching process, mathematical morphology is introduced to describe the extracted features. In areas without distinct feature points, area-based correlation is applied to obtain the fine locations of the corresponding points. Finally, constraints in two directions are introduced to detect mismatched points.
Monocular pose estimation of quadrics of revolution
Massimo Ferri, Fulvia Mangili
The ability to determine an object's position accurately and quickly is important in many robotics tasks. Monocular scene analysis based on perspective projection can be used successfully to solve this problem if a priori knowledge of the objects is available. In this paper, analytic procedures for perspective inversion of quadrics of revolution--in particular spheres, cones, and cylinders--are presented. Preliminary experimental results on real images of a test object are provided, with the main goal of testing the procedures' accuracy and the suitability of the available low-level processing.
Three-Dimensional Scene Perception I
Parallel range data processing: a real case study
John C. Sluder, Mongi A. Abidi
In this paper, we begin by describing a range image processing method developed to allow an autonomous robot to detect and locate objects. The procedure consists of taking a range image, filtering and preprocessing it, calculating surface normals, obtaining edge maps from both the range values and the surface normals, combining the edge maps into an initial scene segmentation map, and analyzing each object in the scene in order to detect and locate the desired objects. While this method is reasonably fast and robust, its surface characterization is fairly crude, and more elaborate surface characterization on a serial computer would be considerably slower, which is undesirable for autonomous robot operations. We then discuss the development of parallel techniques for surface characterization using range data. With parallel processing, the complexity and accuracy of the characterization can be increased without an unacceptable cost in processing time. The method explored is a parallel implementation of least-squares QR surface fitting based on Givens transformations. We conclude by summarizing the work done and briefly listing some future extensions of the current research.
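For reference, here is a serial sketch of QR factorization by Givens rotations, the building block named above; in a parallel implementation, rotations acting on disjoint row pairs can be applied concurrently. This is the generic textbook formulation, not the authors' code.

```python
import numpy as np

def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    if b == 0.0:
        return 1.0, 0.0
    r = np.hypot(a, b)
    return a / r, b / r

def qr_givens(A):
    """Reduce A to upper-triangular R by Givens rotations, as used in
    least-squares surface fitting (Q is accumulated implicitly)."""
    R = A.astype(float).copy()
    m, n = R.shape
    for j in range(n):                       # zero entries below the diagonal
        for i in range(m - 1, j, -1):
            c, s = givens(R[i - 1, j], R[i, j])
            G = np.array([[c, s], [-s, c]])
            R[[i - 1, i], j:] = G @ R[[i - 1, i], j:]
    return R
```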
Hough transform for 3-D object recognition
Xing Ping Cao, Gongzhu Hu
This paper describes an approach to the recognition of objects of known shape by matching, in the Hough space, features (edges and normal vectors) extracted from range images to object models. The matching is carried out hierarchically. First, a few candidate rotation matrices are determined based on the peaks in the Hough accumulator representing the orientation of a reference vector. Then the translation parameters are evaluated for each rotation matrix, and the transformation is obtained from the best interpretation between the model and the scene. The experiments indicate that the procedure is accurate, although optimization can further improve the parameter estimates.
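A much-simplified illustration of the accumulator idea: voting for candidate orientations in one angular dimension and keeping the strongest peaks. The actual method votes over rotation hypotheses for a 3-D reference vector; this stand-in and its names are ours.

```python
import numpy as np

def orientation_peaks(angle_votes, n_bins=180, top_k=3):
    """Accumulate scene-to-model orientation hypotheses (radians in
    [0, pi)) and return the top_k bin-center angles."""
    hist, edges = np.histogram(angle_votes, bins=n_bins, range=(0.0, np.pi))
    idx = np.argsort(hist)[-top_k:][::-1]        # strongest peaks first
    return 0.5 * (edges[idx] + edges[idx + 1])
```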
Planning of an active range sensor structure for pose estimation of 3-D regular objects
Janusz A. Marszalec, Markku Jarviluoma, Tapio A. Heikkila
In some 3-D object perception and pose estimation tasks, an active range sensor structure is important for obtaining high-accuracy results. In this paper, an active-triangulation photoelectric range sensor based on electronic scanning of an integrated LED array is described, and sensor structure planning for pose estimation of 3-D regular objects is discussed. A simulation of how the locations of the measured points on the object's surface affect the accuracy of the 3-D pose estimate constitutes the background for the planning. Taking the results of the theoretical analysis into consideration, an approach to designing the desired sensor for pose estimation of a certain set of objects is presented. The whole sensor structure planning process is illustrated with an example of pose estimation of cylindrical objects. Benefits of the presented approach for a certain class of pose estimation tasks met in real industrial environments include lower range data acquisition hardware costs and less time consumed in range data reduction.
Improving the robustness of edge- and region-based range image segmentation
Visa Koivunen, Matti Pietikaeinen
In our previous work, we presented a segmentation method that combines useful properties of edge- and region-based segmentation. In the region-based approach, pixels are classified into 10 surface types according to the spatial properties in the neighborhood of each pixel. Surface differential properties are approximated using least-squares estimation. Geometrically coherent regions are formed by grouping connected pixels of the same surface type. A two-stage method that detects both step and roof edges is used for edge detection. Preliminary edge- and region-based segmentation results are overlaid to achieve the final segmentation. This paper presents our recent results, which improve the robustness of the segmentation method. Accurate estimation of the differential properties of the surfaces is essential for good segmentation. Least-squares estimation with constant-coefficient window operators gives good results when only white Gaussian noise occurs and the pixels in the neighborhood come from one statistical population. To decrease the influence of strongly deviant pixel values, which occur near region boundaries or due to noise, we implemented two robust estimation methods: an iteratively reweighted least-squares method that uses a variable-order model, and a least trimmed squares method. The robust and least-squares approaches are compared and their effects on surface classification are reported. The validity of the assumptions about the data, the model, and the estimation methods used is also considered. Both synthetic and real range images are used as test images.
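A minimal sketch of the iteratively-reweighted-least-squares idea for down-weighting deviant pixels when fitting a local surface model; the Huber weight function and all names here are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def irls_fit(X, z, n_iter=10, k=1.345):
    """Robust fit of z ~ X @ beta over a pixel neighborhood.

    X : (N, p) design matrix of local surface basis terms
    z : (N,)  range values in the neighborhood
    """
    beta = np.linalg.lstsq(X, z, rcond=None)[0]        # ordinary LS start
    for _ in range(n_iter):
        r = z - X @ beta
        scale = np.median(np.abs(r)) / 0.6745 + 1e-12  # robust scale estimate
        # Huber weights: 1 for small residuals, decaying for outliers.
        w = np.minimum(1.0, k * scale / np.maximum(np.abs(r), 1e-12))
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(sw[:, None] * X, sw * z, rcond=None)[0]
    return beta
```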
Intersection mappings for stereo vision
Joseph H. Nurre
Stereo vision triangulation methods are used for three-dimensional measurement in robot guidance and machine inspection applications. Understanding the computational geometry associated with stereo imaging gives insight into the accuracy that can be obtained and into the surface types best inspected by such systems. The purpose of this paper is to discuss the shape of the triangulation space arising from stereo imaging for flexible stereo camera configurations. It is shown that the points of intersecting views arising from a stereo pinhole configuration form conic sections, which are a function of the disparity between the pixels in the two images. The significance of this work is that it provides a basis for an improved quantitative and qualitative understanding of stereo vision.
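For reference, here is the elementary pinhole triangulation relation that underlies this analysis, for a rectified pair with baseline b and focal length f; the paper's contribution concerns the conic-section shape of the intersection loci for more general configurations.

```python
def depth_from_disparity(f_pixels, baseline, disparity):
    """Depth of a scene point from a rectified stereo pair.

    f_pixels  : focal length expressed in pixels
    baseline  : camera separation (same units as the returned depth)
    disparity : x_left - x_right in pixels (> 0 for points in front)
    """
    if disparity <= 0:
        raise ValueError("point at or beyond infinity")
    return f_pixels * baseline / disparity
```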
Three-Dimensional Scene Perception II
Fast 2-D Hartley transform in 3-D object representation and recognition
Dah-Jye Lee, Manuel Ramirez, Sunanda Mitra
In image processing and computer vision, the Fourier transform is widely used for frequency-domain analysis. However, the Hartley transform can be a very good substitute for the more commonly used Fourier transform when the input data are real. A two-dimensional butterfly algorithm for the fast Fourier transform has been modified to calculate the Hartley transform faster than row-column decomposition allows. This paper presents three frequency-domain registration techniques: power cepstrum, complex cepstrum, and phase correlation. These techniques not only are capable of precise registration of images but also lead to three-dimensional (3-D) reconstruction of real objects by finding the corresponding points and disparities of an image pair. Use of these recently developed techniques allows one to obtain a precise displacement between two images and a quantitative measurement of 3-D information in relatively little computation time. The Hartley transform can be used to implement all three of these techniques instead of the complex-number computation required by the Fourier transform. An additional 35 percent saving in computation time is achieved by implementing the two-dimensional butterfly algorithm for computing the Hartley transform. This reduction in computation time makes the use of the Hartley transform in frequency-domain analysis more attractive.
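As background, for real input the discrete Hartley transform can be read off the Fourier transform as H = Re(F) - Im(F), and it is (up to scale) its own inverse; below is a small NumPy sketch of that relationship, not the paper's butterfly implementation.

```python
import numpy as np

def dht2(x):
    """2-D discrete Hartley transform of a real array via the FFT:
    H(u, v) = Re F(u, v) - Im F(u, v)."""
    F = np.fft.fft2(x)
    return F.real - F.imag

def idht2(H):
    """Inverse DHT: the forward transform scaled by 1/N."""
    return dht2(H) / H.size

img = np.random.rand(8, 8)
assert np.allclose(idht2(dht2(img)), img)   # round-trip check
```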
Stereo-based 3-D scene interpretation using semantic nets
Juha Roening, Tapio Taipale
A SCene INterpreter system, SCIN, for indoor scenes has been developed as part of an autonomous navigation project at the University of Oulu. Starting from passive stereo, SCIN gradually forms a scene graph using a semantic net representation. In forming the graph, bottom-up and top-down approaches are combined with the exploitation of a priori information about the environment. The interpretation is done by matching model graphs to the scene graph and gradually refining the scene graph toward a higher-level symbolic representation; the refinement balances between replacing a node and adding a new one. The model graphs represent the different kinds of entities the system is supposed to find in a scene, and their complexity grows as the analysis proceeds. This paper describes the structure of the scene graph and the model graphs, and the benefits of this representation. The matching and refining of the scene graph are also explained. Experimental results using a robot arm and an indoor robot vehicle are presented to verify the operation of the interpreter.
Clustering methods for removing outliers from vision-based range estimates
Bassam Hussien, Raymond E. Suorsa
The automation of rotorcraft low-altitude flight presents challenging problems in flight control and sensor systems. The approach currently being explored uses one or more passive sensors, such as a television camera, to extract environmental obstacle information. Obstacle imagery can be processed using a variety of computer vision techniques to produce a time-varying map of range to obstacles in the sensor's field of view along the helicopter flight path. To maneuver in tight spaces, obstacle-avoidance methods need very reliable range map information by which to guide the helicopter through the environment. In general, most low-level computer vision techniques generate sparse range maps that include at least a small percentage of bad estimates (outliers). This paper examines two related techniques that can be used to eliminate outliers from a sparse range map. Each method clusters sparse range map information into different spatial classes, relying on a segmented and labeled image to aid spatial classification within the image plane.
Entropy consistency principle in binocular stereopsis
Yu Zhao
This paper presents a new theoretical entropy formalism for stereo vision--an entropy view of stereopsis. At the heart of this formalism is the entropy consistency principle, which reflects the essentially 'active' and 'dynamic' nature of the binocular stereo vision process. Finally, an experimental stereo system based on mathematical morphology is proposed and integrated into the entropy formalism presented in this paper.
Line matching in stereo vision
Charles E. Coldwell Jr., Jack Fitzer
The problem of reconstructing real-world positions of three-dimensional objects from stereo views has proved very difficult, because the problem is inherently underdetermined. By reducing the problem to a one-line problem, and using precision image acquisition hardware, range information may be obtained with relatively simple correlation calculations.
Stereo pair design for cameras with a fovea
Samir R. Chettri, Michael Keefe, John R. Zimmerman
In this paper we describe the methodology for the design and selection of a stereo pair when the cameras have a greater concentration of sensing elements in the center of the image plane (fovea). Binocular vision is important for the purpose of depth estimation, which in turn is important in a variety of applications such as gaging and autonomous vehicle guidance. Thus, proper design of a stereo pair is essential if we are to be successful in these areas. In this paper we assume that one camera has square pixels of size dv and the other has pixels of size rdv, where 0 < r
Location of polyhedral objects in 3-D space from three unconstrained edge points
Horst Bunke, Hong-Seh Lim, Urs Meier
We present algorithms for locating polyhedral objects using three edge points. An edge point is a three-dimensional data point which lies on a known edge of an object. In this paper, we assume that the precise position of an edge point on an edge is not known. In general, with three edge points, we have to solve a system of nonlinear equations. However, there are special cases, where we can solve the location problem analytically. The special cases depend on the collinearity and coplanarity of the three edge points.
Finding cones from multiscan range maps
Shao Lejun, Richard A. Volz
Two algorithms are presented for locating a cone of known shape in a range scan map: an ellipse-fitting method and a three-scan method. The localization problem includes finding both the cone's orientation and its displacement relative to the sensing coordinate system. The work described in this paper is an extension of the problem of extracting cylinder parameters from a depth scan map. The range data are obtained from either a light-stripe-based range sensor or a line range sensor, which measures the distance between the sensor and the nearest points on the intersection of the light plane and the scene. The three-scan method uses range data from three scan lines of measurements and does not use any ellipse fitting; it first extracts the orientation parameters of at least three of the cone's generators and then derives the axis direction vector from them. The performance of this method is compared with the usual ellipse-fitting method.
Three-dimensional transformation recognition using four-dimensional tensor theory
Luis de Pedro
A new approach to identifying a three-dimensional transformation is presented, based on four-dimensional tensor theory. It allows the isolation of the 3-D transformation parameters using data from a real perspective projection of a rigid planar-patch object. With this approach there is no need for point correspondences between the image and the reference position of the object. It uses perspective-projection modeling of the image, and thus there are no unresolved angles as in the case of parallel projection.
Neural Nets for Computer Vision and Intelligent Robots I
Techniques for high-performance analog neural networks
David P. Casasent, Leonard Neiberg, Sanjay S. Natarajan
We consider analog neural network implementations (using VLSI or optical technologies) with limited accuracy and various noise and nonlinearity error sources. Algorithms and techniques to achieve high performance (a good recognition rate and large storage capacity) on such systems are considered. The adaptive clustering neural net (ACNN) and the robust Ho-Kashyap (HK-2) associative processor (AP) are the neural networks considered in detail.
Optical image segmentation using wavelet correlation
Steven D. Pinski, Steven K. Rogers, Dennis W. Ruck, et al.
This research introduces an optical method of segmenting potential targets using wavelet analysis. An optical Haar wavelet is created using a magneto-optic spatial light modulator (MOSLM). Two methods of controlling wavelet dilation are explored: (1) an aperture positioned in front of a binary-modulated MOSLM; (2) spatial filtering of a ternary-modulated MOSLM. Segmentation is performed through Vander Lugt correlation of a binarized image with the optical wavelet. Frequency-plane masks for the correlation process are generated using thermal holography.
Hybrid ANN-ES architecture for automatic target recognition
Chungte Teng, Panos A. Ligomenides
Automatic target recognition can benefit from the cooperation of artificial neural networks (ANNs) and expert systems (ESs). The bottom-up training and generalization properties of artificial neural networks and the top-down utilization of accumulated knowledge by expert system processors can be combined to offer robust performance in automatic target recognition models. In this paper, we propose a modular, flexible, and expandable hybrid architecture that provides cooperative, functional, and operational interfaces between expert system and artificial neural network facilities. To make the problem more specific, we apply this architecture to the Multiline Optical Character Reader (MLOCR) system, which is being developed to sort postal mail pieces automatically.
Handwritten digit recognition using neural networks
Amanda Bischoff, Patrick S. P. Wang
Character and handwriting recognition is one of the most difficult problems in pattern recognition and artificial intelligence. Unlike machine-generated characters, which are uniform throughout a document and often uniform between machines, each human being has a unique style of writing characters. With the infinite number of ways to record a character, it is a wonder that a person can understand his own script, let alone the script of another. Training a computer to recognize human-produced characters is a tremendous task in which researchers are just beginning to achieve some success. These methods rely primarily on algorithms that determine the similarity of two characters; neural networks are an alternative technique now being explored. Four separate methods are discussed in this paper. The first involves normalization, skeletonization, and feature extraction of a handwritten digit before application to a neural network for classification. The second simply applies a normalized digit to the neural net's input, and the network performs a 2-dimensional convolution on it in order to classify the digit. The third method involves a hierarchical network. The final technique incorporates time information into the system while using simple preprocessing and a small number of parameters. Their advantages and disadvantages are compared and discussed.
Neural Nets for Computer Vision and Intelligent Robots II
Efficient activation functions for the back-propagation neural network
Surender K. Kenue
The back-propagation algorithm is the most common algorithm in use in artificial neural network research. The standard activation (transfer) function is the logistic function s(x) = 1/(1 + exp(-x)). The derivative of this function is used in correcting the error signals for updating the coefficients of the network. The maximum value of the derivative is only 0.25, which yields slow convergence. A new family of activation functions is proposed, whose derivatives belong to the sech^n(x) family for n = 1, 2, .... The maximum value of the derivative varies from 0.637 to 1.875 for n = 1-6, so a member of the activation function family can be selected to suit the problem. Results of using this family of activation functions show orders-of-magnitude savings in computation. A discrete version of these functions is also proposed for efficient implementation. For the parity-8 problem with 16 hidden units, the new activation function f3 uses 300 epochs for learning, compared to 500,000 epochs for the standard activation function.
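The derivative magnitudes quoted above are easy to check numerically. In the sketch below, f3 is our reconstruction of a family member whose derivative is proportional to sech^3(x), normalized to the range (-1, 1) so that its peak derivative is 4/pi (about 1.273, consistent with the quoted 0.637-1.875 span); the paper's exact definition may differ.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def f3(x):
    """Antiderivative of (4/pi) * sech(x)**3, ranging over (-1, 1).
    (A reconstruction consistent with the quoted maxima, not
    necessarily the paper's exact f3.)"""
    sech = 1.0 / np.cosh(x)
    return (2.0 / np.pi) * (sech * np.tanh(x) + np.arctan(np.sinh(x)))

x = np.linspace(-6.0, 6.0, 2001)
for name, f in [("logistic", logistic), ("f3", f3)]:
    d = np.gradient(f(x), x)
    print(name, "max derivative ~", round(d.max(), 3))
# logistic -> 0.25, f3 -> ~1.273: larger error signals, faster learning
```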
Neural-network-based motion stereo methods
This paper presents neural-network-based lateral and longitudinal motion stereo methods. Lateral motion stereo infers depth information from a lateral motion. Existing lateral motion stereo algorithms use either a Kalman filter or a recursive least-squares algorithm to update the disparity values. Because the estimation error cannot be measured, the estimated disparity values at each recursion are unreliable, yielding a noisy disparity field. Instead of updating the disparity values, we recursively update the bias inputs of the network; the disparity field is then computed by a neural network. Since the recursive algorithm runs the matching algorithm only once, and the bias-input updating scheme can be accomplished in real time, a vision system employing such an algorithm is feasible. For handling batch data, we have also designed a batch algorithm, which integrates information from all images by embedding it into the bias inputs of the network; a static matching procedure then computes the disparity values. Longitudinal motion stereo infers depth information from a forward or backward motion. Existing longitudinal stereo algorithms have problems associated with locating the focus of expansion (FOE) and with the camera and surface orientations. Our approach instead allows the camera to move forward or backward along its optical axis, requires no information about the FOE, and makes no assumption about the object surface. The algorithm uses a Gabor correlation operator to extract image features and employs the neural network to compute the disparity field based on the Gabor features. It produces multiple dense disparity fields and recovers the depth map very efficiently.
Three-dimensional monocular pose measurement using computational neural networks
Experimental measurement of the position and attitude (pose) of a rigid target using machine vision is of particular importance to autonomous robotic manipulation. Traditionally, the monocular four-point pose problem has been used; it encompasses three distinct subproblems: inverse perspective, calibration of internal camera parameters, and knowledge of the pose of the camera (external camera parameters). To this end, a new unified concept for monocular pose measurement using computational neural networks has been developed, which obviates the need to estimate camera parameters and provides rapid solution of inverse perspective with compensation for nonhomogeneous lens distortion. The input neurons are the (x, y) image coordinates of target landmarks; the output neurons are the (X, Y, Z, roll, pitch, yaw) target position and attitude relative to an external reference frame. Modified back-propagation has been used to train the neural network using both synthetic and experimental training sets for comparison with current four-point pose methods. Recommendations are provided for the number of neural layers, the number of neurons per layer, and the richness versus breadth of pose training sets.
Model-based edge position and orientation measurement using neural networks
Hiroshi Naruse, Mitsuhiro Tateda, Atsushi Ide
This paper proposes a new model-fitting method for measuring edge features. In this method, an accurate edge model, which accounts well for the practical edge gray-level patterns in an actually observed image, is constructed by considering the point spread function of the image recording process as well as the edge features, that is, edge position and orientation. The method consists of two preparation steps and a measurement step. Step 1: Gray-level patterns with various edge feature values are generated on an edge pixel and its surrounding pixels based on this model. Step 2: The gray levels are fed, as teaching signals, into 3-layer error back-propagation neural networks; the mapping parameters used to determine the edge features are obtained from the gray-level patterns. Step 3: The edge features are calculated by feeding the gray levels of an observed image into the networks after learning. Experimental results showed that this method can determine edge position and orientation with high accuracy: 0.07 pixels and 0.8 degrees, respectively.
Neural network modeling of new energy function for stereo matching
Jun Jae Lee, Seok Je Cho, Yeong-Ho Ha
In vision research, most problems can be modeled as minimizing an energy function. In particular, stereo matching can be viewed as an optimization problem in which several constraints must be satisfied simultaneously, and neural networks have been demonstrated to be very effective at such computations. In this paper, an approach to solving the stereo matching problem using a neural network with a new energy function is presented. The new energy function is derived not only to satisfy the three constraints of similarity, smoothness, and uniqueness, but also to ensure Hopfield's convergence conditions of symmetric interconnection strengths without self-feedback. Experimental results show good stereo matching for sparse random-dot stereograms and real images.
Piecewise quadratic neural network for pattern classification (Proceedings Only)
Sanjay S. Natarajan, David P. Casasent
A neural network pattern classifier is presented. Its decision boundaries are formed from segments of conic sections, allowing it to achieve improved performance over piecewise-linear neural network classifiers such as our earlier adaptive clustering neural network (ACNN). We discuss an optical realization that uses complex-valued weights, optical intensity detectors, and an additional input neuron to achieve piecewise conic decision surfaces (rather than the piecewise-linear surfaces that the ACNN produces).
Hierarchical neural networks for edge preservation and restoration
Si Wei Lu, Anthony Szeto
In this paper, a hierarchical neural network system is designed to adjust edge measurements based on the information provided by neighboring edges. The local edge pattern is analyzed to determine and reinforce edge structures while suppressing unwanted noise and false edges. The neural network is made up of four levels of subnets. The subnet in the first level determines the potential adjustment for the element of interest by detecting edge contours according to the selected processes in the neural nets and the input local edge pattern. The second level consists of a cooperative-competitive neural net model that finds the orientation of the strongest edge contour in the local edge pattern. The subnet in the third level ascertains the conditions for adjusting the gradient magnitude, determines the amount of adjustment, calculates the new adjusted gradient magnitude, and decides whether the element of interest is to be an edge element or a non-edge element. The subnet in level four is a semilinear feedforward net used to assign the new orientation of the element of interest. An iterative approach incorporated into the neural network system also enables global analysis in the process of adjusting the edge measurements.
Artificial neural network models for texture classification via the radon transform
Arun D. Kulkarni, P. Byars
Texture is generally recognized as fundamental to perception. A taxonomy of problems encountered within the context of texture analysis could be that of classification/discrimination, description, and segmentation. In this paper we suggest a novel artificial neural network (ANN) architecture for feature extraction and texture recognition. There is evidence suggesting that the analysis of stimuli by the visual system might involve a set of quasi-independent mechanisms called channels, which can be conveniently characterized in the spatial frequency domain. In our model we use an FT feature space with angular and radial bins that characterize spatial-domain filters to extract features; the extracted features are then used as input for the recognition stage. To evaluate the 2-D FT coefficients we use the Radon transform, whose use simplifies the ANN model significantly. We suggest an electronic implementation of the ANN model for feature extraction using a Connected Network Adaptive ProcessorS (CNAPS) chip designed by Adaptive Solutions Inc. We also develop software to simulate the ANN model with the Radon transform. We use a three-stage back-propagation network as a classifier and have used ten different texture patterns to test our ANN model.
Texture operator determination by simulated annealing
Bradley Pryor Kjell, Pearl Y. Wang
Convolution of textured images with a set of small masks can be used to produce features for texture classification or image segmentation. These masks are usually picked from a standard set, such as Laws' texture energy operators or the variations discussed by other researchers. In this paper we discuss using simulated annealing to determine an optimal set of masks for particular sets of textures. Initial masks are picked and iteratively improved by simulated annealing with an appropriate energy function; the masks converge to a final set that minimizes the energy function. We show experimental results with a small set of textures. The masks produced by the method depend on the textures used, and they are more effective at discriminating between those textures than the standard masks.
Recognition of a translational pulse in noise
Michael E. Parten, Yee-man Kwan, Mustafa Ulutas, et al.
One of the basic problems in pattern recognition is the detection of a pattern in noise. This problem becomes particularly difficult if the pattern varies in position and size. A system to achieve this result can be modeled in a number of different ways; one currently popular approach is to use a neural network.(1,2) The advantage of using a neural network is that once the basic structure is assumed, the characteristics of the network, described by its weights, can be learned. The learning or training process involves developing a training set of known inputs and outputs for the system and adapting the internal weights of the network so that the inputs yield the desired outputs. The weights are adjusted to minimize the error, according to some criterion, between the actual outputs and the desired outputs. Most neural networks are composed of first-order terms, that is, z_i = f(w_0 + sum_j w_ij x_j), where the x_j are the inputs, the z_i are the first-level (or hidden) outputs, the w are weight terms, and the functional relationship f is normally a sigmoid function with values between zero and one. Usually, there are at least two levels of this type; in other words, the outputs would be given by y_k = f(u_0 + sum_j u_kj z_j), where the y_k are the final outputs, the u are weights, and the other terms are as before. This type of network is trained using a back-propagation technique. Neural networks offer hope for detecting an object in noise through proper training of the network to recognize the characteristics of the object while ignoring the noise. Unfortunately, most neural networks cannot be trained to detect an object that appears in different positions; in other words, most neural networks are not translationally invariant. However, some special higher-order neural networks have been shown to possess translational invariance.
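A minimal sketch of the two-level first-order network described by the equations above (sigmoid units, one hidden layer); the weight shapes and example sizes are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W, w0, U, u0):
    """Two-level first-order network:
    z_i = f(w_0 + sum_j W[i, j] x_j)   hidden layer
    y_k = f(u_0 + sum_j U[k, j] z_j)   output layer
    """
    z = sigmoid(w0 + W @ x)
    return sigmoid(u0 + U @ z)

rng = np.random.default_rng(0)
x = rng.random(16)                        # e.g., a 16-sample input pattern
W, w0 = rng.normal(size=(8, 16)), np.zeros(8)
U, u0 = rng.normal(size=(2, 8)), np.zeros(2)
print(forward(x, W, w0, U, u0))           # two output activations in (0, 1)
```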
Effect of normalized interconnect matrix on the performance of Hopfield neural network
Shaoping Bian, Jie Li, Kebin Xu
Theoretical analysis and numerical simulation show that the convergence properties are degraded to some extent when the multi-valued interconnect matrix is replaced by a normalized matrix. An optical implementation scheme for a Hopfield neural network is also proposed; it uses a novel multi-valued synaptic interconnect matrix realized by a combined LCLV and CRT system controlled by a microcomputer.
Neuromorphology of Biological Vision: A Basis for Machine Vision I
Neuronal morphology of biological vision: a basis for machine vision (Proceedings Only)
Madan M. Gupta
In this paper, we are concerned with the study of the biological vision system and the emulation of some of its mathematical functions, in both the retina and the visual cortex, for the development of a robust computer vision system. This field of research is not only intriguing but offers a great challenge to systems scientists in the development of functional algorithms. These functional algorithms can be generalized for further studies in fields such as signal processing, control systems, and image processing. Our studies depend heavily on the use of neuronal layers and generalized receptive fields. Building blocks of such neuronal layers and receptive fields may lead to the design of better sensors and better computer vision systems. It is hoped that these studies will lead to the development of better artificial vision systems with applications to vision prostheses for visually impaired persons, robotic vision, medical imaging, medical sensors, industrial automation, remote sensing, space stations, and ocean exploration.
Early perception and structural identity: neural implementation
Panos A. Ligomenides
It is suggested that there exists a minimal set of rules for the perceptual composition of the unending variety of spatio-temporal patterns in our perceptual world. Driven by perceptual discernment of "sudden change" and "unexpectedness", these rules specify conditions (such as co-linearity and virtual continuation) for perceptual grouping and for recursive compositions of perceptual "modalities" and "signatures". Beginning with a small set of primitive perceptual elements, selected contextually at some relevant level of abstraction, perceptual compositions can graduate to an unlimited variety of spatiotemporal signatures, scenes, and activities. Local discernible elements, often perceptually ambiguous by themselves, may be integrated into spatiotemporal compositions, which generate unambiguous perceptual separations between "figure" and "ground". The definition of computational algorithms for the effective instantiation of the rules of perceptual grouping remains a principal problem. In this paper we present our approach to solving the problem of perceptual recognition within the confines of one-D variational profiles. More specifically, concerning "early" (pre-attentive) recognition, we define the "structural identity of a k-norm, k ∈ K"--SkID--as a tool for discerning and locating the instantiation of spatiotemporal objects or events. The SkID profile also serves as a reference coordinate framework for the "perceptual focusing of attention" and the eventual assessment of resemblance. Neural network implementations of pre-attentive and attentive recognition are also discussed briefly. Our principles are exemplified by application to one-D perceptual profiles, which allows simplicity in the definitions and the rules of perceptual composition.
Analog model of early visual processing: contour and boundary detection in the retina
Lisa Dron
Biological and psychophysical data do not rule out a model of retinal processing which allows accurate localization of contours without loss of high frequency image features such as corners and junctions. From an engineering perspective, early contour detection is useful for several applications, among them visually guided camera-image control. Simple, accurate and fast algorithms have been developed for matching based on binary edge maps. These can be incorporated within a control system for self-focusing and image stabilization. The principal features of the analog model are the following: (1) It is robust with respect to noise; (2) It does not lose contrast information or introduce systematic errors; and (3) It allows independent control of thresholding and smoothing. In other words, the interdependence between smoothing and the accuracy of edge localization is removed. The model is implemented by a 2-dimensional network of cells which are connected to their nearest neighbors. We present simulations of the network on a test image and present arguments for its plausibility as a model of retinal processing.
Neurovision processor for designing intelligent sensors
Madan M. Gupta, George K. Knopf
A programmable multi-task neuro-vision processor, called the Positive-Negative (PN) neural processor, is proposed as a plausible hardware mechanism for constructing robust multi-task vision sensors. The computational operations performed by the PN neural processor are loosely based on the neural activity fields exhibited by certain nervous tissue layers situated in the brain. The neuro-vision processor can be programmed to generate diverse dynamic behavior that may be used for spatio-temporal stabilization (STS), short-term visual memory (STVM), spatio-temporal filtering (STF) and pulse frequency modulation (PFM). A multi-functional vision sensor that performs a variety of information processing operations on time-varying two-dimensional sensory images can be constructed from a parallel and hierarchical structure of numerous individually programmed PN neural processors.
Spectral imaging by optical computing
Jarmo Hallikainen, Jussi P. S. Parkkinen, Timo Jaeaeskelaeinen
The science of imaging and its applications touch many aspects of physics, because the response of materials to light and other forms of radiation yields insights into some of the most fundamental properties of matter. In this paper we describe and investigate two acousto-optic (AO) implementations for color classification of digital images by optical calculation of vector inner products. The proposed AO implementations may be of importance in the development of artificial eyes for machine vision and in quality inspection problems.
Neuromorphology of Biological Vision: A Basis for Machine Vision II
Toward a pyramidal neural network system for stereo fusion
Richard Lepage, Denis Poussart
A goal of computer vision is the construction of scene descriptions based on information extracted from one or more 2-D images. Stereo is one of the strategies used to recover 3-D information from two images. Intensity edges in the images correspond mostly to characteristic features in the 3-D scene, and the stereo module attempts to match corresponding features in the two images. Edge detection makes important information about the two-dimensional image explicit but is scale-dependent: edges are visible only over a range of scales, so multiple-scale analysis of the input image is needed for a complete description of the edges. We propose a compact pyramidal architecture for image representation at multiple spatial scales. A simple processing element (PE) is allocated at each pixel location at each level of the pyramid. A dense network of weighted links between each PE and the PEs underneath is programmed to generate the levels of the pyramid. Lateral weighted links within a level compute edge localization and intensity gradient. Feedback between successive levels is used to reinforce and refine the positions of true edges. A fusion channel matches the two edge channels to output a disparity map of the observed scene.
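One way to picture the weighted links between a PE and the PEs underneath is the classic REDUCE step of an image pyramid; the sketch below uses a separable 5-tap Burt-style kernel as an illustrative stand-in for the programmed link weights, not the authors' network.

```python
import numpy as np

def pyramid_reduce(img):
    """One pyramid level: weight the PEs underneath with a 5-tap
    kernel (applied separably), then subsample by two."""
    k = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    rows = np.apply_along_axis(np.convolve, 1, img, k, "same")
    both = np.apply_along_axis(np.convolve, 0, rows, k, "same")
    return both[::2, ::2]

level0 = np.random.rand(64, 64)
level1 = pyramid_reduce(level0)           # 32 x 32
level2 = pyramid_reduce(level1)           # 16 x 16
```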
Single instruction computer architecture and its application in image processing
A single-processor computer system using only half-adder circuits is described. In addition, it is shown that only a single hard-wired instruction is needed in the control unit to obtain a complete instruction set for this general-purpose computer. Such a system has several advantages. First, it is intrinsically a RISC machine--in fact, the 'ultimate RISC' machine. Second, because only a single type of logic element is employed, the entire computer system can easily be realized on a single, highly integrated chip. Finally, due to the homogeneous nature of the computer's logic elements, the computer has possible implementations as an optical or chemical machine, which in turn suggests possible paradigms for neural computing and artificial intelligence. After showing how we can implement a full adder, min, max, and other operations using the half adder, we use an array of such full adders to implement the dilation operation for two black-and-white images. Next we implement the erosion operation on two black-and-white images using a relative complement function and the properties of erosion and dilation. This approach was inspired by papers by van der Poel, in which a single instruction is used to furnish a complete set of general-purpose instructions, and by Bohm-Jacopini, where it is shown that any problem can be solved using a Turing machine with one entry and one exit.
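A toy software sketch (ours, in Python rather than hardware) of the two constructions the abstract names: a full adder built from two half adders, and a binary dilation realized with adders alone; the 3-element structuring element is an illustrative choice.

```python
def half_adder(a, b):
    """Sum and carry bits of a + b."""
    return a ^ b, a & b

def full_adder(a, b, cin):
    """Full adder from two half adders plus an OR of the carries."""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, cin)
    return s, c1 | c2

def dilate(row):
    """Dilation of a binary row by a 3-element structuring element:
    a pixel fires if the adder-computed neighborhood count is nonzero."""
    padded = [0] + list(row) + [0]
    out = []
    for i in range(len(row)):
        s, c = full_adder(padded[i], padded[i + 1], padded[i + 2])
        out.append(s | c)        # count > 0  iff  sum bit or carry bit
    return out

print(dilate([0, 1, 0, 0, 1]))   # -> [1, 1, 1, 1, 1]
```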
Model for spatial and chromatic vision
Jussi P. S. Parkkinen, Madan M. Gupta, George K. Knopf, et al.
An artificial vision system with spatio-chromatic channels is proposed. A dynamic neural network is used for the spatial and chromatic information of a scene. The spatio-chromatic information is transmitted into two channels for processing. This segmentation allows accurate spatial and chromatic analysis of the visual input. For both channels, models based on the biology of the visual system are used. Spatial channel responses simulate, e.g., enhanced edges and subjective contours. Chromatic channel output is shown to correspond to the color characteristics found in the spectral color tests and in the literature of the physiology of color vision. The ultimate goal of the project is to find a biologically motivated model for an intelligent image sensor. In this report we describe potential candidates for both spatial and chromatic information.