Proceedings Volume 1825

Intelligent Robots and Computer Vision XI: Algorithms, Techniques, and Active Vision

David P. Casasent
View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 1 November 1992
Contents: 8 Sessions, 71 Papers, 0 Presentations
Conference: Applications in Optical Science and Engineering 1992
Volume Number: 1825

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Pattern Recognition in Computer Vision
  • Computer Vision
  • Locating Edges, Lines, Curves, and Surfaces in Robotic Vision
  • Segmentation, Motion, and Color Techniques
  • Morphological Processing for Intelligent Robotics
  • Sensory Robotics and Control (Vision, Collision Avoidance, Path Planning)
  • Visual Servoing in Automated Manufacturing
  • Active Vision
Pattern Recognition in Computer Vision
New advances in correlation filters
David P. Casasent
We consider new correlation filters for all levels of scene analysis, such as clutter reduction and object detection, recognition, and identification. A hierarchical inference set of such filters allows scene analysis on a unified multifunctional correlator architecture. It has applications in robotics, computer vision, optical character recognition, reconnaissance, and target recognition. Digital processors can now achieve correlations in real time, and hence such filters are of practical importance.
Three-dimensional articulated object recognition: a parallel approach
Patrick S. P. Wang
This paper deals with articulated objects, which are semi-deformable in that they can change shape in part while each portion of the object remains rigid. The characteristics of articulated objects are more complicated than those of rigid ones, and representing and recognizing such objects by computer is more difficult. We propose a heuristic parallel method using the concepts of coordinated graphs, layered graph representation, and parallel pattern matching. The method is simple but robust, needs very few learning samples, and can distinguish similar objects that are not distinguishable by other methods. It can be applied to a variety of interesting 3-D line-drawing objects for recognition, understanding, and description. Several illustrative examples are given of learning, representing, recognizing, and describing states of various articulated objects.
Three-dimensional object recognition using average surface normal detection
Peter Y. Hsu, Anthony P. Reeves
A direction characterization of a surface region in a range image, called the average surface normal (ASN), is presented in this paper. The goal is to robustly detect a direction for each range image pixel that is object centered and reasonably insensitive to the viewpoint. The average surface normal for a range image pixel is defined as the normal to the best-fitting plane of a local region of the image that surrounds the pixel. An analysis of the noise sensitivity of the ASN is presented and the bias due to Gaussian noise on a plane surface is determined. Results from empirical experiments confirm that the ASN operator has a very small bias in the presence of noise. The use of the ASN to aid three-dimensional object identification is considered. A three-dimensional object may be decomposed into a number of similarly sized detectable surface regions, each of which defines an object feature. Object identification is achieved by detecting a visible subset of these features.
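As a sketch of the idea (not the authors' exact estimator), the normal of the best-fitting plane over a local window can be obtained by a least-squares fit of z = ax + by + c to the window's depth values; the function and variable names below are illustrative:

```python
import numpy as np

def average_surface_normal(window):
    """Approximate the ASN of a range-image window: the unit normal of the
    least-squares plane z = a*x + b*y + c fitted to the window's depths."""
    h, w = window.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Design matrix for the plane fit over all window pixels
    A = np.column_stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    coeffs, *_ = np.linalg.lstsq(A, window.ravel(), rcond=None)
    a, b, _ = coeffs
    n = np.array([-a, -b, 1.0])   # normal of the plane z = a*x + b*y + c
    return n / np.linalg.norm(n)

# A synthetic planar patch z = 0.5*x recovers the expected tilted normal
patch = 0.5 * np.arange(5)[None, :] * np.ones((5, 1))
n = average_surface_normal(patch)
```

Averaging depths through a plane fit, rather than differencing neighboring pixels, is what gives the ASN its relative insensitivity to noise.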
Recognition of containers using a multidimensional pattern classifier
Michael Magee, Richard Weniger, Dennis J. Wenzel, et al.
A method for recognizing closed containers based on features extracted from their circular tops is presented. The approach developed consists of obtaining images from two spatially separated cameras that utilize both diffuse and specular light sources. The images thus obtained are used to segment target objects from the background and to extract representative features. The features utilized consist of container height as computed using stereopsis, as well as the mean, variance, and second central moments of the intensities of the segmented caps. The recognition procedure is based on a minimum-distance Mahalanobis classifier, which takes feature covariance into account. The discussion that follows details the algorithmic approach for the entire system, including image acquisition, object segmentation, feature extraction, and pattern classification. Results of test runs involving sets of several hundred training samples and untrained samples are presented.
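A minimal sketch of a minimum-distance Mahalanobis classifier of the kind described above (the toy two-feature statistics are invented, not the paper's actual feature set):

```python
import numpy as np

def mahalanobis_classify(x, means, covs):
    """Assign x to the class with the smallest Mahalanobis distance
    d^2 = (x - mu)^T C^{-1} (x - mu), so that per-class feature
    covariance is taken into account."""
    d2 = [float((x - mu) @ np.linalg.inv(C) @ (x - mu))
          for mu, C in zip(means, covs)]
    return int(np.argmin(d2))

# Two hypothetical container classes described by (height, mean cap intensity)
means = [np.array([10.0, 100.0]), np.array([20.0, 40.0])]
covs = [np.eye(2), np.eye(2)]
label = mahalanobis_classify(np.array([11.0, 95.0]), means, covs)
```

With identity covariances this reduces to nearest-mean classification; anisotropic covariances stretch the distance metric along directions of high feature variance.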
Profile feature extraction via the Walsh transform for face recognition
Jia Xiaoquang, Mark S. Nixon
This paper describes a measure of the facial profile feature from a frontal view of the face for automatic face recognition. The method is part of a program of research aimed at developing an extended feature set for face recognition. The profile is derived from the intensity projection of the face image and is described using the Walsh power spectrum, which was chosen over six other descriptors, including the Fourier transform, for its ability to distinguish the differences between profiles of different faces. The method has been assessed by applying it to face images of different subjects and to different images of the same person where the face was rotated from side to side and up and down, or where the lighting varied slightly. The results have been analyzed using two different measures, each of which shows that this profile feature, represented by the Walsh power spectrum, can be used to indicate identity and the difference between faces.
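For illustration, the Walsh power spectrum of a 1-D intensity-projection profile can be computed with the fast Walsh-Hadamard transform; this is a generic sketch, not the authors' implementation:

```python
import numpy as np

def fwht(a):
    """Fast Walsh-Hadamard transform of a sequence whose length is a
    power of two (butterfly form, no normalization)."""
    a = np.asarray(a, dtype=float).copy()
    n = len(a)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a

def walsh_power_spectrum(profile):
    """Squared Walsh coefficients of an intensity-projection profile."""
    return fwht(profile) ** 2
```

Because the Walsh basis functions take only the values +1 and -1, the transform needs only additions and subtractions, which made it attractive as a cheap descriptor at the time.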
Real-time mobile robot navigation is possible with off-the-shelf hardware
David J. Michael, Michael Bolotski
We describe a real-time vision contest that took place in January 1992 at the MIT AI Lab. The task was high-speed visual navigation along a 60-foot winding indoor course. The computational power available was a conventional Sun SPARCstation 1 with a Sun color framegrabber. The imaging device was a standard color Pulnix CCD camera with an auto-iris 4.8 mm lens. The robot base was a commercial B12 mobile robot from Real World Interface. The winning entry completed the course in 1:07 minutes, about three times faster than a human controlling the robot using the same video input. Approximately ten person-days of work were required to program an entry that would complete the course.
Pattern recognition using Hilbert space
Ying Liu
In this paper, we develop a new learning approach, called Hilbert learning. This approach is similar to fractal learning, but the fractal part is replaced by a Hilbert space. As in fractal learning, the first stage is to encode an image into a small vector in the internal space of the learning system; the next stage is to quantize the internal parameter space. The internal space of a Hilbert learning system is defined as follows: a pattern can be interpreted as the representation of a vector in a Hilbert space, and any vector in a Hilbert space can be expanded. If a vector happens to lie in a subspace of the Hilbert space whose dimension L is low (on the order of 10), the vector can be specified by its norm, an L-vector, and the Hermitian operator which spans the subspace. This establishes a mapping from the image space to the internal space P, which converts an input image into a 4-tuple t ∈ P = (Norm, T, N, L-vector), where T is an operator parameter space and N is a set of integers specifying the boundary condition. The encoding is implemented by mapping an input pattern into a point in the internal space. We assume that the system uses a local search algorithm, i.e., it adjusts its internal data locally. The search is first conducted for an operator in the parameter space of operators; then an error function δ(t) is computed, and the algorithm stops at a local minimum of δ(t). Finally, the input training set divides the internal space by a quantization procedure.
Three-dimensional grating optical retinal chip and stimulus-adaptive robotic vision
A cellular multilayer phase grating with hexagonal closest packing proves to be the ideal focal plane architecture for the human eye, and is thus also the best model for designing stimulus-adaptive robot eyes which achieve the spatial and chromatic performance of the human eye. Crystal-optical calculation of the retinal cellular multilayer chip and the resulting correlations between the physical stimulus parameters and the adaptive shifts in human vision at the retinal level give rise to a time-frequency diagram of the eye and its stimulus-adaptive latitudes, which will become relevant in the design of future chips for robot eyes with performance comparable to that of human vision. The current presentation shows that 3-D grating optical parameters ensure the frequency-related chromatic adaptive shifts (transition from photopic to scotopic vision in the Purkinje shift, Stiles-Crawford effects I/II, Bezold-Bruecke phenomenon, chromatic adaptation to artificial light sources of different spectral composition, etc.) and also indicates what 3-D grating optical parameters are relevant to spatial transfer and adaptation, i.e., the time-related aspects in the time-frequency diagram (adaptation of the spatial modulation transfer function to the image parameters; log term for spatial adaptation to the intensity level; coding of spatial phase relationships between a fundamental spatial frequency and higher frequencies up to the third harmonic, etc.).
Comparative study of moment invariants for perspective transformation
Shan Yu, Adam Klette, Charles Olinger, et al.
Various moment invariants have been developed for object recognition applications. In this paper, we conduct a comparative study of different moment invariants with respect to perspective transformation. Perspective transformations are induced by the lens of the human eye or an optical system; a physical model of the perspective transformation is given. The concepts of Zernike moments and regular moments are discussed and their derivations compared. Their corresponding rotational, scale, and translational invariance properties are presented. A group of quadratic-form perspective moment invariants is also introduced. Comparisons and estimations are made of the performance and properties of these invariants in object recognition from both a theoretical and a computational point of view. Overall, it is shown that the most effective way of recognizing objects under perspective transformation is to use the quadratic-form invariants.
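As a concrete reference point, the first two of Hu's regular-moment invariants (rotation-, scale-, and translation-invariant, though not perspective-invariant) can be computed from normalized central moments; a sketch under the standard definitions:

```python
import numpy as np

def hu_first_two(img):
    """First two Hu moment invariants of a 2-D intensity image:
    phi1 = eta20 + eta02, phi2 = (eta20 - eta02)^2 + 4*eta11^2."""
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    m00 = img.sum()
    xbar, ybar = (xs * img).sum() / m00, (ys * img).sum() / m00
    def mu(p, q):          # central moment (translation invariance)
        return ((xs - xbar) ** p * (ys - ybar) ** q * img).sum()
    def eta(p, q):         # normalized central moment (scale invariance)
        return mu(p, q) / m00 ** (1 + (p + q) / 2)
    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2
```

Rotating the image swaps eta20 and eta02 and flips the sign of eta11, which leaves both invariants unchanged; under a general perspective map, however, they drift, which is the motivation for the perspective invariants studied in the paper.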
Novel approach to object recognition by the fusion method
Yong-Qing Cheng, Yong-Ge Wu, Ke Liu, et al.
Object recognition depends mainly on extracting optimal features, which should be insensitive to image translation, scaling, rotation, and noise. However, this is a complicated and difficult process, and stable features cannot be extracted in some cases. In this paper, a novel approach to object recognition based on a single sensor is proposed from the viewpoint of data fusion and Dempster-Shafer theory. The notions of image subfeatures and a similar-degree function (SDF) are first introduced. For each SDF, we further establish a set of subordinate functions (SF). The SDF and SF are combined in the fusion model. For each class of training samples, several subfeatures are selected by different methods; then the SDF and a set of SF are calculated. Finally, Dempster's rule of combination is used and all subfeatures are fused. In the fusion model, a simple classifier is designed to recognize objects. Experimental results show that the proposed method is efficient and that the recognition model has good performance.
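The fusion step rests on Dempster's rule of combination. A minimal, generic sketch follows; the two-class frame {'A', 'B'} and the example masses are invented for illustration and are not the paper's subfeature-to-mass mapping:

```python
def dempster_combine(m1, m2):
    """Dempster's rule of combination for two basic mass assignments over
    subsets of a frame of discernment. Masses are dicts mapping frozensets
    to belief mass; conflicting (empty-intersection) mass is normalized out."""
    combined = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb
    norm = 1.0 - conflict
    return {s: v / norm for s, v in combined.items()}

# Two subfeatures voting over a hypothetical two-object frame
m1 = {frozenset({'A'}): 0.6, frozenset({'A', 'B'}): 0.4}
m2 = {frozenset({'A'}): 0.5, frozenset({'B'}): 0.3, frozenset({'A', 'B'}): 0.2}
fused = dempster_combine(m1, m2)
```

Mass assigned to the whole frame {'A', 'B'} models ignorance, which is what lets weak or unstable subfeatures contribute without forcing a decision.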
FFT look-up table for image processing
Muzhir Shaban Mohammed, Santiago Lorenzo, Luis L. Nozal, et al.
This paper deals with the implementation of a fast method for computing the Fourier transform in the field of image processing. The design of the system is based on controlling the flow of input data, with the required sine and cosine values supplied by look-up tables. The approach offers the advantage of speed and lends itself to both software and hardware implementation.
Computer Vision
'We do dishes, but we don't do windows': function-based modeling and recognition of rigid objects
Melanie A. Sutton, Louise Stark, Kevin W. Bowyer
Generic recognition for computer vision is a goal that is still far from reality. Part of the problem rests in the inherent limitations of current 'model-based' vision. Our approach moves away from specific geometric or structural models and instead focuses on the functionality of the object as the property which drives the recognition process. This results in a representation that is generic in the sense of capturing an entire category of objects. One important assumption underlying the form-and-function approach is that a 'small' number of 'primitive' concepts about shape, physics, and causation will suffice to define the functionality of a broad range of categories. If multiple new 'primitives' were required to define each additional category, then many of the advantages of the function-based approach over the traditional model-based approach would be lost. This paper presents some initial experimental results from the GRUFF-3 system, which uses function-based representation to recognize rigid objects in the superordinate category 'dishes'. The performance of this system has been evaluated on a database of approximately 200 shapes.
Interweaving reason, action, and perception
Claude L. Fennema Jr.
In an attempt to understand and emulate intelligent behavior, Artificial Intelligence researchers have, for the most part, taken a reductionist approach and divided their investigation into separate studies of reason, perception, and action. As a consequence, intelligent robots have been constructed using a coarse-grained architecture; reasoning, perception, and action have been implemented as separate modules that interact infrequently. This paper describes an investigation into the effect of reducing this architectural granularity on the computational efficiency of the overall system. It demonstrates that introducing a fine-grained integration or 'interweaving' of these functions can result in significant complexity reduction. This paper introduces the 'reason a little, move a little, look a little,' or RML, paradigm, describes an RML navigation system, and discusses analytical and experimental results that quantify the complexity reduction for planning and vision. The system details illustrate novel approaches to representation, planning, and vision. The environment is represented as a network that provides mechanisms for coping with positional uncertainty and focusing reasoning activities. Plans are constructed in three dimensions using a geometry-induced hierarchical decomposition. The approach to vision takes its lead from the way a blind man uses his cane: to verify that reason is consistent with reality.
Looking near one object for another
Lambert E. Wixson
In order to search for an object as efficiently as possible, it can be very useful to take advantage of the spatial relationships in which it commonly participates. Searches that do so, which we call indirect searches, can be modeled as two-stage processes that first find an intermediate object that commonly participates in a spatial relationship with the target object, and then look for the target in the restricted region specified by this relationship. Using this model, previous work has determined that over a wide range of situations, for searches that involve rotating a camera about a fixed location, indirect searches improve efficiency by factors of 2 to 8. However, several areas require future research if indirect search is to become a widely applicable, easily usable technique. This paper describes three major areas in need of study: recognizing typical intermediate objects, defining spatial relationships, and constructing mechanisms for moving a camera to examine cluttered regions of space. It concentrates especially on the latter topic, discussing many issues arising in the design of systems for looking around clutter and occlusions.
Comparing subset-convergent and variable-depth local search on perspective-sensitive landmark recognition problems
J. Ross Beveridge
An intelligent robot with a camera and a partial model of its environment should be able to determine where it is from what it sees. This goal, landmark-based navigation, can be realized using geometric object recognition algorithms. An important problem that arises in the development of such algorithms concerns the role of full 3-D perspective projection. Much of the work on object recognition has focused upon simplified problems which are essentially 2-D. One such simplification uses weak perspective: to test the alignment of matched features, object models are rotated, translated, and scaled in the image plane. At increased computational cost, full perspective can be incorporated into recognition using a family of probabilistic optimization procedures based upon local search. This paper considers two specific algorithms from this family: subset-convergent and variable-depth local search. Both approaches reliably recognize landmarks even when landmark appearance is sensitive to perspective. Results presented here suggest that the relatively simpler variable-depth algorithm is preferable when errors in the robot pose estimate are smaller, but that at some point, as uncertainty in the initial pose estimate increases, the more sophisticated subset-convergent algorithm becomes preferable.
Eye for design: why, where, and how to look for causal structure in visual scenes
Matthew Brand
Before addressing the problem of visual recognition, we need to understand what the result of visual cognition is: What is new in memory after a scene has been understood? For an agent that is to interact with the scene, the most important result of visual understanding is an analysis of the causal structure of the scene: How motion is originated, constrained, and prevented, and what will happen in the immediate future. With respect to the agent's goals, such an understanding describes the scene in terms of its functional properties -- how the agent may interact with the scene. In order to arrive at such an understanding, a robot must have a sophisticated theory of how the world is designed. We discuss some of the consequences of this view for the construction of purposeful vision systems, and show examples from our own work in the understanding of complex scenes.
How I handled ambiguity in a system to read music scores
Alan Ruttenberg
In a program for reading printed music, a variety of low level feature detectors were used to extract sufficient information to reconstruct the score. All feature detectors were unreliable to some extent, and were biased towards yielding false positives rather than missing features. In order to reconstruct the score, conflicting information from the feature detectors needed to be recognized and eliminated. All objects as well as their geometric and semantic relations were represented in an object oriented framework. Ambiguity (implemented as a generic predicate) was defined -- and explicitly represented -- in terms of these relationships. Examples of ambiguous relationships include: An accidental and a note head having an on-top-of geometric relationship, or the total duration of notes in a measure not being equal to the notated time signature. A method inspired by Waltz filtering was used to produce a consistent, unambiguous interpretation. Waltz filtering is a symbolic constraint propagation technique which has been applied to line drawings. During interpretation attention was focused on objects which had ambiguous relations. Ambiguity was iteratively reduced or removed by using a variety of methods employing information gathered from local unambiguous relations.
Disparity filtering: proximity detection and segmentation
David Coombs, Ian Horswill, Peter von Kaenel
Simple stereo disparity filters can provide `proximity detectors' shaped like concave shells in front of the observer. Ideally, these are isodisparity surfaces. In practice, a narrowly tuned filter results in a thin shell. The special case of the zero-disparity surface is called the horopter. A disparity filter can also be useful for distinguishing an object that lies on an isodisparity surface from its surroundings. These filters are much less expensive than stereographic scene interpretation since they are local operations. Similarly, they are also less general. We analyze the expected proximity sensitivity of one simple version of the disparity filter and compare this to its empirical performance. We also present some feature based and correlation based disparity filters and compare their `segmentation' performance on various scenes.
Depth reversal in binocular vision with symmetrical convergence
Chang-Ming Sun, Andrew K. Forrest
Binocular vision is the coordinated behavior of the two eyes by which a single perception of the external world is obtained and by which the specific sensation of stereoscopic depth perception is made possible. This perception, however, can be reversed by interchanging the left- and right-eye views. In this paper, the mathematical expression of the Vieth-Mueller circle is derived. A point on the line of the primary direction is found which depends only on the convergence angle and the interocular distance. A relation is developed between the position of a point in real space and its reversal when viewed pseudoscopically. It is shown that in some circumstances a concave surface is not necessarily perceived as a convex surface under pseudoscopic viewing conditions. The difference in perceiving real objects and stereograms is briefly discussed.
Locating Edges, Lines, Curves, and Surfaces in Robotic Vision
Approach for description and recognition of arbitrary curves based on segment code
Hongjian Liu, Jiangchuan Du, Wu Xian, et al.
Quantity is the integration of discreteness and continuity. Starting with the definition of the segment code, a basic data structure in which continuity is achieved through discrete series, this paper presents a simple method by which an arbitrary curve is represented by an integral series. The application of the four-vertex theorem and the even-number theorem for simple closed curves in pattern recognition is explained through the process of obtaining graphic characteristics of the handwritten digits 0-9. Finally, the fundamental role of the segment code in Pan-Euclidean geometry and the future of its development are discussed.
Corner detector for 3D object recognition
Andrew K. C. Wong, Qigang Gao
This paper presents an effective corner detector based on perceptual organization and a curve-tracking scheme. The detector first finds 2-D corners among the curve partitioning points. It then locates 3-D corners by detecting the terminations of tracked curves intersecting at 2-D corners, and assigns to each detected corner an attribute value according to its perceptual structure. The corner detector is a very important image analysis component in both 2-D and 3-D vision systems. Experimental results demonstrate its effectiveness and robustness.
Hierarchical edge detector using the first- and second-derivative operators
Ralf Schuster, Nirwan Ansari, Ali R. Bani-Hashemi
This paper presents a new method for edge detection that combines the Sobel operator and the Laplacian of Gaussian (LoG) operator. The underlying idea is to combine the advantages of both operators while eliminating their disadvantages. Different methods of combining the two operators are considered; one yields promising results in precise, blur-free edge detection and good stability in the presence of image noise. A method for designing the LoG kernel is also presented.
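One common way to combine the two operators, shown here as a hedged sketch rather than the paper's specific method, is to keep Laplacian zero-crossings (accurate localization) only where the Sobel gradient magnitude is high (noise suppression). A 3x3 discrete Laplacian stands in for a full LoG kernel:

```python
import numpy as np

def conv2(img, k):
    """'Same'-size 2-D correlation with a small odd-sized kernel, zero-padded."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    p = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * p[i:i + img.shape[0], j:j + img.shape[1]]
    return out

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
LAP = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], float)  # stand-in for LoG

def hybrid_edges(img, grad_thresh):
    """Edge map: Laplacian zero-crossings gated by Sobel gradient magnitude."""
    gx, gy = conv2(img, SOBEL_X), conv2(img, SOBEL_X.T)
    mag = np.hypot(gx, gy)
    lap = conv2(img, LAP)
    # Zero-crossing: sign change against the right or lower neighbour
    zc = np.zeros(img.shape, bool)
    zc[:, :-1] |= np.signbit(lap[:, :-1]) != np.signbit(lap[:, 1:])
    zc[:-1, :] |= np.signbit(lap[:-1, :]) != np.signbit(lap[1:, :])
    return zc & (mag > grad_thresh)

# Vertical step edge: the detector fires along the step
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edges = hybrid_edges(img, grad_thresh=1.0)
```

The gating illustrates the complementarity the abstract describes: the Laplacian localizes edges precisely but responds to noise, while the Sobel magnitude is stable but blurs localization.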
Extracting a symbolic object description from gray-level images using a Kalman-filter-based contour tracer combined with a complex edge-operator
Markus Becker, Dietmar Ley
Usually, gray-level images are arranged as two-dimensional (NxM) matrices. Tracing an imaged object's contours is a common way to obtain information about it, such as position and orientation, or for purposes of object recognition. This paper describes the generation of contour-based object descriptions by edge detection and contour tracing. A complex differential operator is used to detect edges in the image. In addition to the gradient, the local orientation of edges can be computed with an accuracy of approximately 5 degrees. This edge-oriented description, which is still arranged as a two-dimensional matrix, occupies twice as much memory as the original gray-level image (gradient plus orientation), and it carries no knowledge about the course of the contour. Moreover, in most cases this edge-oriented image is fragmentary, due to illumination restrictions and shadows. For this reason the imaged contour is traced using a Kalman-filter-based algorithm. The contour tracer connects and completes these edge fragments. The algorithm is able to follow the course of a contour without any prior knowledge, even if its direction changes erratically. It has been tested successfully in several industrial production-testing applications (for example, controlling the optical range sensor of a 3-D measurement system in assembly lines).
Boundary-based shape normalization technique
Jia-Guu Leu
When a camera's center axis is not parallel to the surface normal of a planar object, the perceived shape of the object will be skewed. Most existing shape analysis methods are sensitive to such shape distortions. That is, if a shape is skewed due to non-orthographic projection, it may not be correctly recognized. In this paper we present a shape normalization process which neutralizes such shape skewing effects. In our case, a perceived shape is represented by a list of corner points along its boundary. We first compute the six lower moments of the shape from its boundary. Then these moments are used to compute the shape's center location, orientation, and maximum and minimum moments of inertia. To normalize the shape we first translate the shape's center to the origin. Next we rotate the shape to align its major axis with the x-axis. Then we expand the shape along its minor axis to neutralize shape skewing. Lastly, we scale the size of the shape so it has a standard moment of inertia. The suggested method can be used as a preprocessing step for any planar shape analysis method which is sensitive to shape skewing, shape size change, and/or shape translation. Since the moments are computed from the shape's boundary instead of from all its interior pixels, the method is also efficient. Several experimental results are given to show the effectiveness of the approach.
Cooperative algorithm for 2D facets extraction using contours and regions
E. Zagrouba, Charlie J. Krey
The aim of the method presented in this paper is to extract 2-D structural information, called `2-D facets,' in order to compute the 3-D corresponding surfaces. The information obtained by the algorithm is more complete than that obtained by the analysis of characteristic points or segments. The method presented relies on simultaneous analysis of regions and contours. The algorithm first matches the regions of the two images, then a subset of the boundaries of the homologous regions are matched. Finally, the contours detected by an edge extractor are used to improve the previous matching.
Object recognition using an efficient technique for aligning quadric surfaces
Nicolas Alvertos, Ivan D'Cunha
Pose and orientation of an object are central issues in 3-D recognition problems. Most of today's available techniques require considerable pre-processing, such as detecting edges or joints, fitting curves or surfaces to segmented images, and extracting higher-order features from the input images. In this paper we present a method based on analytical geometry whereby all the rotation parameters of any quadric surface are determined and subsequently eliminated. The procedure is iterative in nature and has been found to converge to the desired results in as few as three iterations. The approach enables us to position the quadric surface in a desired coordinate system and then use the resulting shape information to explicitly represent and recognize the 3-D surface. Experiments were conducted with simulated data for objects such as hyperboloids of one and two sheets, elliptic and hyperbolic paraboloids, elliptic and hyperbolic cylinders, ellipsoids, and quadric cones; real data of quadric cones and cylinders were also used. Both sets yielded excellent results.
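For comparison, a standard closed-form (non-iterative) way to remove the rotation of a quadric x^T A x + b^T x + c = 0 is to diagonalize its quadratic-form matrix; this is an alternative sketch, not the paper's iterative procedure:

```python
import numpy as np

def derotate_quadric(A, b, c):
    """Express a quadric x^T A x + b^T x + c = 0 in its principal-axis frame.
    With A = R D R^T (eigendecomposition), substituting x = R y gives
    y^T D y + (R^T b)^T y + c = 0, which has no cross terms."""
    eigvals, R = np.linalg.eigh(np.asarray(A, float))
    return np.diag(eigvals), R.T @ np.asarray(b, float), c

# A circular cylinder x^2 + z^2 = 1 rotated 45 degrees about the z-axis
t = np.pi / 4
R45 = np.array([[np.cos(t), -np.sin(t), 0.0],
                [np.sin(t),  np.cos(t), 0.0],
                [0.0, 0.0, 1.0]])
A0 = np.diag([1.0, 0.0, 1.0])
A_rot = R45 @ A0 @ R45.T
D, b2, _ = derotate_quadric(A_rot, np.zeros(3), -1.0)
```

Once the cross terms are gone, the signs and magnitudes of the diagonal entries identify the quadric type (ellipsoid, cone, cylinder, paraboloid, etc.), which is what makes rotation elimination useful for recognition.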
Operator for object recognition and scene analysis by estimation of set occupancy with noisy and incomplete data sets
S. John Rees, Bryan F. Jones
Once feature extraction has occurred in a processed image, the recognition problem becomes one of defining a set of features which maps sufficiently well onto one of the defined shape/object models to permit a claimed recognition. This process is usually handled by aggregating features until a large enough weighting is obtained to claim membership, or an adequate number of located features are matched to the reference set. A requirement has existed for an operator or measure capable of a more direct assessment of membership/occupancy between feature sets, particularly where the feature sets may be defective representations. Such feature set errors may be caused by noise, by overlapping of objects, and by partial obscuration of features. These problems occur at the point of acquisition: repairing the data would then assume a priori knowledge of the solution. The technique described in this paper offers a set theoretical measure for partial occupancy defined in terms of the set of minimum additions to permit full occupancy and the set of locations of occupancy if such additions are made. As is shown, this technique permits recognition of partial feature sets with quantifiable degrees of uncertainty. A solution to the problems of obscuration and overlapping is therefore available.
Improvement of algorithm about corner detection in object feature extraction
Guoliang Sun
An improved algorithm for corner detection in object feature extraction is presented. Owing to the introduction of a novel linear transformation, the features of the chain code can be applied more effectively: not only can the position of a corner be detected precisely, but the threshold can also be selected easily.
Segmentation, Motion, and Color Techniques
Comparison of massively parallel hand-print segmentors
R. Allen Wilkinson, Michael D. Garris
NIST has developed a massively parallel hand-print recognition system that allows components to be interchanged. Using this system, three different character segmentation algorithms have been developed and studied. They are blob coloring, histogramming, and a hybrid of the two. The blob coloring method uses connected components to isolate characters. The histogramming method locates linear spaces, which may be slanted, to segment characters. The hybrid method is an augmented histogramming method that incorporates statistically adaptive rules to decide when a histogrammed item is too large and applies blob coloring to further segment the difficult item. The hardware configuration is a serial host computer with a 1024 processor SIMD machine attached to it. The data used in this comparison is `NIST Special Database 1' which contains 2100 forms from different writers where each form contains 130 digit characters distributed across 28 fields. This gives a potential 273,000 characters to be segmented. Running the massively parallel system across the 2100 forms, blob coloring required 2.1 seconds per form with an accuracy of 97.5%, histogramming required 14.4 seconds with an accuracy of 95.3%, and the hybrid method required 13.2 seconds with an accuracy of 95.4%. The results of this comparison show that the blob coloring method on a SIMD architecture is superior.
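The blob-coloring step amounts to connected-component labelling of the binarized form image; a serial (non-SIMD) sketch with illustrative names:

```python
from collections import deque
import numpy as np

def blob_color(binary):
    """4-connected component labelling ('blob coloring') of a binary image.
    Returns a label image (0 = background) and the number of blobs found."""
    labels = np.zeros(binary.shape, int)
    next_label = 0
    for y, x in zip(*np.nonzero(binary)):
        if labels[y, x]:
            continue                      # pixel already belongs to a blob
        next_label += 1
        labels[y, x] = next_label
        q = deque([(y, x)])
        while q:                          # breadth-first flood fill
            cy, cx = q.popleft()
            for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                if (0 <= ny < binary.shape[0] and 0 <= nx < binary.shape[1]
                        and binary[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = next_label
                    q.append((ny, nx))
    return labels, next_label
```

Each labelled blob can then be cut out as a character candidate; the comparison above suggests why this isolates touching-free hand print more accurately than projection histogramming.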
Object segmentation techniques for use in laboratory visual automation systems
Peter Eggleston
In designing automated systems for interpretation or manipulation of laboratory image data such as that derived from microphotographs, it is often the goal to perform operations that extract information about the structure of objects, and to separate and discern various objects within the data. Measurements of the events, called features, can then be calculated and used for process or statistical analysis. Given a transformation of the pixel based image data into an explicit symbolic representation of the objects (i.e., the creation of objects of interest or Tokens), desired information can be extracted and characterized from the visual data. Simple segmentation schemes often lack the sophistication to deal with intricate or very subtle details of this image data. This paper discusses advanced techniques useful in obtaining information relevant to the recognition and extraction of objects of interest in laboratory vision automation applications.
Effects of the clique-potentials on maximum a posteriori region segmentation
Aly A. Farag, Yuen-Pin Yeap, Edward J. Delp III
In this paper we review image modeling using Gibbs-Markov random fields and empirically investigate the effects of clique potentials in the Gibbs-Markov models when used in maximum a posteriori (MAP) segmentation of textured images.
Multiresolution segmentation of range images based on Bayesian decision theory
This paper describes recent work on hierarchical segmentation of range images. The algorithm starts with an initial partition of small planar regions using a robust fitting method constrained by the detection of depth and orientation discontinuities. From this initial partition represented by an adjacency graph structure, we optimally group these regions into larger and larger regions until an approximation limit is reached. The algorithm uses Bayesian decision theory to determine the local optimal grouping and the geometrical complexity of the approximation surface. This algorithm produces a hierarchical structure that can be used to represent objects with a varying level of detail by scanning through the hierarchical structure generated. Experimental results are presented.
Automated system trend monitoring based on time series analysis
Scott A. Starks, Jaime L. Mercado
Trend monitoring is an important task performed by flight controllers in support of space missions. Effective means of trend monitoring enable the forecast of future problems with a system or subsystem prior to their actual occurrence. For future space missions, automated approaches to trend monitoring are essential in order to provide for cost effectiveness and system maintainability and reliability. In this paper, a prototype is introduced which shows promise as an aid to trend monitoring. The approach is based upon concepts from the field of time series analysis.
Incorporating color and spatiotemporal stereovision techniques for road following
Eric Sung, Toe Myint
This paper proposes a method of implementing stereovision in existing mobile robot perception systems without having to solve the classical stereo correspondence problem. The proposed method makes use of the fact that tracking of features is an easier task than performing stereo correspondence. A two-view motion stereo system is implemented in which the correspondence between two cameras is found by temporal stereo tracking. The algorithm assumes an initial stereo match of homologous features seen in the two cameras but subsequently the algorithm reliably tracks each feature by use of accurate odometric information, and accurate 3-D information can be derived from each matched feature. The derived set of 3-D data (of points, lines, and regions) are then fed to a fast surface reconstruction algorithm for later stage interpretation. The proposed scheme can be added to existing mobile robot road following systems at little added cost.
Color image enhancement based on modified IHS coordinate system
Jeong Yeop Kim, Jae-Chang Shim, Yeong-Ho Ha
Color image enhancement is a technique that makes an image more vivid for human vision. The color elements that most affect an image are intensity, contrast, and saturation. When handling these elements in conventional coordinate systems, the geometric form of the coordinate system is important for staying within the valid gamut. Among these, the IHS coordinate system appropriately represents human color perception and makes it easy to manipulate the hue, intensity, and saturation of the image. The geometric form of this coordinate system is, however, nonlinear, so it is difficult to control the element values, which may exceed the valid gamut. In this paper, a modified IHS coordinate system is proposed to remedy the nonlinearity of the IHS system. The proposed coordinate system is derived so as to linearize the relationship between saturation and intensity. To improve image quality, contrast is increased by maximizing the dynamic range of intensity, and saturation is normalized over the full range of intensity so that the ratios between the changed saturation values remain the same as those between the original values. Hue is preserved to keep the characteristic property of the color image. This coordinate system makes it easy to enhance color images while avoiding the gamut-overflow problem.
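The core idea of stretching intensity while preserving hue can be sketched in a few lines. This is a simplified stand-in for the paper's modified-IHS manipulation, assuming RGB triples in [0, 1]; scaling all three channels of a pixel by a common factor changes its intensity while leaving its hue (chromatic ratios) fixed:

```python
def stretch_intensity(pixels):
    """Stretch image intensity to the full [0, 1] range, preserving hue.

    pixels: list of (r, g, b) triples in [0, 1].
    Each pixel's channels are scaled by a common factor i_new / i_old,
    so hue is unchanged; results are clipped to the valid gamut.
    """
    intens = [(r + g + b) / 3.0 for r, g, b in pixels]
    lo, hi = min(intens), max(intens)
    out = []
    for (r, g, b), i in zip(pixels, intens):
        i_new = (i - lo) / (hi - lo) if hi > lo else i
        k = i_new / i if i > 0 else 0.0
        # clip to the valid gamut [0, 1]
        out.append(tuple(min(1.0, ch * k) for ch in (r, g, b)))
    return out
```

The proposed modified coordinate system avoids the final clipping step by construction; this sketch simply clips at the gamut boundary.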
Using self-organizing recognition as a mechanism for rejecting segmentation errors
R. Allen Wilkinson, Charles L. Wilson
We have developed a method for self-organized, neural-network-based segmentation error checking and character recognition. In this method the page is segmented, a pre-trained self-organizing classifier is used to classify the characters coming from the segmenter, and these reclassified characters are used to adaptively learn the machine-print font being segmented. This allows the self-organizing character classifier to perform font and image quality checking while independently checking for segmentation errors. No context is used to correct segmentation or recognition errors since the page was randomly generated. In our experiments, segmentation errors caused the sequential classification of segmented characters to be confused for 6.6% of the items segmented because of the splitting and merging of characters. By using self-organizing neural network classification, 9.2% of these errors were corrected to produce a correct segmentation and classification rate of 93.4% overall. After self-organization correction, segmentation and classification were performed to an accuracy of 99.3% with no human intervention, and in all cases 99% of all segmentation errors were detected.
Algorithm for dynamic object tracking
Mihai P. Datcu, Florin Folta, Cristian E. Toma
The purpose of this paper is to present a hierarchic processor architecture for the tracking of moving objects. Two goals are envisaged: the definition of a moving window for target tracking, and the multiresolution segmentation needed for scale-independent target recognition. Memory windows obtained by software methods in single-processor systems are limited in speed for high-complexity images. In a multiprocessor system the limitation arises from a bus or memory bottleneck. Highly concurrent system architectures have been studied and implemented as crossbar bus systems, multiple-bus systems, or hypercube structures. Because of the complexity of these architectures, and considering the particularities of image signals, we suggest a hierarchic architecture that reduces the number of connections while preserving flexibility and that is well adapted to multiresolution algorithm implementations. The hierarchy is a quadtree. The solution uses a switched bus and block memory partition (granular image memory organization). In the first stage of organizing such an architecture, the moving objects are identified in the camera field and the appropriate windows are defined. The system is then reorganized such that the computing power is concentrated in these windows. Image segmentation and motion prediction are accomplished. Motion parameters are interpreted to adapt the windows and to dynamically reorganize the system. The estimation of the motion parameters is done over low-resolution images (top of the pyramid). Multiresolution image representation has been introduced for picture transmission and for scene analysis. The pyramidal implementation was elaborated for the evaluation of image details at various scales. The multiresolution pyramid is obtained by low-pass filtering and subsampling the intermediate result. The technique is applied over a limited range of scales. The multiresolution representations, as a consequence, are close to scale invariance.
Meanwhile, image representation by wavelets allows scale to be implicit, which is why the wavelet transform is well adapted to evaluating the self-similarity of signals. Self-similarity is the common point of wavelets and fractal signals. It is assumed that an image (signal) has fractal behavior if, at several scales, its 'features' show deterministic or statistical self-similarity or self-affinity. Texture analysis can be accomplished by the fractal transform: each pixel of the original image is substituted by the value of the fractal dimension in its neighborhood. Several techniques have been developed to evaluate the fractal dimension. It is necessary to compute the dimension of the set of points in the neighborhood of interest for a given range of resolutions. The slope of the approximated straight line in a log/log plot of these values versus the unit of each scale depends linearly on the fractal dimension. It has been proven that the fractal dimension can be evaluated from the ratio of the energies of the detail images in a multiresolution pyramid obtained by the wavelet transform.
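The log/log slope estimate of fractal dimension mentioned above is commonly realized as box counting. A hedged sketch of that standard technique (not the authors' pyramid/wavelet method): count occupied boxes N(s) at several box sizes s and fit the slope of log N against log(1/s):

```python
import math

def box_counting_dimension(points, scales=(1, 2, 4, 8)):
    """Estimate the fractal dimension of a 2-D point set by box counting.

    For each box size s, count the boxes occupied by at least one point,
    then fit the least-squares slope of log N(s) versus log(1/s) --
    the log/log regression described in the abstract.
    """
    logs = []
    for s in scales:
        boxes = {(int(x // s), int(y // s)) for x, y in points}
        logs.append((math.log(1.0 / s), math.log(len(boxes))))
    n = len(logs)
    mx = sum(x for x, _ in logs) / n
    my = sum(y for _, y in logs) / n
    slope = (sum((x - mx) * (y - my) for x, y in logs)
             / sum((x - mx) ** 2 for x, _ in logs))
    return slope

# A filled square patch has dimension 2; the estimate recovers it exactly
# because N(s) = (32 / s)**2 at every tested scale.
square = [(x, y) for x in range(32) for y in range(32)]
dim = box_counting_dimension(square)
```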
Determining composition of grain mixtures using texture energy operators
Bradley Pryor Kjell
Images of texture may be convolved with a set of small operators to produce texture energy features for classification or for image segmentation. These operators are usually picked from a standard set, such as Laws' texture energy operators or the variations discussed by other researchers. In this paper, texture energy features are used to determine the percentage composition of mixtures of rice and barley. The grains of white rice and pearled barley are similar in size and reflectivity, and hundreds of overlapping grains appear in each image. This problem is representative of many visual inspection tasks. Two approaches are used: multi-linear regression, and linear classification into discrete composition classes. The texture energy features used are standard Laws' operators in two sizes, and operators found through a stochastic optimization procedure.
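Laws' operators are built as outer products of small 1-D vectors (Level, Edge, Spot, Ripple), and a texture energy feature is the mean magnitude of the convolution response. A minimal sketch under that standard construction (the paper's stochastically optimized operators are not reproduced here):

```python
# 1-D Laws vectors: Level, Edge, Spot, Ripple
L5 = [1, 4, 6, 4, 1]
E5 = [-1, -2, 0, 2, 1]
S5 = [-1, 0, 2, 0, -1]
R5 = [1, -4, 6, -4, 1]

def laws_mask(v, w):
    """Build a 2-D Laws mask as the outer product of two 1-D vectors."""
    return [[a * b for b in w] for a in v]

def texture_energy(image, mask):
    """Mean absolute convolution response: one texture energy feature."""
    n, m, size = len(image), len(image[0]), len(mask)
    k = size // 2
    total, count = 0.0, 0
    for i in range(k, n - k):
        for j in range(k, m - k):
            acc = sum(mask[a][b] * image[i + a - k][j + b - k]
                      for a in range(size) for b in range(size))
            total += abs(acc)
            count += 1
    return total / count
```

Masks built from a zero-sum vector (E5, S5, R5) ignore uniform regions, which is why they respond to texture rather than brightness.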
Motion perception at equiluminance and the consequences for computational vision systems
George Lee Zimmerman, Viet Nguyen
When the image of a moving object is equal in luminance with the background, we observe a startling change in both its apparent motion and its three-dimensional position in space. If we use biological vision as a guide for the construction of machine vision systems, this perceptual phenomenon has profound implications. Motion information can be used in a variety of visual tasks such as detection, calibration, guided movement, navigation, and recognition. Human performance at equiluminance suggests that navigation uses motion information heavily and that for recognition, motion plays only a role such as separating figure from ground or grossly defining surface in space. Equiluminant motion perception cannot tell us much about detection, calibration, or guided movement tasks. We demonstrate an adaptive model of motion perception which presents similar equiluminant responses.
Morphological Processing for Intelligent Robotics
Fuzzy morphological filters
The theory of fuzzy mathematical morphology is well-developed, including the characterization of fuzzy Minkowski algebra and extensions of the basic Matheron representation theorems. The present paper provides a natural paradigm for the lifting of crisp-set binary filters to fuzzy filters. The paradigm is based on considering gray-scale realizations of binary images as [0,1]-valued fuzzy images and then processing them in a manner compatible with their interpretation as fuzzy binary images. Various filters are implemented in the paper: smoothing, edge detection, peak detection, and object detection.
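One common lifting of binary dilation and erosion to [0,1]-valued fuzzy images uses max-min combinations; the sketch below shows the 1-D case. The exact operators characterized in the paper may differ in detail:

```python
def fuzzy_dilate(f, s):
    """Fuzzy dilation of a [0,1]-valued signal f by structuring element s.

    (f (+) s)(x) = max over y of min(f(y), s(x - y)) -- one standard
    lifting of binary dilation to fuzzy sets.
    """
    n, m = len(f), len(s)
    return [max(min(f[y], s[x - y]) for y in range(n) if 0 <= x - y < m)
            for x in range(n)]

def fuzzy_erode(f, s):
    """Fuzzy erosion: (f (-) s)(x) = min over y of max(f(x + y), 1 - s(y))."""
    n, m = len(f), len(s)
    return [min(max(f[x + y], 1.0 - s[y]) for y in range(m) if 0 <= x + y < n)
            for x in range(n)]
```

On crisp 0/1 signals these reduce to ordinary binary dilation and erosion, which is the compatibility property the paradigm in the paper requires.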
Optical morphological processors: gray scale with binary structuring elements, detection, and clutter reduction
Roland H. Schaefer, David P. Casasent, Anqi Ye
We consider morphological processing for clutter reduction and object detection. For detection, we compare a binary and gray-scale Hit-Miss Transform and find that the binary operator is preferable. For clutter reduction, we find gray-scale morphology to be preferable. We present a new gray-scale clutter reduction morphological algorithm for low clutter cases and a new algorithm for high clutter cases. In all morphological processing, we find binary structuring elements to be adequate; this is very attractive for our gray-scale morphology decomposition algorithm and its optical implementation.
Automated recognition and precise estimation of orientation of objects for industrial applications
Curt L. Orbert, Ewert W. Bengtsson, Bo G. Nordin
We present a solution to a common problem in industrial machine vision: identifying and estimating the orientation of touching mechanical parts on a plane surface. The algorithm is based on watershed segmentation and can handle cases where objects touch. After an initial thresholding step, we extract the edges of the binary image, the outer edge as well as edges around holes inside the object. Then we use a distance transformation to create a distance map, i.e., an image where each pixel value represents the distance to the nearest edge pixel. The watershed algorithm is applied to the distance map, and we get an image where some objects may be segmented into several parts. For every segment we calculate the center of gravity of its surrounding edge pixels. The different centers of gravity are sufficient for estimating the orientation of objects that have been segmented into more than one segment. By also calculating the centers of gravity of the holes of the object and using them in the same way, we can estimate the orientation of objects having holes. To recognize the mechanical parts we use the distances between the centers of gravity of their segments and holes, together with the greatest maximum of the distance map found inside each of them. We also calculate the lengths of the peripheries of the segments and use them to distinguish the objects. Mechanical parts that consist of only one segment without holes we can perhaps recognize, and certainly locate, but not orient in this way. For those objects we construct a circle around the center of gravity with the corresponding greatest maximum as radius. We collect the values of the distance map along this circle and plot them as a function of the angle to the horizontal axis. We can identify the maxima and minima of this function, from which we estimate the rotation of the object. This information can also be used to identify the object. For the overall control algorithm we use fuzzy logic.
As a final step, to verify the identification of the mechanical parts and to get a better estimate of their orientation, we perform an edge matching using the distance map, which gives us quantitative measurements of how well the edges match. This gives us more accurate estimates than can be achieved by statistical methods.
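The distance map at the heart of this method can be sketched with a standard two-pass chamfer transform (city-block metric here for simplicity; the paper does not specify which metric it uses):

```python
def distance_map(edge, rows, cols):
    """Two-pass city-block distance transform.

    edge: set of (r, c) edge-pixel coordinates. Every pixel receives the
    city-block distance to its nearest edge pixel -- the distance map on
    which the watershed and circle-sampling stages operate.
    """
    INF = rows + cols  # larger than any achievable distance
    d = [[0 if (r, c) in edge else INF for c in range(cols)]
         for r in range(rows)]
    # forward pass: propagate from top-left neighbors
    for r in range(rows):
        for c in range(cols):
            if r > 0:
                d[r][c] = min(d[r][c], d[r - 1][c] + 1)
            if c > 0:
                d[r][c] = min(d[r][c], d[r][c - 1] + 1)
    # backward pass: propagate from bottom-right neighbors
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            if r < rows - 1:
                d[r][c] = min(d[r][c], d[r + 1][c] + 1)
            if c < cols - 1:
                d[r][c] = min(d[r][c], d[r][c + 1] + 1)
    return d
```

The greatest maximum of this map inside a segment, used as the circle radius in the abstract, is simply the largest entry within that segment's pixels.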
Identification of flaws in metallic surfaces using specular and diffuse bispectral light sources
Michael Magee, Steven B. Seida, Ernest A. Franke
A computer vision based automated method for identifying and quantifying flaws in cast metal parts is presented. The specific defects to be isolated consist of small circular concavities in the surface (pits) and larger isolated regions (scratches) that may have been abraded due to cutting or handling operations. The approach taken identifies these anomalous features using two spatially separated light sources with different spectral characteristics to produce highly specular illumination at one wavelength and shallow diffuse illumination at a different wavelength. A bispectral image is processed to yield the sought flaws. This processing consists of identifying regions of interest in the original image that may contain potential flaws and applying a morphological region labelling operation to extract candidate pits and scratches. Geometric constraints are applied to the extracted regions in order to isolate the true flaws. The discussion that follows details the algorithmic approach used to identify flaws as well as characterizing the results obtained.
Computation of the medial axis skeleton at multiple complexities
Ronald D. Chaney
The medial axis skeleton is a thin line graph that preserves the topology of a simply connected region. The skeleton has often been cited as a useful representation for shape description, region interpretation, and object recognition. Unfortunately, the computation of the skeleton is extremely sensitive to variations in the bounding contour. Tiny perturbations in the contour often lead to spurious branches of the skeleton. In this paper, we consider a robust method for computing the medial axis skeleton across a variety of scales. The scale-space is parametric with the complexity of the bounding contour. The complexity is defined as the number of extrema of curvature in the contour. A set of curves is computed to represent the bounding contour across a variety of complexity measures. The curves possessing larger complexity measures represent greater detail than curves with smaller measures. A medial axis skeleton is computed directly from each contour. The result is a set of skeletons that represent only the gross structure of the region at coarse scales (low complexity), but represent more of the detail at fine scales (high complexity).
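The complexity measure, the number of curvature extrema along the bounding contour, can be sketched for a polygonal contour by counting cyclic local extrema of the discrete turning angle (an illustrative discretization, not the paper's exact formulation):

```python
import math

def turn_angles(poly):
    """Signed turning angle at each vertex of a closed polygon."""
    n = len(poly)
    angles = []
    for i in range(n):
        ax, ay = poly[i]
        bx, by = poly[(i + 1) % n]
        cx, cy = poly[(i + 2) % n]
        ux, uy = bx - ax, by - ay
        vx, vy = cx - bx, cy - by
        # atan2(cross, dot) gives the signed exterior angle
        angles.append(math.atan2(ux * vy - uy * vx, ux * vx + uy * vy))
    return angles

def complexity(poly):
    """Count cyclic strict local extrema of the discrete curvature."""
    k = turn_angles(poly)
    n = len(k)
    return sum(1 for i in range(n)
               if (k[i] > k[(i - 1) % n] and k[i] > k[(i + 1) % n])
               or (k[i] < k[(i - 1) % n] and k[i] < k[(i + 1) % n]))
```

A square has constant curvature in this sense, so its complexity is zero; adding a notch to one side introduces new extrema, and a skeleton computed from the higher-complexity contour would show the corresponding extra branch.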
Application of morphological transformations in defect detecting of printed circuit board
ShiFu Yuan, Xueru Zhang, Lixue Chen, et al.
The application of morphological transformations in defect detecting of printed circuit boards (PCB) is discussed. Computer simulation results are given to demonstrate defect detection of PCBs with morphological transformations. A programmable optical binary image morphological processor is given to implement this kind of defect detecting.
Sensory Robotics and Control (Vision, Collision Avoidance, Path Planning)
Mechanical parts inspection for robot grinding flexible manufacturing cells
Karl Ratcliff, Charles R. Allen
Visual inspection tasks often have constraints imposed by the structured lighting conditions required to produce images suitable for analysis by a vision system. This paper describes a method for calculating a robot pathplan for inspection of brass castings based upon a geometrical workpiece model.
Neural network approach for inventory control
Zoheir Ezziane, Abdel Kader Mazouz, Chingping Han
Artificial neural net models have been studied for many years in the hope of achieving human-like performance in different areas. These nets are composed of many nonlinear computational elements operating in parallel, much as in biological neural nets. Computational elements, or nodes, are connected via weights that are dynamically changed to improve overall performance. The neural network studied is a two-layer perceptron (also described as having three layers): each unit of the first layer is connected to a unit in the hidden layer which, in turn, is connected to every output unit on the output layer. In most applications, the number of input and output units is known and depends upon the nature of the task and application under consideration. In this paper a neural network model is designed for a two-layer feed-forward perceptron. The network uses a minimum number of hidden neurons and the backpropagation training algorithm for a non-complex application in production-plant inventory control. Ultimately, designing the neural network architecture means seeking convergence of the network within a reasonable amount of time. One of the main issues is to determine the number of hidden neurons and what type of data must be supplied to start the backpropagation algorithm.
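A two-layer feed-forward perceptron trained by backpropagation can be sketched compactly; the layer sizes, learning rate, and toy training data below are placeholders for illustration, not the paper's inventory-control configuration:

```python
import math, random

def train_two_layer(samples, n_hidden=2, lr=0.5, epochs=2000, seed=1):
    """Backpropagation for a two-layer perceptron (one hidden layer).

    samples: list of (input_vector, target) with targets in (0, 1).
    Returns (predict_function, final mean squared error).
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # weight rows include a trailing bias weight
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
          for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))

    def forward(x):
        h = [sig(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w1]
        y = sig(sum(w * v for w, v in zip(w2, h + [1.0])))
        return h, y

    err = 0.0
    for _ in range(epochs):
        err = 0.0
        for x, t in samples:
            h, y = forward(x)
            err += (y - t) ** 2
            # output-layer delta, then hidden-layer deltas
            dy = (y - t) * y * (1 - y)
            dh = [dy * w2[j] * h[j] * (1 - h[j]) for j in range(n_hidden)]
            for j, v in enumerate(h + [1.0]):
                w2[j] -= lr * dy * v
            for j in range(n_hidden):
                for k, v in enumerate(x + [1.0]):
                    w1[j][k] -= lr * dh[j] * v
        err /= len(samples)
    return (lambda x: forward(x)[1]), err
```

Varying `n_hidden` and watching the final error and epoch count is exactly the convergence-versus-size trade-off the abstract describes.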
Basic characteristics and realization of production system control
Shaopeng Cheng, Richard Shell, Ernest L. Hall
This paper analyzes the issues involved in developing an intelligent production control system. It describes the basic characteristics of a production control system and an effective design methodology to realize the production control functions. Petri net, subsystem and hierarchical control concepts are applied to a computer integrated material handling system (MHS). Some communication and interface requirements of the MHS are also considered in this paper. The control system solution is illustrated with an actual MHS operation case which indicates that a truly flexible and integrated production system can be realized with a Petri net operation model and a hierarchical control structure. The significance of this work is related to the different operation testing and evaluation requirements encountered in manufacturing.
Communications protocols for a distributed processor to support flexible manufacturing
David W. Elizandro, Scott A. Starks
The proliferation of powerful low-cost computer systems has enabled traditional tools of manufacturing to be embedded in an environment of automated subsystems. Manufacturing technology now includes using computer technology to integrate and manage these subsystems. The Manufacturing Automation Protocol (MAP) provides the infrastructure necessary for linking these computer systems, and discrete-event simulation is an alternative for integration and management of the manufacturing subsystems.
Laboratory setup for sensory-based robot programming
Evgeni Kukareko, Juha Roening
This paper presents results of developing basic control approaches, structures, and software tools for sensory-based robot setup programming. The control task decomposition is described. The paper focuses on the robot controller structure and its implementation. Solutions of the inverse and direct kinematic problems in simulation tasks for a GMFanuc S-10 robot are suggested. Preliminary results of simulation experiments are presented.
Collision avoidance tests using the Charlie_1 Trike vehicle
Charles R. Allen, R. West
Autonomous vehicles for advanced robotics applications require accurate world modeling using sensors and built-in algorithms to undertake collision avoidance around obstacles. This paper describes the physical attributes of a prototype mobile vehicle using synchronized dc motor drives, and a number of methods for the on-line planning of obstacle avoidance. Measurements of the efficiency of the paths produced by each collision avoidance method are quantified using a number of metrics discussed in the paper.
Obstacle detection and terrain characterization using optical flow without 3-D reconstruction
Gin-Shu Young, Tsai Hong Hong, Martin Herman, et al.
For many applications in computer vision, it is important to recover range, 3-D motion, and/or scene geometry from a sequence of images. However, many robot behaviors can be achieved by extracting relevant 2-D information from the imagery and using it directly, without recovering such 3-D quantities. In this paper, we focus on two behaviors, obstacle avoidance and terrain navigation. A novel method for both behaviors has been developed without 3-D reconstruction. This approach is often called purposive active vision. A linear relationship, plotted as a line and called a reference flow line, has been found. The difference between a plotted line and the reference flow line can be used to detect discrete obstacles above or below the reference terrain. For terrain characterization, slopes of surface regions can be calculated directly from optical flow. Some error analysis is also done. The main features of this approach are that (1) discrete obstacles are detected directly from 2-D optical flow, with no 3-D reconstruction performed; (2) terrain slopes are also calculated from 2-D optical flow; (3) knowledge about the terrain model, camera-to-ground coordinate transformation, or vehicle (or camera) motion is not required; and (4) the error sources involved are reduced to a minimum, since the only information required is a component of optical flow. An initial experiment using noisy synthetic data is also included to demonstrate the applicability and robustness of the method.
Intelligent robot control using omnidirectional vision
Manoj P. Ghayalod, Ernest L. Hall
Omnidirectional vision using a wide angle lens with a 2 pi steradian field has been studied for image visualization and navigation for mobile robots. The advantages that can be obtained with the large field of view include instantaneous viewing which permits dynamic control and improved visualization. The significant geometric distortion can be corrected using image processing for either image viewing or target recognition. The purpose of this paper is to present results from the use of omnidirectional vision as an integral component in an intelligent robot control system. Using the vision system for position updates provides a method for accurate position control. Simulation and experimental results are presented of a two level control system which uses encoders for dead reckoning position control and periodic vision position updates. The system appears promising for mobile robot navigation and can lead to a robust control system.
Gaze control for an active camera system by modeling human pursuit eye movements
Sebastian Toelg
The ability to stabilize the image of one moving object in the presence of others by active movements of the visual sensor is an essential task for biological systems as well as for autonomous mobile robots. An algorithm is presented that evaluates the necessary movements from acquired visual data and controls an active camera system (ACS) in a feedback loop. No a priori assumptions about the visual scene and objects are needed. The algorithm is based on functional models of human pursuit eye movements and is to a large extent influenced by structural principles of neural information processing. An intrinsic object definition based on the homogeneity of the optical flow field of relevant objects, i.e., those moving mainly fronto-parallel, is used. Velocity and spatial information are processed in separate pathways, resulting in either smooth or saccadic sensor movements. The program generates a dynamic shape model of the moving object and focuses its attention on regions where the object is expected. The system proved to behave in a stable manner under real-time conditions in complex natural environments and manages general object motion. In addition, it exhibits several interesting abilities well known from psychophysics, such as catch-up saccades, grouping due to coherent motion, and optokinetic nystagmus.
Controlling multiple groups of robots
MawKae Hor
Coordinating multiple robots has attracted researchers' interest for many years. However, most of the problems studied deal with multiple robots acting only within a single group. Coordinated robots fall into different groups when the coordination involves robot interchange or heterogeneous motion during the manipulation process. In such cases, coordination between robot groups has to be considered. This is required in certain types of coordinated manipulation, such as passing an object held by multiple robots between groups of robots, or continuously rotating or rolling an object held by multiple robots. In the former task, coordination is between two isotropic groups of robots, whereas in the latter it is between non-isotropic groups. This paper investigates problems related to the control and coordination of multiple groups of robots. We analyze various kinds of tasks of these types and propose a hierarchical control mechanism for achieving the coordination. Scenarios and limitations for these tasks are presented and discussed. A hybrid force and position control principle is employed in both global and local planning and control. A hierarchical architecture is used to control the different levels of control and planning primitives. The primitives developed for controlling an individual robot group can be adopted in this architecture. The primitives in one level offer services only to those in neighboring levels and hide them from the details of actual service implementations, hence reducing the system design complexity.
Industrial robot
A. Prakashan, H. S. Mukunda, S. D. Samuel, et al.
This paper addresses the design and development of a four-degree-of-freedom industrial manipulator, with three linear axes in the positioning mechanism and one rotary axis in the orientation mechanism. The positioning mechanism joints are driven with dc servo motors fitted with incremental shaft encoders. The rotary joint of the orientation mechanism is driven by a stepping motor. The manipulator is controlled by an IBM 386 PC/AT. Microcomputer-based interface cards have been developed for independent joint control. PID controllers for the dc motors have been designed. Kinematic modeling, dynamic modeling, and path planning have been carried out to generate the control sequence to accomplish a given task with reference to source and destination state constraints. This project has been sponsored by the Department of Science and Technology, Government of India, New Delhi, and has been executed in collaboration with M/s Larsen & Toubro Ltd, Mysore, India.
Visual Servoing in Automated Manufacturing
Real-time adaptive vision servoing of a robotic manipulator
N. Houshangi
To make robotic manipulators function intelligently in an application such as flexible manufacturing, sensory feedback from an unknown environment is needed. Visual feedback represents a typical sensing system in which camera images provide feedback information, for instance, in grasping a moving object. Because image processing is time consuming, information about target position cannot be obtained instantaneously by the controller. Because of this inherent time delay, the present and future position of the object has to be predicted in real time. Since the dynamics of the object are assumed to be unknown, the prediction is accomplished by means of an auto-regressive discrete-time model. The predicted values and the current end-effector position determine the desired trajectory point (subgoal) for the motion. The planner adapts on-line to changes in the target position. The desired trajectory is tracked by the end-effector controller. After grasping the object, problems may arise in controlling the motion of the manipulator due to the mass of the object attached to the gripper. An adaptive controller is proposed to deal with load uncertainty in the object. A simulation program is presented which demonstrates the task of grasping a moving object by a manipulator using visual feedback.
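The auto-regressive predictor can be sketched with a second-order model fitted by least squares; the model order and the sample series below are assumptions for illustration, not the paper's configuration:

```python
def fit_ar2(series):
    """Fit x[t] ~ a*x[t-1] + b*x[t-2] by solving the 2x2 normal equations."""
    rows = [(series[t - 1], series[t - 2], series[t])
            for t in range(2, len(series))]
    s11 = sum(r[0] * r[0] for r in rows)
    s12 = sum(r[0] * r[1] for r in rows)
    s22 = sum(r[1] * r[1] for r in rows)
    b1 = sum(r[0] * r[2] for r in rows)
    b2 = sum(r[1] * r[2] for r in rows)
    det = s11 * s22 - s12 * s12
    a = (b1 * s22 - s12 * b2) / det
    b = (s11 * b2 - s12 * b1) / det
    return a, b

def predict_next(series, a, b):
    """One-step-ahead prediction of the next target position."""
    return a * series[-1] + b * series[-2]
```

In a visual-servoing loop, the prediction covers the image-processing delay: the subgoal is set from the predicted target position rather than the most recent (stale) measurement.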
Selecting viewpoints for visual manufacturing systems
A technique for selecting one camera viewpoint from m viewpoints containing zero mean Gaussian errors is presented. The procedure consists of a two stage analysis. First, the joint entropy of each viewpoint is found. The viewpoint with minimum entropy possesses the greatest possible lower bound reliability of meeting any quadratic specification of the pose error. Hence it is the best pose algorithm to select without further analysis. To guarantee a minimum reliability, a second stage of analysis is necessary. Methods of calculating reliability bounds for a given quadratic specification are explained. The reliability calculations require three orders of magnitude less computations than the alternative, Monte Carlo simulations. On the other hand, reliability analysis requires an order of magnitude more computations than entropy analysis. The concepts are simulated using a visual pose measurement system developed by NASA. The results indicate that entropy is very effective for selecting pose algorithms, and the reliability greatest lower bound is close to the actual reliability.
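The entropy-based first stage can be sketched directly: for a zero-mean Gaussian pose error with covariance matrix C, the differential entropy is 0.5*ln((2*pi*e)^n * det C), so minimizing entropy over viewpoints reduces to minimizing det C. A small sketch (the example covariances are made up):

```python
import math

def det(m):
    """Determinant by cofactor expansion; fine for small pose covariances."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j]
               * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def gaussian_entropy(cov):
    """Differential entropy of an n-D zero-mean Gaussian:
    0.5 * ln((2*pi*e)^n * det(cov))."""
    n = len(cov)
    return 0.5 * math.log((2 * math.pi * math.e) ** n * det(cov))

def select_viewpoint(covs):
    """Return the index of the viewpoint with minimum pose-error entropy."""
    return min(range(len(covs)), key=lambda i: gaussian_entropy(covs[i]))
```

As the abstract notes, this stage is cheap; the second-stage reliability bounds are only needed when a minimum reliability must actually be guaranteed.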
Kinematic calibration of a binocular head using stereo vision with the complete and parametrically continuous model
Sheng-Wen Shih, Jia-Sheng Jin, Kuo-Hua Wei, et al.
This paper describes the process of calibrating the kinematic model for an active binocular head having four revolute joints and two prismatic joints. We use the complete and parametrically continuous (CPC) model proposed by Zhuang and Roth in 1990 to model the motorized head (or camera positioning system), and use a closed-form solution to identify its CPC kinematic parameters. The calibration procedure is divided into two stages. In the first stage, the two cameras are replaced by two end-effector calibration plates, each having nine circles. The two removed cameras can be used to build a stereo vision system for observing the varying positions and orientations of the end-effector calibration plates as the joints of the head are moved. The positions and orientations of the calibration plates, or equivalently, of the end-effectors, can be determined from the stereo measurements. The acquired data are then used to calibrate the kinematic parameters. In the second stage, the cameras are remounted on the IIS-head, and a method proposed by Tsai is used to calibrate the hand-eye relation. Once the above kinematic calibration is done, the binocular head can be controlled to gaze at or track 3-D targets.
Three-dimensional camera-space manipulation using servoable cameras
Umesh A. Korde, Emilio Gonzalez-Galvan, Steven B. Skaar
Using the method of camera-space manipulation, high-precision, three-dimensional rigid-body positioning tasks have been performed with a holonomic, six-axis GMF S-400 robot. Further development, aimed at expanding the usable region of the robot's workspace and at achieving the higher precision enabled by a narrower field of view for the cameras, includes the use of cameras mounted on servoable platforms or `pan/tilt' units. The approach followed in the implementation of servoable cameras is designed to make use of information `learned' before camera repositioning to update view parameter estimates without undergoing large extraneous arm movement. The paper describes this approach and presents the first results of experimental work used for testing it.
Active Vision
icon_mobile_dropdown
Fixation by active accommodation
Kourosh Pahlavan, Tomas Uhlin, Jan-Olof Eklundh
The field of computer vision has long been interested in disparity as the cue for the correspondence between stereo images. The other cue to correspondence, blur, and the fact that vergence is a combination of two processes, accommodative vergence and disparity vergence, have not been equally appreciated. Following the methodology of active vision, which allows the observer to control all of its visual parameters, it is quite natural to take advantage of the powerful combination of these two processes. In this article, we try to elucidate such an integration and briefly analyze the cooperation and competition between accommodative vergence and disparity vergence on the one hand, and disparity and blur stimuli on the other. The human fixation mechanism is used as a guideline, and some virtues of this mechanism are used to implement a model for vergence in isolation. Finally, some experimental results are reported.
Fixation-based filtering
Thomas J. Olson, Robert J. Lockwood
Fixation and visual attention are central themes in active vision research, and are closely related. In this paper we discuss one of several ways in which they interact. We describe filtering methods that allow an agent to selectively extract features of the object it is fixating and suppress features of foreground and background objects. The methods are essentially depth filters; they use disparity or motion information to suppress image features that are far from the fixation point in depth. They share a simple computational structure based on the Laplacian pyramid, and are readily amenable to hardware implementation. We present the filters and the properties of fixation geometry that allow them to work, and discuss their behavior. We present methods of implementing them in real time and describe ways of extending them to other features besides depth.
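The depth-filtering idea can be illustrated with a one-level sketch of the pyramid scheme: detail whose disparity lies far from the fixation disparity is attenuated before reconstruction. This is an illustrative simplification (single band, box blur in place of the Gaussian REDUCE kernel, Gaussian weighting with an assumed width), not the authors' implementation:

```python
import numpy as np

def box_blur(img):
    """Separable 3-tap blur, a stand-in for the pyramid's REDUCE kernel."""
    p = np.pad(img, 1, mode='edge')
    return (p[:-2, 1:-1] + 2 * img + p[2:, 1:-1]
            + p[1:-1, :-2] + p[1:-1, 2:]) / 6.0

def fixation_filter(img, disparity, d_fix, sigma=1.0):
    """Suppress fine detail whose disparity is far from the fixation
    disparity d_fix; detail at the fixation depth passes unchanged."""
    low = box_blur(img)
    band = img - low                                   # Laplacian-like band
    w = np.exp(-((disparity - d_fix) / sigma) ** 2)    # near-fixation weight
    return low + w * band
```

A full Laplacian pyramid applies the same weighting per level, which is what makes the scheme cheap and amenable to hardware implementation.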
Using viewpoint consistency in active stereo vision
James J. Clark, Michael J. Weisman, Alan L. Yuille
Surface models embedded in Bayesian or regularization style stereo vision algorithms bias the solution in a nonviewpoint invariant way. This lack of invariance reveals itself when the surface is computed from different viewpoints. Using the consistency between views one can try to adapt the prior surface models in a way that renders them more viewpoint invariant. The goal is to be able to adapt the stereo algorithm over time so that the same surface shape is obtained from different views. The method described in this paper uses the surface consistency measure to choose between the solutions provided by a set of simple prior surface models.
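The selection step can be sketched as follows, under the assumption that each candidate prior model yields surfaces reconstructed from several viewpoints, already registered to a common frame; the inconsistency measure (mean pairwise RMS difference) is an illustrative choice, not the paper's:

```python
import numpy as np

def consistency_cost(surfaces):
    """Mean pairwise RMS difference between surfaces reconstructed from
    different viewpoints (assumed registered to a common frame)."""
    costs = [np.sqrt(np.mean((a - b) ** 2))
             for k, a in enumerate(surfaces) for b in surfaces[k + 1:]]
    return float(np.mean(costs))

def select_model(surfaces_by_model):
    """Pick the prior surface model whose solutions agree best across views."""
    return min(surfaces_by_model,
               key=lambda m: consistency_cost(surfaces_by_model[m]))
```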
Visual looming
Daniel Raviv
The visual looming effect has been shown to be very important when action is taking place. In this paper we take a quantitative approach to visual looming. We define looming mathematically, show geometrical properties of objects that produce the same value of looming, and summarize results on measuring looming including how a multiresolution logarithmic retina simplifies this measurement.
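Looming is commonly defined as the negative time derivative of the range divided by the range, L = -(dR/dt)/R, so that it is positive, and grows, as the object closes in. A minimal finite-difference sketch of that definition (the discretization is an assumption for illustration):

```python
def looming(r_prev, r_curr, dt):
    """Looming L = -(dR/dt)/R: positive when range decreases,
    i.e., when the object's image is expanding."""
    r_dot = (r_curr - r_prev) / dt
    return -r_dot / r_curr
```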
'Bee-bot': using peripheral optical flow to avoid obstacles
David Coombs, Karen Roberts
The bee-bot demonstrates the ability to use low-resolution motion vision over large fields of view to steer safely between obstacles. The system uses one receptive field for each of the left and right peripheral visual fields. This is implemented with a camera looking obliquely to each side of the robot. The largest optical flow in a receptive field indicates the proximity of the nearest object. The left and right proximities are easily compared to steer through the gap. Negative feedback control of steering is able to tolerate inaccuracies in this signal estimation. The low cost of such basic navigation competence can free additional resources for attending to the environment.
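The balancing strategy reduces to a one-line negative-feedback law: turn away from the side whose peak peripheral flow (and hence nearest obstacle) is larger. A minimal sketch, with an assumed gain and sign convention (positive command steers left):

```python
def steer(left_flows, right_flows, gain=0.5):
    """Negative-feedback steering from peripheral flow magnitudes.
    Larger peak flow on one side means a closer obstacle there."""
    left_prox = max(left_flows)
    right_prox = max(right_flows)
    return gain * (right_prox - left_prox)  # >0: steer left; <0: steer right
```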
Behavior-based control for an eye-head system
Michael J. Daily, David W. Payton
Much of the recent interest in active vision has focused on the development of novel methods for controlling fast pan/tilt camera mounts, called eye-head systems. Simple real-time processing of input images coupled with fast control has enabled interesting system behaviors. This paper describes the ongoing development of behavior-based control methods for a miniature eye-head system. We first describe the eye-head hardware and image processing system. We then define and present approaches for behavior-based control of the eye-head system. Finally, we discuss results from the use of simple behaviors for verging two cameras on moving objects.
Autonomous obstacle avoidance using visual fixation and looming
Kunal Joarder, Daniel Raviv
This paper describes a vision-based method for avoiding obstacles using the concepts of visual looming and fixating motion. Visual looming refers to the expansion of images of objects in the retina. Usually, this is due to the decreasing distance between the observer and the object. An increasing looming value signifies an increasing threat of collision with the object. The visual task of avoiding collision can be further simplified by purposive control of visual fixation on the objects in front of the moving camera. Using these two basic concepts, real-time obstacle avoidance in a tight perception-action loop is implemented. Three-dimensional space in front of the camera is divided into zones with various degrees of looming-based threat of collision. For each obstacle seen by a fixating camera, looming and its time derivative are calculated directly from the 2-D image. Depending on the threat posed by an obstacle, a course change is dictated. This looming-based approach is simple, is independent of the size and range of the 3-D object, and involves only simple quantitative measurements. Results pertinent to a camera on a robot arm navigating between obstacles are presented.
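The zone-based reaction can be sketched as a small decision rule on the looming value L and its time derivative; all thresholds, gains, and the escalation factor below are illustrative assumptions, not values from the paper:

```python
def threat_zone(L, thresholds=(0.05, 0.2)):
    """Classify a looming value into zones: 0 safe, 1 caution, 2 imminent."""
    if L < thresholds[0]:
        return 0
    return 1 if L < thresholds[1] else 2

def course_change(L, L_dot, max_turn=30.0):
    """Dictate a turn (degrees) that grows with the threat zone;
    an increasing looming value (L_dot > 0) escalates the response."""
    zone = threat_zone(L)
    escalate = 1.5 if L_dot > 0 else 1.0
    return min(max_turn, zone * 10.0 * escalate)
```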
Active stereo vision routines using PRISM-3
Hendrick James Antonisse
This paper describes work in progress on a set of visual routines and supporting capabilities implemented on the PRISM-3 real-time vision system. The routines are used in an outdoor robot retrieval task. The task requires the robot to locate a donor agent -- a Hero2000 -- which holds the object to be retrieved, to navigate to the donor, to accept the object from the donor, and return to its original location. The routines described here will form an integral part of the navigation and wide-area search tasks. Active perception is exploited to locate the donor using real-time stereo ranging directed by a pan/tilt/verge mechanism. A framework for orchestrating visual search has been implemented and is briefly described.
University of Illinois active vision system
A. Lynn Abbott, Narendra Ahuja
This paper describes an active vision system which employs two high-resolution cameras for image acquisition. The system is capable of automatically directing movements of the cameras so that camera positioning and image acquisition are tightly coupled with visual processing. The system was developed as a research tool and is largely based on off-the-shelf components. A central workstation controls imaging parameters, which include five degrees of freedom for camera positioning (tilt, pan, translation, and independent vergence) and six degrees of freedom for the control of two motorized lenses (focus, aperture, and zoom). This paper is primarily concerned with describing the hardware of the system, the imaging model, and the calibration method employed. A brief description of system software is also given.
Electronic front-end processor for active vision
Peter J. Burt, P. Anandan, Keith Hanna
We suggest that a `vision front end' processor, or VFE, will be a key element of practical vision systems that perform complex tasks in real time, such as autonomous vehicle driving. The VFE is a specialized image processing device located between the cameras and the main vision computer. Its functions are to isolate critical image data for subsequent attention-based analysis, and to compute attributes of these data that are required by the system in order to respond quickly to objects or events of interest in the scene. We propose a generic architecture for the VFE, and describe a prototype system that has been built in custom hardware. We show that a compact device can serve multiple vision modalities, including pattern, motion, and stereo vision.
Calibration of the spherical pointing motor
Benjamin B. Bederson, Richard S. Wallace, Eric L. Schwartz
We have built a new miniature pan-tilt actuator, the spherical pointing motor (SPM). The SPM is an absolute positioning device, designed to orient a small camera sensor in two degrees of rotational freedom. The basic idea is to orient a permanent magnet to the magnetic field induced by three orthogonal coils by applying the appropriate ratio of currents to the coils. The function describing the relation between the coil currents and the resultant motor position can be calculated, but it is not very accurate, as the actual coils do not exactly satisfy the assumptions made in these calculations. The motor must be calibrated to find the coil currents accurately. This paper describes a procedure for automatic calibration of the SPM. It is based on image feedback from a camera returning space-variant images, mounted on the rotor of the motor. It assumes that a calibrated image sensor and lens are used, i.e., that it is known how many degrees each pixel subtends. It also assumes that the camera rotates about its focal point. The calibration algorithm uses a scene of black dots on a white background. For each motor position that is to be calibrated, the algorithm moves the motor approximately to that position using the calculated currents. The algorithm analyzes the image, and uses the position of the relevant dot to calculate the actual position of the motor. It then associates this position with the coil currents and stores it in a look-up table. Finally, we interpolate between calibrated points to move to other positions.
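The final interpolation step can be sketched as bilinear interpolation over the calibrated look-up table; the regular pan/tilt grid and the table layout below are assumptions for illustration:

```python
import numpy as np

def lut_currents(lut, pans, tilts, pan, tilt):
    """Bilinearly interpolate calibrated coil currents at (pan, tilt).
    lut: array of shape (len(pans), len(tilts), 3) holding one
    coil-current triple per calibrated grid point."""
    i = np.clip(np.searchsorted(pans, pan) - 1, 0, len(pans) - 2)
    j = np.clip(np.searchsorted(tilts, tilt) - 1, 0, len(tilts) - 2)
    u = (pan - pans[i]) / (pans[i + 1] - pans[i])      # fractional offsets
    v = (tilt - tilts[j]) / (tilts[j + 1] - tilts[j])
    return ((1 - u) * (1 - v) * lut[i, j] + u * (1 - v) * lut[i + 1, j]
            + (1 - u) * v * lut[i, j + 1] + u * v * lut[i + 1, j + 1])
```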