Dynamic Stimuli And Active Processing In Human Visual Perception
Author(s):
Ralph Norman Haber
Theories of visual perception traditionally have considered a static retinal image to be the starting point for processing, and have considered processing to be both passive and a literal translation of that frozen, two-dimensional, pictorial image. This paper considers five problem areas in the analysis of human visually guided locomotion, in which the traditional approach is contrasted to newer ones that utilize dynamic definitions of stimulation and an active perceiver: (1) differentiation between object motion and self motion, and among the various kinds of self motion (e.g., eyes only, head only, whole body, and their combinations); (2) the sources and contents of visual information that guide movement; (3) the acquisition and performance of perceptual motor skills; (4) the nature of spatial representations, percepts, and the perceived layout of space; and (5) why the retinal image is a poor starting point for perceptual processing. These newer approaches argue that stimuli must be considered as dynamic: humans process the systematic changes in patterned light when objects move and when they themselves move. Furthermore, the processing of visual stimuli must be active and interactive, so that perceivers can construct panoramic and stable percepts from an interaction of stimulus information and expectancies of what is contained in the visual environment. These developments all suggest a very different approach to the computational analyses of object location and identification, and of the visual guidance of locomotion.
Visual Behavior and Intelligent Agents
Author(s):
R. C. Nelson;
D. H. Ballard;
S. D. Whitehead
Recent robotic models suggest that many complex representational problems in visual perception are actually simplified in systems with behavioral capabilities that permit them to move and interact with the environment. These studies have shown that some complex behaviors can be reduced to a collection of loosely coordinated agents, greatly reducing the complexity of control protocols. These and many other observations emphasize the importance of behavioral context in models of intelligence, and suggest that the natural coordinates for representing information are in terms of behaviors.
A Computational Model for Dynamic Vision
Author(s):
Saied Moezzi;
Terry E. Weymouth
This paper describes a novel computational model for dynamic vision which promises to be both powerful and robust. Furthermore, the paradigm is ideal for an active vision system where camera vergence changes dynamically. Its basis is the retinotopically indexed, object-centered encoding of early visual information. Specifically, we use the relative distances of objects to a set of referents and encode this information in image registered maps. To illustrate the efficacy of the method, we have chosen to apply it to the problem of dynamic stereo vision. Integration of depth information over multiple frames obtained by a moving robot generally requires precise information about the relative camera position from frame to frame. Usually, this information can only be approximated. The method facilitates the integration of depth information without direct use or knowledge of camera motion.
A Bayesian Foundation for Active Stereo Vision
Author(s):
Larry Matthies;
Masatoshi Okutomi
Sensing three-dimensional shape is a central problem in the development of robot systems for autonomous navigation and manipulation. Stereo vision is an attractive approach to this problem in several applications; however, stereo algorithms still lack reliability and generality. We address these problems by modelling the stereo depth map as a discrete random field, by formulating the matching problem in terms of Bayesian estimation, and by using this framework to develop a "bootstrap" procedure that employs fine camera motion to initialize stereo fusion. First, one camera is translated parallel to the stereo baseline to acquire a narrow-baseline image pair; then, the depth map obtained from the narrow-baseline image pair is used to constrain matching in a "wide-baseline" image pair consisting of one image from each camera. The result of our procedure is an estimate of depth and depth uncertainty at each pixel in the image. This approach produces accurate depth maps reliably and efficiently, applies to indoor and outdoor domains, and extends naturally to multi-sensor systems. We demonstrate the potential of this approach by showing results obtained with scale models of difficult, outdoor scenes.
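The abstract's per-pixel depth-and-uncertainty output lends itself to a simple Gaussian update when the narrow-baseline estimate is refined by the wide-baseline pair. The sketch below is an illustrative inverse-variance fusion step, not the authors' actual estimator; the function name and the assumption of Gaussian, per-pixel-independent depth errors are ours.

```python
import numpy as np

def fuse_depth(prior_depth, prior_var, meas_depth, meas_var):
    """Per-pixel Gaussian fusion of a prior depth map (e.g. from the
    narrow-baseline pair) with a new measurement (e.g. from the
    wide-baseline pair).  All arrays share the image shape."""
    gain = prior_var / (prior_var + meas_var)        # Kalman-style gain
    depth = prior_depth + gain * (meas_depth - prior_depth)
    var = (1.0 - gain) * prior_var                   # reduced uncertainty
    return depth, var

# toy example: one pixel, narrow baseline gives 2.0 m +/- 0.5 m,
# wide baseline gives 2.2 m +/- 0.1 m
d, v = fuse_depth(np.array(2.0), np.array(0.25), np.array(2.2), np.array(0.01))
print(d, v)   # fused depth close to 2.19 m, variance below either input
```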
Model-Based Planning of Sensor Placement and Optical Settings
Author(s):
Roger Y. Tsai;
Konstantinos Tarabanis
We present a model-based vision system that automatically plans the placement and optical settings of vision sensors in order to meet certain generic task requirements common to most industrial machine vision applications. From the planned viewpoints, features of interest on an object will satisfy particular constraints in the image. In this work, the vision sensor is a CCD camera equipped with a programmable lens (i.e. zoom lens) and the image constraints considered are: visibility, resolution and field of view. The proposed approach uses a geometric model of the object as well as a model of the sensor, in order to reason about the task and the environment. The sensor planning system then computes the regions in space as well as the optical settings that satisfy each of the constraints separately. These results are finally combined to generate acceptable viewing locations and optical settings satisfying all constraints simultaneously. Camera planning experiments are described in which a robot arm positions the camera at a computed location and the planned optical settings are set automatically. The corresponding scenes from the candidate viewpoints are shown, demonstrating that the constraints are indeed satisfied. Other constraints, such as depth of focus, as well as other vision sensors can also be considered, resulting in a fully integrated sensor planning system.
Multi-Sensor Data Fusion for Estimation of a Moving Polyhedral Object
Author(s):
Ren C. Luo;
Woo Suk Yang
This paper presents an approach to estimate the general 3D motion of a polyhedral object using multiple sensor data, some of which may not provide sufficient information for the estimation of object motion. Motion can be estimated continuously from each sensor through the analysis of the instantaneous state of an object. The instantaneous state of an object is specified by the rotation, which is defined by a rotation axis and rotation angle, and the displacement of the center of rotation. We have introduced a method based on Moore-Penrose pseudoinverse theory to estimate the instantaneous state of an object, and a linear feedback estimation algorithm to approach the motion estimation. The motion estimated from each sensor is fused to provide more accurate and reliable information about the motion of an unknown object. The techniques of multi-sensor data fusion can be categorized into three methods: averaging, decision, and guiding. We present a fusion algorithm which combines averaging and decision. With the assumption that the motion is smooth, our approach can handle the data sequences from multiple sensors with different sampling times. We can also predict the next immediate object position and its motion.
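As a rough illustration of how a Moore-Penrose pseudoinverse can yield an instantaneous motion state, the sketch below linearizes the rigid-motion relation for small inter-frame rotation (p' ≈ p + ω × p + t) and solves the stacked system in least squares. This is a generic formulation assumed for illustration, not necessarily the authors' exact parameterization.

```python
import numpy as np

def estimate_instantaneous_motion(P, P_next):
    """Least-squares estimate of angular velocity w and translation t
    from corresponding 3-D points, assuming small inter-frame rotation:
        p' ~ p + w x p + t
    P, P_next: (N, 3) arrays of matched points."""
    rows, rhs = [], []
    for p, q in zip(P, P_next):
        x, y, z = p
        # w x p is linear in w: (w x p) = M(p) @ w with M(p) = -[p]_x
        M = np.array([[0.0,  z, -y],
                      [-z, 0.0,  x],
                      [ y,  -x, 0.0]])
        rows.append(np.hstack([M, np.eye(3)]))   # unknowns stacked as [w; t]
        rhs.append(q - p)
    A = np.vstack(rows)                          # (3N, 6)
    b = np.hstack(rhs)                           # (3N,)
    wt = np.linalg.pinv(A) @ b                   # Moore-Penrose solution
    return wt[:3], wt[3:]                        # angular velocity, translation
```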
New Results in Automatic Focusing and a New Method for Combining Focus and Stereo
Author(s):
Charles V. Stewart;
Hari Nair
This paper presents an improved technique for obtaining depth from focus and a new method for combining the results of stereo and focus to reduce the number of stereo matching errors. Potential sources of error in applying the focusing algorithm are identified by examining the local gray level variance of the image. Some regions are eliminated from further consideration, while in other areas the size, shape and orientation of the windows used in estimating focus quality are selected based on analysis of the local image content. Using this approach, depth values from focus may be obtained in more regions of the image than are obtained using earlier methods. The results of focus may then be incorporated directly into a stereo matching algorithm called LMA. Essentially, focus is used to add support to candidate stereo matches that are consistent with its results. This simple combination technique is shown to resolve the ambiguities of periodic regions in matching and to filter out incorrect matches, especially when a wide range of depth values is possible.
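A minimal sketch of the two ideas above: local gray-level variance serves both as the focus-quality measure and as a screen for regions too weakly textured to be trusted. Window size, threshold, and function names are illustrative assumptions, not the paper's values.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def focus_measure(image, size=9):
    """Local gray-level variance: E[I^2] - E[I]^2 over a size x size window."""
    mean = uniform_filter(image.astype(float), size)
    mean_sq = uniform_filter(image.astype(float) ** 2, size)
    return mean_sq - mean ** 2

def depth_from_focus(stack, focus_depths, min_variance=25.0, size=9):
    """stack: list of images taken at the focus settings in focus_depths.
    Returns per-pixel depth and a validity mask (False where the local
    variance is too low for the focus analysis to be trusted)."""
    measures = np.stack([focus_measure(img, size) for img in stack])
    best = measures.argmax(axis=0)                  # sharpest frame per pixel
    depth = np.asarray(focus_depths)[best]
    valid = measures.max(axis=0) > min_variance     # weed out flat regions
    return depth, valid
```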
Robust Estimation of Image Flow
Author(s):
Brian G. Schunck
Motion is a key visual cue for segmentation. A particularly difficult problem is the segmentation of objects that are camouflaged and match the background perfectly. In such situations, it is critical that the boundaries of the object be determined accurately since the outline (silhouette) will be the only visual information available for segmentation and recognition. Image flow is the velocity field in the image plane caused by the motion of the observer, objects in the scene, or apparent motion. The image flow velocity field can be used for segmentation if the motion boundaries can be accurately estimated. When an object moves against a background, the motion constraints are distorted along the boundary and the motion estimation problem is very difficult to solve. The distorted motion data would be called outliers by workers in the field of robust statistics. Recent work has been done on robust algorithms for image flow estimation and segmentation. New algorithms based on robust regression show promise in handling the difficult problems associated with discontinuities in the image flow velocity field. The results point toward better methods for handling discontinuities in other vision problems. In particular, algorithms for sensor fusion will face the task of combining information that contains outliers and discontinuities. The material presented in this paper may indicate directions for future research on sensor fusion.
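To make the robust-regression idea concrete, the sketch below fits a single flow vector to the brightness-constancy constraints of a neighborhood using iteratively reweighted least squares, so that constraints distorted near a motion boundary receive little weight. The Lorentzian-style weight function and parameters are illustrative; the paper's own estimators may differ.

```python
import numpy as np

def robust_flow(Ix, Iy, It, sigma=1.0, iters=10):
    """Estimate one (u, v) for a neighborhood from gradient constraints
        Ix*u + Iy*v + It = 0
    using iteratively reweighted least squares, so that constraints violated
    near a motion discontinuity (outliers) get small weight."""
    A = np.column_stack([Ix.ravel(), Iy.ravel()])
    b = -It.ravel()
    uv = np.linalg.lstsq(A, b, rcond=None)[0]       # ordinary LS start
    for _ in range(iters):
        r = A @ uv - b                              # residuals
        w = 1.0 / (1.0 + (r / sigma) ** 2)          # Lorentzian-style weights
        uv = np.linalg.lstsq(np.sqrt(w)[:, None] * A,
                             np.sqrt(w) * b, rcond=None)[0]
    return uv
```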
An Approach to Fuse Correlation-Based and Gradient-Based Methods for Image-Flow Estimation
Author(s):
Ajit Singh
Visual motion is commonly extracted from an image sequence in the form of an image-flow field or an image-displacement field. In past research on estimation of motion fields, three basic approaches have been suggested: the correlation-based approach, the gradient-based approach, and the spatiotemporal energy-based approach. Since the underlying measurements used by the three approaches are different, they have different error characteristics. This scenario is representative of the classic multi-sensor problem. Algorithms based on the three basic approaches can be thought of as three different sensors measuring a given quantity, i.e., image-flow, with different error characteristics. The measurements from different sensors can be combined to produce an estimate of image-flow that is optimal, i.e., it minimizes the estimation-error (in a statistical sense). In other words, the three basic approaches can be fused to give an estimate of image-flow that has a higher confidence as compared to the estimate obtained from any one approach alone. We suggest information-fusion as a framework to estimate image-flow. In this framework, multiple sources give their opinion about image-flow in the form of an estimate along with a confidence measure. These estimates are then fused on the basis of the corresponding confidence measures to get a robust estimate. We show an implementation of this framework that fuses correlation-based and gradient-based approaches.
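One standard way to "fuse on the basis of confidence measures" is to treat each estimate's error covariance as its confidence and take the minimum-variance combination; the sketch below assumes that reading and is not claimed to be the author's exact fusion rule.

```python
import numpy as np

def fuse_flow(v_corr, S_corr, v_grad, S_grad):
    """Fuse two image-flow estimates for one pixel.
    v_*: 2-vectors (u, v); S_*: 2x2 error covariances expressing confidence.
    Returns the minimum-variance linear combination and its covariance."""
    P_corr = np.linalg.inv(S_corr)       # information (confidence) matrices
    P_grad = np.linalg.inv(S_grad)
    S_fused = np.linalg.inv(P_corr + P_grad)
    v_fused = S_fused @ (P_corr @ v_corr + P_grad @ v_grad)
    return v_fused, S_fused
```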
Recovery of 3-D Motion and Structure by Temporal Fusion
Author(s):
Tarek M. Sobh;
Kwangyoen Wohn
We discuss the problem of recovering 3-D motion and structure. An algorithm for computing the camera motion and the orientation of a planar surface is developed. It solves for the 3-D motion and structure iteratively given two successive image frames. We further improve the solution by solving the ordinary differential equations which describe the evolution of motion and structure over time. The robustness of the entire process is demonstrated by an experiment with a moving camera which "flies" over a terrain model.
Object Tracking From Image Sequences Using Stereo Camera And Range Radar
Author(s):
Stelios C. A. Thomopoulos;
Lars Nillson
The problem of estimating the position of and tracking an object undergoing 3-D translational and rotational motion using passive and active sensors is considered. The passive sensor used in this study is a stereo camera, whereas the active sensor is a range radar. Three different estimation approaches are considered. The first involves estimation of the object position by direct registration of stereo images. In the second approach, the Extended Kalman Filter is used for estimation, with the stereo images as measurements. In the third approach, an integral filter based on stereo images and range radar measurements is used for tracking. The three different approaches are compared via simulation in the tracking of an object undergoing a 3-D motion with random translational and angular acceleration.
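A hedged sketch of the general filter structure the second and third approaches suggest: a constant-velocity state is predicted, updated with a (linear) stereo position fix, and then updated with a (nonlinear) radar range via first-order linearization. Model matrices, noise levels, and the constant-velocity assumption are illustrative, not taken from the paper.

```python
import numpy as np

def ekf_step(x, P, z_stereo, R_stereo, z_range, R_range, dt=0.1, q=1e-2):
    """One predict/update cycle for a constant-velocity state
    x = [px, py, pz, vx, vy, vz], fusing a stereo position fix (linear)
    and a radar range measurement (nonlinear, handled by linearization)."""
    F = np.eye(6); F[:3, 3:] = dt * np.eye(3)          # constant-velocity model
    x = F @ x
    P = F @ P @ F.T + q * np.eye(6)

    # stereo camera: linear measurement of position
    H = np.hstack([np.eye(3), np.zeros((3, 3))])
    S = H @ P @ H.T + R_stereo
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z_stereo - H @ x)
    P = (np.eye(6) - K @ H) @ P

    # range radar: nonlinear measurement r = |position|, linearized Jacobian
    r_pred = np.linalg.norm(x[:3])
    Hr = np.hstack([x[:3] / r_pred, np.zeros(3)])[None, :]
    S = Hr @ P @ Hr.T + R_range
    K = P @ Hr.T @ np.linalg.inv(S)
    x = x + (K @ np.array([z_range - r_pred])).ravel()
    P = (np.eye(6) - K @ Hr) @ P
    return x, P
```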
Robot-Based Real-Time Motion Tracker
Author(s):
Matthew S. Clark
This paper describes a motion-tracking system that couples a vision system and a robot arm manipulator. The main issue is how to combine typical existing laboratory-type equipment while resolving problems of asynchronous communication between devices. A pyramid-based motion-detection scheme is used to find the distance-independent region of motion within a camera scene. This information is relayed to the robot arm, which physically moves the camera and orients it towards the region of detected motion. A static look-and-move method is implemented, and a timing algorithm is designed to coordinate the real-time vision and robot systems.
Sensor Integration And Data Fusion
Author(s):
Stelios C. A. Thomopoulos
The problem of sensor integration and data fusion is addressed. We consider the problem of combining information from diversified sources in a coherent fashion. We assume that, at the point of fusion, the information from various sensors may be available in different forms. For example, data from infrared (IR) sensors may be combined with range radar (RR) data, and further combined with visual images. In each case, the data and information from the different sensors are presented in a different format which may not be directly compatible for all sensors. Furthermore, the available information may be in the form of attributes and not dynamical measurements. A theory for sensor integration and data fusion that accommodates diversified sources of information is presented. Data (or, more generically, information) fusion may proceed at different levels, such as the level of dynamics, the level of attributes, and the level of evidence. All different levels are considered and several practical examples of real world data fusion problems are discussed.
Robust Multi-Sensor Fusion: A Decision-Theoretic Approach
Author(s):
Gerda Kamberova;
Max Mintz
Many tasks in active perception require that we be able to combine different information from a variety of sensors which relate to one or more features of the environment. Prior to combining these data, we must test our observations for consistency. The purpose of this paper is to examine sensor fusion problems for linear location data models using statistical decision theory (SDT). The contribution of this paper is the application of SDT to obtain: (i) a robust test of the hypothesis that data from different sensors are consistent; and (ii) a robust procedure for combining the data which pass this preliminary consistency test. Here, robustness refers to the statistical effectiveness of the decision rules when the probability distributions of the observation noise and the a priori position information associated with the individual sensors are uncertain. The standard linear location data model refers to observations of the form: Z = θ + V, where V represents additive sensor noise and θ denotes the "sensed" parameter of interest to the observer. While the theory addressed in this paper applies to many uncertainty classes, the primary focus of this paper is on asymmetric and/or multimodal models, which allow one to account for very general deviations from nominal sampling distributions. This paper extends earlier results in SDT and multi-sensor fusion obtained by Zeytinoglu and Mintz (1984, 1988), and McKendall and Mintz (1988).
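A toy illustration of the two-stage structure (consistency test, then combination) for the location model Z = θ + V. The gate test and inverse-variance combination below are deliberately simple stand-ins for the robust minimax rules developed in the paper; names and thresholds are assumptions.

```python
import numpy as np

def fuse_location(z, var, gate=3.0):
    """Two-stage fusion for observations z_i = theta + v_i.
    Stage 1: consistency test (discrepancy from a robust reference within
    `gate` standard deviations).  Stage 2: inverse-variance combination of
    the observations that passed.  An illustrative stand-in for the paper's
    robust minimax decision rules."""
    z, var = np.asarray(z, float), np.asarray(var, float)
    ref = np.median(z)                              # crude robust reference
    keep = np.abs(z - ref) <= gate * np.sqrt(var)   # stage 1: gate test
    w = 1.0 / var[keep]                             # stage 2: combine survivors
    return np.sum(w * z[keep]) / np.sum(w), keep

theta_hat, used = fuse_location([1.02, 0.98, 3.50], [0.01, 0.02, 0.01])
print(theta_hat, used)    # the 3.50 reading is rejected as inconsistent
```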
Evidential Integration In A Multi-Sensor System With Dependency Information
Author(s):
Youssef A. Bazzi;
Brian T. Mitchell
Several methods for combining uncertain evidence in a multi-sensor system have been developed in the literature. Most of these methods make routine assumptions on the type of relation between the pieces of evidence, namely that they are independent. This assumption leads to serious questions as to the validity of the conclusions reached by systems which employ such schemes. This paper presents a technique for reasoning with dependency information in expert systems that acquire probabilistic evidence. Three types of dependency relations have been considered. The proposed technique is suitable for computerization, and results in a more certain hypothesis value when applied to a distributed sensor system and to terminal reliability in computer communication networks.
A Comparison of Information Gathering Approaches
Author(s):
Greg Hager
We define information gathering as the task of both deciding what sensor data to gather and process, and fusing those observations into a common representation. In most such systems there is a tradeoff between the complexity of a computational technique and its efficiency at gathering and using information. In some cases, it turns out that a computationally simple technique, though it may make poor use of information, outperforms a more accurate but more complex technique simply because of its ability to process more observations in less time. Naturally, the correct tradeoff between speed and complexity varies from situation to situation. In this paper, we discuss and compare the grid-based techniques developed in our previous work, classic minimum mean-square estimation techniques (MMSE), and a modification of MMSE which is robust to nonlinear systems and system description errors, with respect to their efficiency at using sensor information. We then discuss the ability of these techniques to improve their performance by choosing favorable sensor observations, and what effect this has on their overall complexity/efficiency tradeoff.
Feature Selection and Decision Space Mapping for Sensor Fusion
Author(s):
Cynthia L. Beer;
Gerald M. Flachs;
David R. Scott;
Jay B. Jordan
An information fusion approach is presented for mapping a multidimensional feature space into a lower dimensional decision space with simplified decision boundaries. A new statistic, called the tie statistic, is used to perform the mapping by measuring differences in probability density functions of features. These features are then evaluated based on the separation of the decision classes using a parametric beta representation for the tie statistic. The feature evaluation and fusion methods are applied to perform texture recognition.
Sensor Fusion: A New Approach And Its Applications
Author(s):
M. A. Abidi
In this paper, we describe an uncertainty and data fusion approach that we have developed and tested. This new fusion algorithm is based on the interaction between two constraints: (1) the principle of knowledge source corroboration, which tends to maximize the final belief in a given proposition (often modeled by a probability density function or fuzzy membership distribution) if either of the knowledge sources supports the occurrence of this proposition, and (2) the principle of belief enhancement/withdrawal, which adjusts the belief of one knowledge source according to the belief of the second knowledge source by maximizing the similarity between the two source outputs. These two principles are combined by maximizing a positive linear combination of these two constraints related by a fusion function, to be determined. The latter maximization is achieved and the fusion function is uniquely determined using the Euler-Lagrange equations of the calculus of variations. This method has been tested using various features from synthetic and real data of various types and many dimensionalities, resulting in fused data which satisfies both principles mentioned above. Through these experiments, we have demonstrated the synergism that results from the combination of information available from various sensors. The implementation of this method was performed on both sequential and parallel machines.
Spatial Reasoning: Learning from Observations
Author(s):
Mirsad Hadzikadic;
Su-shing Chen
This paper is concerned with a machine learning approach to spatial reasoning, under the goal of developing an intelligent system capable of sensor fusion, scene understanding, reasoning with spatial objects, and improving its performance over time. We integrate an incremental conceptual clustering system (INC) with a computer vision system, developed in our Artificial Intelligence and Computer Vision laboratories. The project consists of: (1) developing spatial knowledge models and cognitive functions within a spatial learning system, and (2) evaluating the plausibility, effectiveness, and constraints of the system in both the general-type spatial reasoning tasks in cognitive systems and the following scene analysis/understanding tasks: sensor fusion, object recognition, model instantiation, image/scene description, and reasoning about a scene.
Performance Prediction For Multi-Sensor Tracking Systems: Kinematic Accuracy And Data Association Performance
Author(s):
Ted J. Broida
This paper addresses some of the performance issues encountered in the use of multiple sensors for surveillance and tracking in aerospace and defense applications. These problems generally involve detecting the presence of an unknown number of objects of interest (often referred to as "targets"), and estimating their position and motion from periodic measurements (tracking). Measurements zk = z(tk) are received from one or more sensors at times tk and are classified (labelled) as arising either from one of the objects currently being tracked, from a new or previously undetected object, or as a false alarm or clutter. Typically, measurement models involve nonlinear functions of true object kinematics in additive zero-mean white noise, z(tk) = h[x(tk)] + n(tk). Estimates of object kinematics (e.g. position, velocity, acceleration) x are formed from each labelled measurement sequence {zk}, with the objective of keeping an accurate and complete awareness of the external environment. In addition to additive measurement noise, a number of uncertainties are present: (1) objects can maneuver between measurements ("random" acceleration), (2) some measurements (threshold crossings) are due to noise alone (false alarms), or due to non-zero-mean interference, with unknown spatial and temporal covariance ("clutter"), and can be misclassified as being from an object of interest, (3) some measurements actually from an object of interest can be misclassified as being from a different object, or as being noise or clutter, (4) object detection is not guaranteed, so that a sensor can "observe" a region containing an object but fail to detect it (PD < 1), whether or not it is being tracked, and (5) there are errors in knowledge of the relative position and attitude of different sensors, particularly if sensors are moving independently (different platforms). The functions of data association (labelling measurements from different sensors, at different times, that correspond to the same object or feature) and data fusion (combining measurements from different times and/or different sensors) are required in one form or another in essentially all multiple sensor fusion applications: one function determines what information should be fused, the other function performs the fusion. This paper presents approaches for quantifying the performance of these functions in the surveillance and tracking application. First, analytical techniques are presented that bound or approximate the fused kinematic estimation performance of multiple sensor tracking systems, in the absence of association errors. These bounds and approximations are based on several extensions of standard Kalman filter covariance analysis procedures, and allow modeling of a wide range of sensor types and arbitrary, time-varying geometries, both sensor-to-sensor and sensor-to-object. Arbitrarily many sensors can be used with varying update intervals, measurement accuracies, and detection performance. In heavy clutter or false alarm backgrounds it is often impossible to determine which (if any) of the measurements near a target track actually arise from the target, which leads to a degradation of tracking accuracy. This degradation can be estimated (but not bounded) with an approximate covariance analysis of the Probabilistic Data Association Filter (PDAF).
Next, data association performance is quantified in terms of error probability for the case of closely spaced objects (CSOs) with minimal clutter, and for the case of isolated objects in a heavy clutter or false alarm background. These probabilities can be applied to data acquired by any sensor, based on measurement and track accuracies described by error covariance matrices. For example, in many applications a track established by one sensor is used to cue another sensor - in the presence of CSOs and/or clutter backgrounds, this approach can be used to estimate the probability of successful acquisition of the desired target by the second sensor.
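The covariance-analysis idea in the first part of this abstract can be illustrated without any measurement data: one simply propagates the track error covariance through predict/update cycles for sensors with different revisit rates and accuracies. The sketch below does this for a one-dimensional constant-velocity track; all models and numbers are illustrative, and it omits the maneuver, clutter, and PDAF extensions the paper analyzes.

```python
import numpy as np

def covariance_analysis(sensors, q=0.5, dt=1.0, steps=50):
    """Predicted tracking error covariance for a 1-D position/velocity track
    observed by several sensors.  Each sensor is (period_in_steps, R).
    No measurements are needed: only covariances are propagated."""
    F = np.array([[1.0, dt], [0.0, 1.0]])                 # constant velocity
    Q = q * np.array([[dt**3 / 3, dt**2 / 2],
                      [dt**2 / 2, dt]])                   # process noise
    H = np.array([[1.0, 0.0]])                            # position measurement
    P = np.diag([100.0, 25.0])                            # initial uncertainty
    for k in range(1, steps + 1):
        P = F @ P @ F.T + Q                               # predict
        for period, R in sensors:
            if k % period == 0:                           # this sensor reports
                S = H @ P @ H.T + R
                K = P @ H.T @ np.linalg.inv(S)
                P = (np.eye(2) - K @ H) @ P               # update
    return P

# e.g. an accurate slow sensor fused with a coarse fast sensor (values illustrative)
print(covariance_analysis([(5, np.array([[0.1]])), (1, np.array([[4.0]]))]))
```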
Constructing Octrees from Multiple Views: A Parallel Implementation
Author(s):
C. H. Chien
Inferring 3D information from 2D images has been one of the major concerns of researchers in the computer vision community. A single view of an object usually conveys insufficient 3D information about the object. The integration of this partial information from multiple views, however, very often yields a fairly good approximation of the 3D structure of the observed object. In the past, we have developed efficient (sequential) algorithms for constructing the octrees of 3D objects from their multiple views. In this paper, we present the parallel implementation of these sequential algorithms using PARADIGM, a mechanism for composing spatial-oriented distributed programs. Alternative strategies for each phase of octree generation are discussed in the context of data partitioning and task allocation. We have implemented our proposed algorithms on Nectar, a fiber-optic based network computer currently being developed at Carnegie Mellon. It is expected that the results and experiences obtained from this research work will be beneficial to future research into real-time multi-sensor fusion.
Design And Implementation Of A Multi-Sensor Fusion Algorithm On A Hypercube Computer Architecture
Author(s):
Charles W. Glover
A multi-sensor integration (MSI) algorithm written for a sequential single-processor computer architecture has been transformed into a concurrent algorithm and implemented in parallel on a multi-processor hypercube computer architecture. This paper will present the philosophy and methodologies used in the decomposition of the sequential MSI algorithm, and its transformation into a parallel MSI algorithm. The parallel MSI algorithm was implemented on a NCUBETM hypercube computer. The performance of the parallel MSI algorithm has been measured and compared against its sequential counterpart by running test case scenarios through a simulation program. The simulation program allows the user to define the trajectories of all players in the scenario, and to pick the sensor suites of the players and their operating characteristics. For example, an air-to-air engagement scenario was used as one of the test cases. In this scenario, two friend aircraft were being attacked by six foe aircraft in a pincer maneuver. Both the friend and foe aircraft launch missiles at several different time points in the engagement. The sensor suites on each aircraft are dual mode RADAR, dual mode IRST, and ESM sensors. The modes of the sensors are switched as needed throughout the scenario. The RADAR sensor is used only intermittently, thus most of the MSI information is obtained from passive sensing. The maneuvers in this scenario caused aircraft and missiles to constantly fly in and out of sensor fields of view (FOV). This required the MSI algorithm to constantly reacquire, initiate, and delete tracks as it tracked all objects in the scenario. The objective was to determine the performance of the parallel MSI algorithm in such a complex environment, and to determine how many multi-processors (nodes) of the hypercube could be effectively used by an aircraft in such an environment. For the scenario just discussed, a 4-node hypercube was found to be the optimal size and a factor of two in speedup was obtained. This paper will also discuss the design of a completely parallel MSI algorithm.
Interpreting Segmented Laser Radar Images Using a Knowledge-Based System
Author(s):
Chen-Chau Chu;
N. Nandhakumar;
J. K. Aggarwal
This paper presents a knowledge-based system (KBS) for man-made object recognition and image interpretation using laser radar (ladar) images. The objective is to recognize military vehicles in rural scenes. The knowledge-based system is constructed using KEE rules and Lisp functions, and uses results from pre-processing modules for image segmentation and integration of segmentation maps. Low-level attributes of segments are computed and converted to KEE format as part of the data bases. The interpretation modules detect man-made objects from the background using low-level attributes. Segments are grouped into objects, and then man-made objects and background segments are classified into pre-defined categories (tanks, ground, etc.). A concurrent server program is used to enhance the performance of the KBS by serving numerical and graphics-oriented tasks for the interpretation modules. Experimental results using real ladar data are presented.
A Mixture Neural Net For Multispectral Imaging Spectrometer Processing
Author(s):
David Casasent;
Timothy Slagle
Each spatial region viewed by an imaging spectrometer contains various elements in a mixture. The elements present and the amount of each are to be determined. A neural net solution is considered. Initial optical neural net hardware is described. The first simulations on the component requirements of a neural net are considered. The pseudoinverse solution is shown to not suffice, i.e. a neural net solution is required.
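To see why the pseudoinverse solution does not suffice, consider a standard linear mixing model (assumed here for illustration): the unconstrained least-squares abundances minimize residual error but, under noise, can come out negative or fail to sum to one, which is what motivates a constrained neural-net solver.

```python
import numpy as np

# linear mixing model: measured spectrum = E @ a, with a_i >= 0 and sum(a) = 1
E = np.array([[0.90, 0.10, 0.20],    # endmember spectra as columns
              [0.30, 0.80, 0.10],    # (4 bands x 3 materials, values illustrative)
              [0.10, 0.20, 0.70],
              [0.05, 0.10, 0.60]])
a_true = np.array([0.6, 0.3, 0.1])
spectrum = E @ a_true + 0.05 * np.random.randn(4)   # noisy mixed pixel

a_pinv = np.linalg.pinv(E) @ spectrum               # unconstrained LS solution
print(a_pinv)   # may contain negative fractions or not sum to one under noise,
                # i.e. not physically valid -> a constrained solver is needed
```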
Information Fusion Methods For Coupled Eddy-Current Sensors
Author(s):
Kristina H. Hedengren;
Kenneth W. Mitchell;
David E. Ritscher
Though imaging techniques have been applied routinely for visual, x-ray, and ultrasound inspections, they are new to eddy-current testing. Over the past few years, such techniques have been developed and applied to eddy-current data to demonstrate their value for defect detection and system analysis. In eddy-current testing, changes are sensed in the impedance of a coil excited by an alternating current to detect surface defects in metal. In eddy-current imaging, a complex impedance is measured at each pixel and used to construct an image pair. Eddy-current measurements can also be made using multiple coils; this paper discusses complex impedance image pairs formed using coupled coils that provide information from the surface covered by the two coils. A phasor plot, which is a scatter diagram of the complex impedance image pair, is an effective tool for presenting, explaining, and analyzing the information. In these plots a defect and the background noise often map to different loci, and thus have distinct signatures. The phasor plot can be used to find an appropriate combination of the image pair to improve defect detection. This paper discusses the presentation and fusion of the two images for enhanced defect detection. Images are shown together with their phasor plots to demonstrate the effects of rotational transformations and different signal combinations. The described methods are generic and can also be applied to image pairs created by different modality imaging systems; this is demonstrated with a combined eddy-current and ultrasound pair of images.
Unifying Voice And Hand Indication Of Spatial Layout
Author(s):
Tomoichi Takahashi;
Akira Hakata;
Noriyuki Shima;
Yukio Kobayashi
A method of unifying voice and hand pointing information to indicate an object on a map is proposed. Our approach is to represent two different kinds of information in a unified form and merge their information. Voice indications are transformed into a set of terms of an object's attributes and relationships of objects, with associated values which show their ambiguity. A hand pointing gesture is also transformed into a term which represents that one of a number of objects pointed at takes priority. IMAGE can identify an object indicated with voice and hand pointing by selecting the object which primarily satisfies the combined effectiveness of the relationships and terms.
Probabilistic Foundations For Information Fusion With Applications To Combining Stereo And Contour
Author(s):
David Shulman;
John (Yiannis) Aloimonos
Many general frameworks exist for fusion of information from several sources. Among them are random fields, Dempster-Shafer theory, and fuzzy sets. They all can be considered as computationally convenient approximations to a true probabilistic analysis of the errors in constraints relating data and unknowns. In fact, all problems of combination of evidence can be given a common formulation in terms of regularization theory. Such a theory can even be extended to allow for discontinuities in the unknowns. At the most abstract level, the information fusion process is simply reconciling a priori constraints on the unknowns (constraints of smoothness that really do not depend on the particular cues being used) with different data constraints. So it is crucial to find convenient, reliable constraints, ideally one data constraint relating several data cues. We show how this is possible for the case of stereo and planar contour, and some of the problems involved in extending to the non-planar case. The non-planar case is difficult, but at least there is a way (provided by contour) to lessen the amount of search stereo demands.
Cooperative Integration Of Vision And Touch
Author(s):
Peter K. Allen
Vision and touch have proved to be powerful sensing modalities in humans. In order to build robots capable of complex behavior, analogues of human vision and taction need to be created. In addition, strategies for intelligent use of these sensors in tasks such as object recognition need to be developed. Two overriding principles that dictate a good strategy for cooperative use of these sensors are the following: 1) sensors should complement each other in the kind and quality of data they report, and 2) each sensor system should be used in the most robust manner possible. We demonstrate this with a contour following algorithm that recovers the shape of surfaces of revolution from sparse tactile sensor data. The absolute location in depth of an object can be found more accurately through touch than vision, but the global properties of where to actively explore with the hand are better found through vision.
An Optimal Illumination Method For Surface Reconstruction
Author(s):
Michael Hatzitheodorou;
John R. Kender
We solve the shape from narrow light beam projections problem, a new approach to surface reconstruction. In this approach a surface is recovered from sparse depth data obtained from the projection of a narrow light beam on this surface. This projection will yield a small light spot whose location can be easily measured in the camera image. From the (x, y)-position of this spot and from the location of the light source we can obtain the surface depth at the point (x, y). Furthermore, a method for the placement of the light sources is proposed, and it is shown that the proposed positioning is optimal or, in other words, it will result in the smallest error among all other possible placements of the lights. The spline algorithm that will recover the surface from the obtained data is constructed and it is shown to yield the smallest possible error among all algorithms that can be used to solve the problem. The spline algorithm is linear, and can be constructed easily and with a low cost.
Trinocular Stereo: Theoretical Advantages and a New Algorithm
Author(s):
Charles V. Stewart
This paper presents a new three-camera stereo matching algorithm, called the Trinocular Local Matching Algorithm (TLMA), that helps to overcome some of the inherent problems in binocular matching. These problems include (1) the inability of binocular algorithms to obtain matches for horizontal edge segments, (2) secondary errors that arise in image regions containing unmatched horizontal segments, (3) the inability of binocular algorithms to obtain matches in occluded regions, (4) incorrect matches that may be accepted in occluded regions, and (5) the matching ambiguity of periodic image texture. TLMA matches images taken from cameras positioned on the vertices of an isosceles right triangle. Edges detected in the base image are matched with either the right or top image. For each candidate match a support value is computed using three support measures: the disparity gradient, the trinocular disparity gradient, and multiresolution cross-channel consistency. Support values are compared for competing matches to determine the confidence in each match. High confidence matches that pass a final consistency check, called the area rule, are accepted as correct. TLMA is shown to avoid some of the design errors of previous trinocular algorithms. Preliminary experimental results on both real and synthetic images comparing TLMA with a similarly defined binocular algorithm demonstrate TLMA's effectiveness in producing improved matching results.
On Using Color In Edge-Based Stereo Algorithms
Author(s):
John R. Jordan III;
Alan C. Bovik
One approach to developing faster, more accurate stereo algorithms is to seek a more complete and efficient use of information available in stereo images. The use of chromatic (color) information has been largely neglected in this regard. Motivations for using chromatic information are discussed, including strong evidence for the use of chromatic information in the human stereo correspondence process. To investigate the potential role of chromatic information in edge-based stereo algorithms, a novel chromatic matching constraint -- the chromatic gradient matching constraint -- is presented. A thorough analysis of the utility of this constraint in both the "match extraction" and "disparity selection" stages of the PMF Algorithm 1 is performed for a wide range of matching strength "support neighborhood" sizes. The performances of the algorithm with and without these constraints are directly compared in terms of disambiguation ability, matching accuracy and algorithm speed. The results demonstrate that the use of chromatic information can greatly reduce matching ambiguity, resulting in increased matching accuracy and algorithm speed.
Fusion Of Vision And Touch For Spatio-Temporal Reasoning In Learning Manipulation Tasks
Author(s):
Jan M. Zytkow;
Peter W. Pachowicz
This paper presents a framework for the fusion of vision and touch, useful in learning various manipulation tasks by a robot arm. Initially the robot has poor knowledge of the laws that govern the behavior of objects, and incomplete knowledge about physical features of individual objects. We analyse the fusion of vision and touch for learning object manipulation tasks, various methods of feature acquisition, and an architecture of the system that provides feedback between sensing, manipulating and learning. Simple control loops allow the system to execute the manipulation tasks and to learn a selection of control parameter values that prevents faults and object damage. The main emphasis is on learning. In sections 5 and 6 we demonstrate how the system discovers new regularities, how it recognizes new and useful object properties, and how the performance on similar tasks can be improved by application of newly acquired knowledge. Sections 1-4 describe a preliminary design of an architecture that allows for application of sensor fusion and for learning by improving manipulation skills by a robot arm.
Multiresolutional Sensor Fusion By Conductivity Analysis
Author(s):
H. E. Stephanou;
A. M. Erkmen
This paper describes an evidential pattern classifier for the combination of data from physically different sensors. We assume that the sensory evidence is multiresolutional, incomplete, imprecise, and possibly inconsistent. Our focus is on two types of sensory information patterns: visual and tactile. We develop a logical sensing scheme by using a model based representation of prototypical 3D surfaces. Each surface represents a class of topological patterns described by shape and curvature features. The sensory evidence is classified by using a conductivity measure to determine which prototypical surface best matches the evidence. A formal evidential model of uncertainty is used to derive logical sensors and provide performance measures for sensor integration algorithms.
Efficiency in the Generation of Hierarchical Feature Detectors in Neural Nets
Author(s):
Oleg G. Jakubowicz
Biologically, in the primary visual area of the brain, there is a full set of elementary feature detectors at every location in the retinal field. These detectors are akin to the two-dimensional edge and bar detectors commonly used in computer vision. We present in this paper biological simulation details of how these particular detectors most probably might be neurobiologically constructed and organized into a topologically ordered output plane. Then we point out the pitfall of over-representation of information that can occur in naive self-organizing neural network models for vision, and present how our properly constructed network overcomes this problem. This paper is intended to give some productive guidelines for constructing self-organizing networks whose cells have locally receptive fields.
Real-Time Detection Of Multi-Colored Objects
Author(s):
Lambert E. Wixson;
Dana H. Ballard
Fast object recognition is critical for robots in the real world. However, geometry-based object recognition methods calculate the pose of the object as part of the recognition process and hence are inherently slow. As a result, they are not suitable for tasks such as searching for an object in a room. If pose calculation is eliminated from the process and a scheme is used that simply detects the likely presence of the object in a scene, considerable efficiency can be gained. This paper contains a discussion of the requirements of any searching task and presents a fast method for detecting the presence of known multi-colored objects in a scene. The method is based on the assumption that the color histogram of an image can contain object "signatures" which are invariant over a wide range of scenes and object poses. The resulting algorithm has been easily implemented in off-the-shelf hardware and used to build a robot system which can sweep its gaze over a room searching for an object.
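A sketch in the spirit of the color-signature idea: compare a normalized color histogram of a candidate image region against the object's model histogram using histogram intersection. Bin counts, the intersection score, and the sliding-window usage are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

def color_histogram(image, bins=8):
    """3-D RGB histogram, normalized to sum to 1.  image: (H, W, 3) uint8."""
    h, _ = np.histogramdd(image.reshape(-1, 3), bins=(bins,) * 3,
                          range=((0, 256),) * 3)
    return h / h.sum()

def histogram_intersection(h_model, h_region):
    """Score in [0, 1]; near 1 means the object's color signature is present."""
    return np.minimum(h_model, h_region).sum()

# usage sketch: slide a window over the scene and flag windows whose
# intersection with the model histogram exceeds a threshold
```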
Combining Multiple Forms Of Visual Information To Specify Contact Relations In Spatial Layout
Author(s):
H. A. Sedgwick
An expert system, called Layout2, has been described, which models a subset of available visual information for spatial layout. The system is used to examine detailed interactions between multiple, partially redundant forms of information in an environment-centered geometrical model of an environment obeying certain rather general constraints. This paper discusses the extension of Layout2 to include generalized contact relations between surfaces. In an environment-centered model, the representation of viewer-centered distance is replaced by the representation of environmental location. This location information is propagated through the representation of the environment by a network of contact relations between contiguous surfaces. Perspective information interacts with other forms of information to specify these contact relations. The experimental study of human perception of contact relations in extended spatial layouts is also discussed. Differences between human results and Layout2 results reveal limitations in the human ability to register available information; they also point to the existence of certain forms of information not yet formalized in Layout2.
A Spatial and Temporal Frequency Based Figure-Ground Processor
Author(s):
Naomi Weisstein;
Eva Wong
Recent findings in visual psychophysics have shown that figure-ground perception can be specified by the spatial and temporal response characteristics of the visual system. Higher spatial frequency regions of the visual field are perceived as figure and lower spatial frequency regions are perceived as background (Klymenko and Weisstein, 1986; Wong and Weisstein, 1989). Higher temporal frequency regions are seen as background and lower temporal frequency regions are seen as figure (Wong and Weisstein, 1987; Klymenko, Weisstein, Topolski, and Hsieh, 1988). Thus, high spatial and low temporal frequencies appear to be associated with figure and low spatial and high temporal frequencies appear to be associated with background.
The Role of Perception in a Theory of Communication
Author(s):
Leora Morgenstern
Agents who operate in complex environments generally obtain necessary information through their perceptions. Most commonly, these perceptions are part of the communicative process. In this paper, we present an integrated theory of perception and communication, based on the thesis that communicative acts are best understood as actions of perception. We show that an agent learns through communicative acts by combining the information that a communicative act has just occurred together with his prior knowledge. We distinguish between primary and secondary meanings in communicative actions, and give a detailed formalization of the code model to explain primary communicative actions.
A Fuzzy Representation For Event Occurrence
Author(s):
Kenneth J. Overton;
Dale E. Gaucas
Activity recognition systems and their supporting temporal representations typically model the occurrence of an event using numeric values or numeric ranges for start times, finish times, and durations. Such an approach may not adequately reflect the uncertainty and variation experienced in real situations. This paper investigates a more general representation for event duration and start time using fuzzy sets. From this representation, confidence functions are defined in support of reasoning about a sequence of event occurrences. The potential application of this approach to the problem of focus of attention in sensor allocation is discussed.
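A minimal sketch of what a fuzzy event-occurrence representation and an associated confidence function might look like: trapezoidal membership over start times, and a confidence that one event precedes another computed from the joint membership mass. The membership shape and the particular confidence formula are assumptions for illustration only.

```python
import numpy as np

def trapezoid(t, a, b, c, d):
    """Fuzzy membership for an event start time: 0 below a, 1 on [b, c],
    falling back to 0 at d.  Models 'the event started roughly between b and c'."""
    t = np.asarray(t, float)
    up = np.clip((t - a) / max(b - a, 1e-9), 0, 1)
    down = np.clip((d - t) / max(d - c, 1e-9), 0, 1)
    return np.minimum(up, down)

def confidence_precedes(mu_a, mu_b, times):
    """Crude confidence that event A starts before event B, given their
    fuzzy start-time memberships sampled on a common time axis."""
    total = sum(mu_a[i] * mu_b[j]
                for i, ta in enumerate(times)
                for j, tb in enumerate(times) if ta < tb)
    denom = mu_a.sum() * mu_b.sum()
    return total / denom if denom > 0 else 0.0

times = np.linspace(0, 10, 101)
mu_a = trapezoid(times, 1, 2, 3, 4)     # A starts around 2-3
mu_b = trapezoid(times, 4, 5, 7, 8)     # B starts around 5-7
print(confidence_precedes(mu_a, mu_b, times))   # close to 1
```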
Task Directed Sensing
Author(s):
R. James Firby
High level robot control research must confront the limitations imposed by real sensors if robots are to be controlled effectively in the real world. In particular, sensor limitations make it impossible to maintain a complete, detailed world model of the situation surrounding the robot. To address the problems involved in planning with the resulting incomplete and uncertain world models, traditional robot control architectures must be altered significantly. Task directed sensing and control is suggested as a way of coping with world model limitations by focusing sensing and analysis resources on only those parts of the world relevant to the robot's active goals. The RAP adaptive execution system [9] is used as an example of a control architecture designed to deploy sensing resources in this way to accomplish both action and knowledge goals.
Multiresolution Constraint Modeling For Mobile Robot Planning
Author(s):
Anthony Stentz
All navigation systems for a mobile robot include a basic sense-plan-drive cycle for moving the robot about. The navigator must choose sensing points, plan paths between them, and oversee the execution of the robot's trajectory. Various constraints must be met in selecting potential sensing points, feasible trajectories, and safe configurations, all while taking positional uncertainty into account. We have developed a local planner that finds trajectories for a robot by modeling all of the constraints uniformly. At each step in the planning process, the system identifies and uses the most severe constraint to guide the search. A multi-resolution approximation to the constraint solution space is employed to reduce the number of search states, and thus the planning time. The system has been implemented and tested on real data. The results are presented.
Fusion without Representation
Author(s):
Monnett Hanvey Soldo
The topic of this conference is how various sensors can be used together to support robot mobility and other related tasks. The support that is needed - what you want to use the sensors to create - is an understanding of the layout of the environment, the nature of its (other) mobile elements, etc., so that the robot can at least navigate and avoid collisions. This "understanding" may take the form of an explicit, symbolic representation (model) whose symbols can be manipulated by a planner and eventually used to influence/direct robot motion. We demonstrate, however, that it is possible and sometimes desirable to bypass this representation phase, allowing the sensors to directly influence robot behavior (i.e., allowing the "understanding" to be a procedural one). And we show that one can achieve, using this approach, effective robot motion. We present results obtained on a real robot that procedurally integrates odometry, sonar, and vision - fusing not only different sensors but also data from the same sensors over time - in real-time navigation and exploration.
Space-Time Modeling Using Environmental Constraints in a Mobile Robot System
Author(s):
Marc G. Slack
Grid-based models of a robot's local environment have been used by many researchers building mobile robot control systems. The attraction of grid-based models is their clear parallel between the internal model and the external world. However, the discrete nature of such representations does not match well with the continuous nature of actions and usually serves to limit the abilities of the robot. This work describes a spatial modeling system that extracts information from a grid-based representation to form a symbolic representation of the robot's local environment. The approach makes a separation between the representation provided by the sensing system and the representation used by the action system. Separation allows asynchronous operation between sensing and action in a mobile robot, as well as the generation of a more continuous representation upon which to base actions.
Synthesizing Information-Update Functions Using Off-Line Symbolic Processing
Author(s):
Stanley J. Rosenschein
This paper explores the synthesis of programs that track dynamic conditions in their environment. We propose an approach in which the designer specifies, in a declarative language, aspects of the environment in which the program will be embedded. This specification is then automatically compiled into a program that, when executed, updates internal data structures so as to maintain as an invariant a desired correspondence between internal data structures and states of the external environment. This approach retains much of the flexibility of declarative programming while guaranteeing a hard bound on the execution time of information-update functions.
Transitioning Mechanized Plan Recognition From Closed To Real-World Domains
Author(s):
Douglas Walter J. Chubb
David Chapman3 proved that planning which includes an action representation whose effects are a function of their input situation is undecidable. One of the paradoxes central to any formal theory of planning is how humans manage to accurately recognize on-going plans in real-world domains. Chapman has suggested that humans make use of cognitive cliches3. This paper introduces an axiomatically based mathematical theory of planning and plan recognition first described by the author4. The author argues that a real-world mechanized plan recognition paradigm should not include logical completeness as a criterion. Rather, the author proves that the human plan observer is knowledge-poor but cognitively complete. Plan action choice within cognitively complete domains is shown to critically depend upon ill-defined notions of set equality. The author describes a potential implementation of these ideas as a type of reactive planner, similar to Maes's and Kaelbling's recent research efforts8,7.
Sensor Fusion: Storage, Search And Problem Solving Efficiency Issues Associated With Reasoning In Context
Author(s):
Richard Antony
This paper proposes an abstract model of the data fusion process that is based on a generalization of the sensor correlation paradigm. The fusion model demonstrates that sensor fusion is a proper subset of data fusion and reveals two fundamental approaches to individual fusion processes based on explicit and implicit representations of knowledge. Most attempts to automate data fusion have been based on various forms of explicit representation, while implicit representations are more characteristic of knowledge representations employed by human problem solvers. The efficiency of the two approaches is contrasted for two specific fusion problems: doctrinal templating and path planning. While an explicit representation approach may require the exhaustive representation of a large number of explicit templates, an implicit representation approach (which supports reasoning in context) may require the maintenance, search and manipulation of large domain knowledge bases. A previously proposed database organization is recommended that supports the fusion process for both representation classes by facilitating highly efficient mixed semantic and spatial-oriented queries and manipulation.
The Fusion of Voice and Video
Author(s):
William J. Wolfe;
Gita Alaghband;
Donald W. Mathis
We investigate the simultaneous occurrence of speech, vision and natural language. Several applications are analyzed in order to demonstrate and categorize the many ways that voice signals, images and text can be semantically related. Examples are provided of how connectionist, blackboard, and conceptual dependency approaches apply.
Spatial Database Organization for Multi-attribute Sensor Data Representation
Author(s):
Feliz Ribeiro Gouveia;
Jean-Paul A. Barthes
This paper surveys spatial database organization and modelling as it is becoming a crucial issue for an ever increasing number of geometric data manipulation systems. We are here interested in efficient representation and storage structures for rapid processing of large sets of geometric data, as required by robotics applications, Very Large Scale Integration (VLSI) layout design, cartography, Computer Aided Design (CAD), or geographic information systems (GIS), where frequent operations involve spatial reasoning over that data. Existing database systems lack expressiveness to store some kinds of information which are inherently present in a geometric reasoning process, such as metric information, e.g. proximity, parallelism; or topological information, e.g. inclusion, intersection, contiguity, crossing. Geometric databases (GDB) alleviate this problem by providing an explicit representation for the spatial layout of the world in terms of empty and occupied space, together with a complete description of each object in it. Access to the data is done in an associative manner, that is, by specifying values over some usually small (sub)set of attributes, e.g. the coordinates of physical space. Manipulating data in GDB systems often involves spatially localized operations, i.e., locations, and consequently objects, which are accessed in the present are likely to be accessed again in the near future; this locality of reference, which Hegron [24] calls temporal coherence, is due mainly to real world physical constraints. Indeed, if accesses are caused for example by a sensor module which inspects its surroundings, then it is reasonable to suppose that successive scanned territories are not very far apart.
Integration of Data-Fusion Techniques for Autonomous Vehicle Driving
Author(s):
Daniele D. Giusto;
Stefano Pozzi;
Carlo S. Regazzoni;
Gianni Vernazza;
Riccardo Zelatore
An autonomous vehicle must have the capability of interpreting data provided by multiple sensors in order to face various environmental conditions. To this end, different physical sensors (e.g., RGB or IR camera, laser range finder, etc.) which can provide information of the image type can be used. Moreover, virtual sensors (i.e., processes which simulate new sensors by transforming original images in different ways) can be obtained by Computer Vision techniques. In this paper, we present a knowledge-based data fusion system with distributed control, which integrates data both at the physical and at the virtual sensor level, by pursuing segmentation and interpretation goals. Outdoor road scenes, with and without obstacles, are considered as an applicative test set.