Proceedings Volume 1260

Sensing and Reconstruction of Three-Dimensional Objects and Scenes

Bernd Girod
View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 1 January 1990
Contents: 6 Sessions, 25 Papers, 0 Presentations
Conference: Electronic Imaging: Advanced Devices and Systems 1990
Volume Number: 1260

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
Sessions:
  • Depth From Defocus and Stereo
  • Structure from Motion
  • Photometry and Active Sensing
  • Contour Based Algorithms
  • Integration and Automatic Model Building
  • Modeling for Visual Communications
Depth From Defocus and Stereo
Three-dimensional image capture by volume imaging
George J. M. Aitken, Peter F. Jones
Three-dimensional incoherently illuminated scenes can be captured electronically by measuring the intensity distribution in a volume behind the lens. The 3-D image is reconstructed by an inversion algorithm employing the measured, space-variant, 3-D point spread function of the lens. A computer simulation using a measured PSF illustrates the effectiveness of the technique.
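As a rough illustration of the inversion step, a shift-invariant Wiener deconvolution is sketched below. The paper's algorithm uses a measured, space-variant PSF, which this simplification ignores; the function name and regularization constant are assumptions.

```python
import numpy as np

def wiener_deconvolve_3d(volume, psf, k=1e-3):
    """Shift-invariant Wiener inversion of a blurred intensity volume.

    A simplified stand-in for the paper's space-variant inversion:
    `psf` is assumed to be the same shape as `volume` and centered,
    and `k` regularizes against noise amplification.
    """
    V = np.fft.fftn(volume)
    H = np.fft.fftn(np.fft.ifftshift(psf))  # move the PSF center to the origin
    return np.real(np.fft.ifftn(V * np.conj(H) / (np.abs(H) ** 2 + k)))
```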
Optical rangefinding from image focus
A method for gauging the distance from a video camera to an object of interest is described. By using a calibrated camera-lens system, range was related to the focus of a selected object. Optimum focus of the image was determined by maximizing the high-frequency content of the Fourier transform of the object image. The Walsh-Hadamard transform was investigated as an alternative focusing function. Software was developed to determine optimum image focus and control a motorized camera lens. Range values from the video camera to target objects were calculated by the system and compared with measured distances. For any given distance, the difference between calculated and actual distance averaged less than 1.2%. Distance values calculated using the Walsh-Hadamard transform differed from values calculated with the Fourier transform by less than 1%.
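A minimal sketch of the focusing function, assuming a normalized radial cutoff; the function name and cutoff value are illustrative, not from the paper:

```python
import numpy as np

def focus_measure(image, cutoff=0.25):
    """Energy in the high-frequency part of the image spectrum; sharper
    focus yields more high-frequency content and a larger value."""
    F = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.mgrid[-h // 2:(h + 1) // 2, -w // 2:(w + 1) // 2]
    r = np.hypot(yy / (h / 2.0), xx / (w / 2.0))  # normalized radius
    return float(np.sum(np.abs(F[r > cutoff]) ** 2))

# Range is then read from a focus-to-distance calibration of the lens:
# best_pos = max(lens_positions, key=lambda p: focus_measure(frame_at(p)))
```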
3-D motion tracking using stereo camera and range radar
Stelios C.A. Thomopoulos, Lars Nillson
The problem of estimating the position of and tracking an object undergoing 3-D translational and rotational motion using passive and active sensors is considered. The passive sensor used in this study is a stereo camera, whereas the active sensor is a range radar. Three different estimation approaches are considered. The first involves estimation of the object position by direct registration of stereo images. In the second approach, an Extended Kalman Filter is used for estimation, with the stereo images as measurements. In the third approach, an integral filter based on stereo images and range radar measurements is used for tracking. The three approaches are compared via simulation in the tracking of an object undergoing 3-D motion with random translational and angular acceleration.
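As a minimal illustration of why combining the two sensors helps, inverse-variance fusion of a stereo range and a radar range is sketched below; the paper's integral filter is an Extended Kalman Filter, not this static rule, and the names are assumptions.

```python
def fuse_ranges(z_stereo, var_stereo, z_radar, var_radar):
    """Optimal linear fusion of two independent range measurements:
    the less noisy sensor receives the larger weight, and the fused
    variance is smaller than either input variance."""
    w = var_radar / (var_stereo + var_radar)   # weight on the stereo value
    z = w * z_stereo + (1.0 - w) * z_radar
    var = var_stereo * var_radar / (var_stereo + var_radar)
    return z, var
```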
Maximum a-posteriori probability 3-D surface reconstruction using multiple intensity images directly
Yi-Ping Hung, David B. Cooper
Reconstructing 3D surfaces using multiple intensity images is an important problem in computer vision. Most approaches to this problem require either finding 2D feature correspondences or estimating the optical flow first. The probabilistic model-based approach presented in this paper uses the intensity data directly (i.e., no feature extraction) to reconstruct 3D surfaces without first solving the correspondence problem or estimating optical flow explicitly. We model 3D objects as surface patches, where each patch is described by a function known up to the values of a few parameters. Surface reconstruction is then treated as the problem of parameter estimation based on two or more images taken by a moving camera. By constructing the likelihood function for the surface parameters and modeling prior knowledge about 3D surfaces with a Markov random field, we are able to compute the maximum a posteriori probability 3D surface reconstruction based on the observed images. This paper presents some experimental results based on a sequence of intensity images taken by a moving camera. Our approach has the advantages of: (i) directly estimating shape for surface patches, thus making object recognition simpler; (ii) formally incorporating prior knowledge about 3D surfaces; (iii) being highly parallel in required computation, hence promising for real-time operation; (iv) producing optimal accuracy in a probabilistic sense; (v) being algorithmically simple; (vi) being robust with real data.
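The MAP formulation can be summarized with the standard Bayesian identity (generic notation, not necessarily the paper's):

```latex
\hat{\theta} \;=\; \arg\max_{\theta}\; p(\theta \mid I_1,\dots,I_K)
             \;=\; \arg\max_{\theta}\; p(I_1,\dots,I_K \mid \theta)\, p(\theta),
```

where theta collects the surface-patch parameters, I_1, ..., I_K are the observed images, the likelihood comes from the image-formation model, and the prior p(theta) is the Markov random field over surfaces.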
Structure from Motion
Time-sequential structure and motion estimation without optical flow
Joachim Heel, Shahriar Negahdaripour
We present a method for recovering structure and motion from a sequence of images which does not require computation of the optical flow. We build on the previously developed "Direct Motion Vision" approach for estimation of structure and motion from two frames. In this formulation, structure and motion are obtained directly from the gradient of image brightness, which is computed orders of magnitude faster than optical flow. The direct methodology is applied to estimate the motion and normal of a planar surface relative to a camera. Using a dynamical model of the camera motion, we then show how measurements from an arbitrarily long sequence of images can be integrated with the help of an observer/filter to improve the estimate over time. Experimental results on real images are presented.
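Direct methods of this kind rest on the brightness change constraint equation, which links image gradients to image motion without computing flow explicitly (standard notation, not necessarily the paper's):

```latex
E_x\,u + E_y\,v + E_t = 0,
```

where E(x, y, t) is image brightness, (E_x, E_y, E_t) are its partial derivatives, and (u, v) is the image motion induced by the camera and surface parameters being estimated.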
Toward shape from image motion
Niels da Vitoria Lobo
Much research effort has been expended on attempting to calculate a description of the structure of an arbitrary environment from the information present in the image motion obtained by an observer in that environment. Most techniques use image velocities from across fairly large regions of the image to compute the parameters of 3-dimensional motion and the shape parameters of the viewed objects. The calculations generally require many image velocity measurements as input, a requirement that is often impractical. Thus, it is interesting to examine what can be achieved when only a few such measurements are used. In this paper, we first show that, given image velocity measurements at two positions in the image, it is possible to cancel out the rotation component of motion. We then use this to show that measurements from three positions, when combined with advance knowledge of just the direction of the motion's translation component, yield information about the shape of the environment. This result holds for arbitrary instantaneous motion between an observer and surfaces in the environment, and also permits qualitative judgements that could serve as end results in themselves when richer descriptions of shape are unnecessary.
Motion estimation from points without correspondences from scaled orthographic projections
This paper presents a new method for estimating the motion parameters of a set of 3-D points without correspondences between points at two time instants. All point coordinates are given in projections. Three scaled orthographic projections taken before and after the motion are used to determine the scaled translation vector and to recover the 3-D scatter matrix. Once the 3-D scatter matrix is determined at both time instants, the rotation matrix can be found by an eigenvector decomposition method. The proposed method requires neither correspondences within projections nor correspondences between the two time instants.
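A hedged sketch of the rotation-recovery step: if the scatter matrices before and after the motion are related by S2 = R S1 Rᵀ, the rotation follows from their eigenvectors. This assumes distinct eigenvalues and consistently signed eigenvectors; the sign fix below resolves only the improper-rotation ambiguity.

```python
import numpy as np

def rotation_from_scatter(S1, S2):
    """Recover R with S2 = R @ S1 @ R.T from two 3x3 scatter matrices."""
    _, V1 = np.linalg.eigh(S1)   # eigh orders eigenvalues ascending,
    _, V2 = np.linalg.eigh(S2)   # so the eigenvector columns correspond
    R = V2 @ V1.T
    if np.linalg.det(R) < 0:     # enforce a proper rotation (det = +1)
        V2[:, 0] = -V2[:, 0]
        R = V2 @ V1.T
    return R
```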
Direct-motion stereo
Brian Y. Hayashi, Shahriar Negahdaripour
In this paper, we show how the translational motion of a stereo vision system relative to, and its distance from, the scene can be recovered in closed form directly from measurements of image gradients and time derivatives. There is no need to estimate image motion or establish correspondences between features across images. The direction of translational motion is recovered by a procedure that minimizes the sum of squared errors of a linear constraint equation over the image. The solution is given by the eigenvector corresponding to the smallest eigenvalue of a 3 x 3 positive semi-definite matrix. Using the average disparity, which maximizes the cross-correlation between the left and right images, we estimate the scale factor needed to compute the magnitude of the translational motion, and consequently the distance to the scene.
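The linear step has a standard closed form: stacking one constraint row per pixel into a matrix A, the unit vector minimizing ||A t||^2 is the eigenvector of AᵀA for its smallest eigenvalue. A minimal sketch (names assumed):

```python
import numpy as np

def direction_of_translation(A):
    """Minimize ||A @ t||^2 over unit vectors t: take the eigenvector
    of the 3x3 positive semi-definite matrix A.T @ A with the smallest
    eigenvalue; the result is defined only up to sign."""
    _, V = np.linalg.eigh(A.T @ A)   # eigenvalues in ascending order
    return V[:, 0]
```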
Terrain shape estimation from optical flow, using Kalman filtering
William A. Hoff, Cheryl W. Sklair
As one moves through a static environment, the visual world as projected on the retina seems to flow past. This apparent motion, called optical flow, can be an important source of depth perception for autonomous robots. An important application is in planetary exploration: the landing vehicle must find a safe landing site in rugged terrain, and an autonomous rover must be able to navigate safely through this terrain. In this paper, we describe a solution to this problem. Image edge points are tracked between frames of a motion sequence, and the range to the points is calculated from the displacement of the edge points and the known motion of the camera. Kalman filtering is used to incrementally improve the range estimates to those points and to provide an estimate of the uncertainty in each range. Errors in camera motion and image point measurement can also be modelled with Kalman filtering. A surface is then interpolated to these points, providing a complete map from which hazards such as steeply sloping areas can be detected. Using extended Kalman filtering, our approach allows arbitrary camera motion. Preliminary results of an implementation are presented and show that the resulting range accuracy is on the order of 1-2% of the range.
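A one-dimensional sketch of the incremental refinement (illustrative only; the paper uses an extended Kalman filter over the full geometry):

```python
def kalman_update(x, P, z, R):
    """One scalar Kalman measurement update for the range to a point:
    x, P - current range estimate and its variance
    z, R - new range measurement and its variance."""
    K = P / (P + R)          # gain: trust the measurement when R is small
    x = x + K * (z - x)      # blend prediction and measurement
    P = (1.0 - K) * P        # uncertainty shrinks with every update
    return x, P
```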
Photometry and Active Sensing
Non-Lambertian shading and photometric stereo
Hemant D. Tagare, Rui J. P. de Figueiredo
Classical shape-from-shading and photometric stereo theories assume that diffuse reflection from real-world surfaces is Lambertian. However, there is considerable evidence that diffuse reflection from a large class of surfaces is non-Lambertian. Using a Lambertian model to reconstruct such surfaces can cause serious errors in the reconstruction. In this paper, we propose a theory of non-Lambertian shading and photometric stereo. First, we explore the physics of scattering and obtain a realistic model for the reflectance map of non-Lambertian surfaces. The reflectance map is significantly non-linear. We then explore the number of light sources and the conditions on their placement for a globally unique inversion of the photometric stereo equation for this reflectance map, and theoretically establish the minimum number of light sources needed to achieve this. These results are then extended in several directions. The main extension is the joint estimation of the surface normal along with the surface albedo. In the literature, this problem has been addressed only for Lambertian surfaces. We establish some basic results on the problem of joint estimation using the manifold structure of intensities obtained from photometric stereo. We show that the joint estimation problem is ill-posed and propose a regularization scheme for it. Our experiments show that, using the techniques proposed here, the fidelity of reconstruction can be increased by an order of magnitude over existing techniques.
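For contrast with the paper's nonlinear reflectance map, the Lambertian baseline it generalizes inverts a linear system; a minimal per-pixel sketch (function name assumed):

```python
import numpy as np

def lambertian_photometric_stereo(I, L):
    """Classical Lambertian inversion at one pixel: I = L @ (albedo * n),
    with I the k intensities (k >= 3) and L the k x 3 light directions.
    The paper's non-Lambertian map is nonlinear and needs more sources."""
    g, *_ = np.linalg.lstsq(L, I, rcond=None)
    albedo = np.linalg.norm(g)
    return g / albedo, albedo    # unit surface normal, albedo
```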
Estimation of surface spectral reflectance of inhomogeneous objects
This paper describes a method for estimating the surface spectral reflectance function of inhomogeneous objects. The standard reflectance model for inhomogeneous materials suggests that surface reflectance functions can be described as the sum of a constant (specular) function and a subsurface (diffuse) function. First we present an algorithm to generate an illuminant estimate without using a reference white standard. Next we show that several physical constraints on the reflectance functions can be used to estimate the subsurface component. A band of the estimated spectral reflectance functions is recovered as possible solutions for the subsurface component.
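The decomposition referred to here can be written in a standard dichromatic-style form (generic notation, not necessarily the paper's):

```latex
\rho(\lambda) \;=\; \rho_s \;+\; \rho_b(\lambda),
```

where rho_s is the roughly wavelength-constant interface (specular) term and rho_b(lambda) is the wavelength-dependent subsurface (diffuse) term that the physical constraints bound.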
Laser range-finding techniques in the sensing of 3-D objects
Ilkka P.A. Kaisto, Juha Tapio Kostamovaara, Ilkka Moring, et al.
Three pulsed time-of-flight laser rangefinders have been developed for studying the measurement of the 3-D shape of large objects. A manually scanned system is suggested for manufacturing accuracy measurement and control in ship block assembly. This system can be used to measure distance, plane regularity, angles, spatial forms, etc., within a range from 3 m to 30 m with mm-level accuracy. The other two scan automatically, one using galvanometer-driven mirrors and the other a servo-controlled mechanical scanner. These systems are intended for applications where it is important to gather 3-D data automatically and at high speed. Their resolutions are also at the mm level, but the measurement speed is 10 000 points/s at maximum.
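The underlying ranging relation is simple; a small sketch shows why mm-level accuracy demands picosecond-level timing (names assumed):

```python
C = 299_792_458.0   # speed of light in m/s

def tof_range(round_trip_s):
    """Pulsed time-of-flight ranging: the pulse travels out and back,
    so range is half the round-trip optical path."""
    return C * round_trip_s / 2.0

# 1 mm of range corresponds to about 6.7 ps of round-trip time:
# tof_range(6.7e-12) ~= 0.001 m
```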
Three-dimensional vision system
Johnny L. Berg
The Holometrics 3-D vision system is an active laser ranging sensor capable of high frame-rate mapping of the 3-D metric properties of a scene contained within the sensor field-of-view (FOV). The high-resolution ranging function is accomplished with well-known laser ranging techniques. The range data acquired by the system is manipulated by an image processor utilizing proprietary software. The algorithmic operations extract 3-D shape data on scene objects and are insensitive to object rotation, orientation, or partial obscuration. The sensor acquisition of the 3-D data is independent of ambient lighting conditions or visual contrast between object and background. The elapsed time from scene scan to processor output is less than three seconds. The system output data is used to support a wide range of applications in automated manufacturing, inspection, robotics, navigation, bin-picking, assembly, ordnance guidance, etc. A block diagram of the Holometrics 3-D vision system is provided in Figure 1.
Contour Based Algorithms
Analytic Hough transform
David Cyganski, William F. Noel, John A. Orr
An analytic extension of the Hough Transform is introduced and analyzed, and an implementation is demonstrated. The Hough Transform in its usual implementation has proven to be a useful tool for image segmentation and feature extraction through identification of approximately collinear point sets in images. The Analytic Hough Transform (AHT) algorithm significantly improves upon these results by operating specifically with the information in spatially quantized images to yield those pixel sets that exactly define digital lines in the image. The resulting pixel sets, while being subsets of a digital line set, need not be contiguous. Thus the AHT also represents an alternative to digital line tests that depend upon contiguity. An Inverse Analytic Hough Transform (IAHT) is also introduced. For a given quantized image, the AHT segments its Hough parameter space into convex polygons that represent all real line sets that pass entirely through certain digital line pixel sets in the image. The IAHT converts these parameter space polygons into a pair of convex hulls in image space. A real line passes between these hulls if and only if it passes through every pixel connected with the parameter space polygon. Thus the IAHT generates a pair of simple geometric boundaries in image space that associate pixels with polygonal AHT solution regions. An implementation of the AHT is discussed and demonstrated. It is found that the AHT, with its exact results, can be a computationally attractive alternative to the usual implementation of a high-resolution Hough Transform. Furthermore, the AHT and the IAHT effectively couple and efficiently find exact solutions to the problems of digital line detection and determination of associated real line parameters.
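For orientation, the classical rho-theta accumulation that the AHT refines is sketched below (this is the standard Hough transform, not the analytic variant; bin counts are assumed):

```python
import numpy as np

def hough_accumulate(points, n_theta=180, n_rho=200):
    """Vote each edge point into (rho, theta) bins; approximately
    collinear points produce a shared peak in the accumulator."""
    pts = np.asarray(points, dtype=float)
    rho_max = np.hypot(pts[:, 0], pts[:, 1]).max()
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_rho, n_theta), dtype=int)
    for x, y in pts:
        rho = x * np.cos(thetas) + y * np.sin(thetas)   # one sinusoid per point
        bins = np.round((rho + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
        acc[bins, np.arange(n_theta)] += 1
    return acc, thetas
```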
Extracting feature points and feature lines from triangular surface models
In order to recognize an arbitrary 3D object, it is often required to extract feature points and feature lines from its surface model. The feature points and feature lines include peaks, pits, ridge lines, and valley lines. In this paper, we present an efficient technique for finding the features from the triangular surface model of an arbitrary 3D object. Given a set of surface data points, we find, using the local adjustment technique, the triangular patches that best fit the surface of the object. For the resulting triangle-based surface model, unit normal vectors and side lengths of the triangular patches are used systematically to locate the feature points and lines of the surface. We present experimental results on simple objects with feature points and feature lines.
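A minimal sketch of the normal-based edge test, assuming shared-edge adjacency is already known (the names and any ridge threshold are illustrative):

```python
import numpy as np

def triangle_normal(v0, v1, v2):
    """Unit normal of a triangular patch from its vertices."""
    n = np.cross(v1 - v0, v2 - v0)
    return n / np.linalg.norm(n)

def fold_angle_deg(n1, n2):
    """Angle between the normals of two adjacent patches; a large value
    across a shared edge marks a ridge or valley candidate."""
    return np.degrees(np.arccos(np.clip(np.dot(n1, n2), -1.0, 1.0)))
```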
Implementation aspects and performance of 3-D object reconstruction from silhouettes
David Cyganski, John A. Orr, David J. Cubanski, et al.
This paper presents the development and implementation of a method for reconstructing three-dimensional object information from silhouettes. Previous work has demonstrated the possibility of such reconstruction based on the differential equations relating surface terminator curves and their projections, but has not addressed important aspects of the implementation given spatially quantized images and a finite number of silhouettes. The method presented here is exact in that it makes appropriate use of angularly and spatially quantized silhouette information to form convex bounds for non-convex objects. For a given set of quantized silhouettes, inner and outer convex hulls are obtained by means of an efficient algorithm. The true object convex hull must lie between these two hulls, which represent the tightest hulls that can be constructed from the given information. Results of reconstruction by the algorithm are shown, using actual camera-acquired silhouette data. A detailed analysis of the sources of error is presented, demonstrating the effects of spatial quantization of the original silhouettes and of the angular separation of successive silhouettes. It is shown that for a given spatial resolution and local object curvature, an optimum angular separation between pairs of silhouette views exists, and that reconstruction error increases with either a larger or smaller angular separation. The convex hull boundary construction used in this work is shown to always use the best pair of silhouette points for each hull vertex.
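A simpler relative of the hull construction, silhouette intersection by voxel carving, shows why silhouettes bound the object. This is an illustration, not the paper's algorithm, and `project` stands for an assumed calibrated camera model:

```python
def carve(voxels, silhouettes, project):
    """Keep only the voxels whose projection falls inside every binary
    silhouette; the survivors over-approximate the true object.
    project(v, i) -> (row, col) pixel of voxel v in view i."""
    return [v for v in voxels
            if all(sil[project(v, i)] for i, sil in enumerate(silhouettes))]
```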
Qualitative 3-D models
Willie Y. Lim
Rocks can be effectively used as landmarks for robot navigation through rocky terrain. For the robot to do this, it has to be able to automatically build models of rocks. For the rocks world, models containing qualitatively described surfaces are used. A rock is modeled as a graph. The surface of the rock is decomposed into surface patches separated by crude edges. Each surface patch is represented by a node in the graph, and the arcs represent the adjacency relationships between the surface patches. To build such a model, the following approach is taken. For a scene consisting of a single, approximately convex object that is easily distinguishable from its background, a silhouette of the object is obtained. The silhouette is partitioned into crude segments according to the general shape of the segments. Each segment is typed as either concave, convex, or straight. The classification is done by measuring the mean and standard deviation of the distances of points on the segment from the straight line joining its ends. The qualitative model for the rock is built by initially assuming that the silhouette is a cross-sectional view of the rock. A simple cyclic graph is built, with nodes whose surface types are consistent with the segment types. Thus a five-segment silhouette consisting of three convex segments, one straight, and one concave results in a graph with five nodes: three convex surfaces, one flat (corresponding to the straight silhouette segment), and one concave. The model is improved by moving the camera to a different position and obtaining another silhouette. From the positions of the camera and the segment types, either new nodes are created or the surface types of currently existing nodes are modified. A method for automatically building such models is discussed.
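A sketch of the chord-distance test used to type segments; the flatness tolerance and the sign convention for convex versus concave (which depends on traversal direction) are assumptions:

```python
import numpy as np

def classify_segment(pts, flat_tol=1.0):
    """Type a silhouette segment from the signed distances of its
    points to the chord joining its endpoints."""
    p0, p1 = pts[0], pts[-1]
    chord = p1 - p0
    normal = np.array([-chord[1], chord[0]]) / np.linalg.norm(chord)
    d = (pts - p0) @ normal          # signed point-to-chord distances
    if np.abs(d).mean() < flat_tol:
        return "straight"
    return "convex" if d.mean() > 0 else "concave"
```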
Surface reconstruction based on descriptions of cross-sectional contours
Yong H. Kim, Ren C. Luo
Surface reconstruction from cross-sectional contours has become increasingly important in medical image applications. In this paper, a method of reconstructing the surface of an object based on descriptions of its cross-sectional contours is developed. Each cross-sectional contour is first partitioned into convex/concave segments based on the relative locations of points on the contour. Each segment of the contour is then described by a parametric cubic polynomial, and the boundary is recovered based on this description. A matching technique incorporating possible deformations between adjacent cross-sections is used to obtain the correspondence between adjacent cross-sectional contours. Once the correspondence is established, the surface between corresponding segments of adjacent cross-sectional contours is reconstructed by a traditional triangulation technique. As the reconstruction is based on the shape of the cross-sections, the reconstructed surfaces are closer to the surface of the original object.
Integration and Automatic Model Building
3-D reconstruction using deformable models
Demetri Terzopoulos
This paper reviews a physically based approach to the reconstruction of 3D visual data over space, time, and scale using deformable models. It summarizes three applications that exemplify the approach: visual surface reconstruction, stereo correspondence matching, and the recovery of 3D shape and nonrigid motion.
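The classical deformable-model energy from the snakes literature conveys the idea (a representative form, not necessarily the exact functional used in the paper):

```latex
E(v) \;=\; \int_0^1 \Big( w_1(s)\,\lVert v_s \rVert^2 \;+\; w_2(s)\,\lVert v_{ss} \rVert^2 \Big)\, ds \;+\; \int_0^1 P\big(v(s)\big)\, ds,
```

where v(s) is the model, the first integral penalizes stretching and bending, and the external potential P attracts the model to image or range data.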
Part description and segmentation using contour, surface, and volumetric primitives
Alok Gupta, Ruzena K. Bajcsy
In this paper we discuss ongoing research on the problem of shape description and the decomposition of complex objects in range images. We propose a paradigm for part description and segmentation by integration of contour, surface, and volumetric primitives. Unlike previous approaches, we use geometric properties derived from both boundary-based (surface contours and occluding contours) and primitive-based (biquadric patches and superquadric models) representations to define and recover part-whole relationships, without a priori knowledge about the objects or the object domain. The descriptions thus obtained are independent of position, orientation, scale, domain, and domain properties, and are based purely on geometric considerations. We pose the problem of integration in terms of evaluation of the intermediate descriptions and segmentation of the objects in a closed-loop process. We present algorithms for superquadric edge detection and apparent contour generation. The criteria for the evaluation of the superquadric models are discussed, and examples of real objects supporting our approach are presented.
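The volumetric primitive referred to here is commonly recovered through the superquadric inside-outside function; a sketch in its standard form (normalization conventions vary):

```python
import numpy as np

def superquadric_f(x, y, z, a1, a2, a3, e1, e2):
    """Inside-outside function of a superquadric with sizes a1..a3 and
    shape exponents e1, e2: F < 1 inside, F = 1 on the surface,
    F > 1 outside, so fitting minimizes deviations of F from 1."""
    t = (np.abs(x / a1) ** (2 / e2) + np.abs(y / a2) ** (2 / e2)) ** (e2 / e1) \
        + np.abs(z / a3) ** (2 / e1)
    return t ** e1   # raising to e1 is a common normalization choice
```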
Canonical fitting of deformable part models
Alexander P. Pentland
I describe a system that fits deformable models to range data. Models are represented by using modal dynamics applied to volumetric primitives, which significantly improves the computational complexity of both model recovery and subsequent processing. Given a segmentation of the range data into parts (see reference [16]), a volumetric description is obtained by a fitting procedure that minimizes squared error between the range measurements and the model's visible surface. For simple part shapes it is possible to compute the deformable model's parameters using only the shape of its symmetry axes.
Modeling for Visual Communications
Automatic modeling of 3-D moving objects from a TV image sequence
Claus E. Liedtke, Hans Busch, Reinhard Koch
Luminance changes in a TV image sequence can be interpreted as being due to a scene consisting of 3D objects moving with 6 degrees of freedom in 3D space. The 3D space is illuminated by a light source and observed by a camera. A parametric model is presented which employs an explicit representation of the illumination source, the camera, and the 3D objects. A method is suggested in which an analysis-by-synthesis process automatically extracts the model parameters from an incoming sequence of monocular TV images and thereby allows the modelling of the 3D scene. Since a parametric description of a complex scene resulting in a physically satisfactory modelling cannot be achieved in all generality, we have concentrated on the modelling of quasi-rigid, natural objects with homogeneous surfaces, such as the head-and-shoulder parts of human beings.
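Schematically, the analysis-by-synthesis loop looks as follows; `render` and `update` are assumed stand-ins for the paper's explicit illumination, camera, and object models:

```python
def analysis_by_synthesis(frame, params, render, update, n_iter=10):
    """Adjust model parameters until the synthesized image matches the
    incoming frame (a schematic sketch of the approach, not the
    paper's estimator)."""
    for _ in range(n_iter):
        synthetic = render(params)          # synthesize an image from the model
        residual = frame - synthetic        # model/image disagreement
        params = update(params, residual)   # correct pose, shape, lighting
    return params
```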
Modeling of 3-D moving objects for an analysis-synthesis coder
For object-oriented analysis-synthesis coding, an image analysis algorithm is required which automatically generates the parameter sets that describe moving 3D objects in an image sequence. Areas changed between two consecutive images are detected by means of change detection. Special processing is carried out to compute areas which coincide with object boundaries and to eliminate areas which represent illumination changes. These areas are segmented into silhouettes of moving objects and uncovered background. The border of an object silhouette is interpreted as the outermost contour of an object. These contours, in combination with a simple function giving the z-distance between them, provide a first estimate of the 3D shape of a model object. In order to improve the efficiency of motion analysis, a concept for combining model objects with new parts of moving objects is proposed. Results of an automatic image analysis based on moving 3D objects are shown using video telephone test sequences.
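The first stage, change detection, can be illustrated with plain frame differencing (the threshold and names are assumed; the paper adds boundary handling and illumination-change elimination):

```python
import numpy as np

def change_mask(prev, curr, thresh=10):
    """Mark pixels whose gray level changed by more than a threshold
    between consecutive frames."""
    return np.abs(curr.astype(int) - prev.astype(int)) > thresh
```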
Segmentation and motion estimation in image sequences
Norbert Diehl
This contribution presents a method for hierarchically segmenting video scenes into different moving objects and subobjects using a 2-dimensional description of these scenes. To this end, information from single images as well as information from successive images is used to split a scene into different objects. Furthermore, each of these objects is characterized by a transform h(x, T) which implicitly describes the surface and the three-dimensional motion of the moving objects in the scene. Using this description, an object-oriented prediction of the image contents from one image to the next, as may be used in low-bitrate image coding, is possible.
Real-time head motion detection system
Kenji Mase, Yasuhiko Watanabe, Yasuhito Suenaga
We present a three-dimensional head motion detection system called the real-time headreader. The headreader analyzes head motion picture sequences taken by a TV camera and extracts the motion parameters, i.e., 3-D rotations and translations, in real time. We use a simple but very fast algorithm, which exploits the contrast of hair and face to recognize face orientation. The system extracts the head and face areas, then estimates the head motion parameters from the change in position of each area's centroid. The head motion is computed at nearly 10 frames per second on a SUN4 workstation, and the motion parameters are sent to an IRIS workstation at 2.5 kbps. The IRIS generates a head motion sequence that duplicates the original head motion. The entire motion detection program is written in the C language; no special image processing hardware is used, except for a video digitizer. Our head motion detection system will enhance man-machine interaction by providing a new visual cue. An operator will be able to point to a target by just looking at it, so a mouse or 3-D tracking device is not needed. The eventual goal of this research is to build an intelligent video communication system that codes information in terms of a high-level language rather than compressed video signals.
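A minimal sketch of the centroid bookkeeping behind the estimator (an illustration under the abstract's hair/face contrast assumption; the names are hypothetical):

```python
import numpy as np

def region_centroid(mask):
    """Centroid (x, y) of a binary region mask, e.g. the hair or the
    face area extracted by thresholding."""
    ys, xs = np.nonzero(mask)
    return np.array([xs.mean(), ys.mean()])

# Frame-to-frame shifts of the head and face centroids drive the
# rotation/translation estimate; e.g. yaw roughly follows the
# horizontal offset between the face and whole-head centroids.
```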