Automated object recognition is critical for applications such as robotics, target recognition, and surveillance. Performing recognition by surface shape is increasingly attractive as accurate 3D imaging technologies advance. In our work, we address the problem of efficiently detecting and recognizing objects from range images, particularly with large sets of known objects (including families of shapes as well as fixed, rigid shapes).
Object recognition using cameras is challenging due to the mixing of surface-shape, reflectance, and illumination information. Therefore, range images (which directly represent surface shape) are frequently used. Nevertheless, it is not trivial to locate objects in a range image without performing a large search among objects and their poses. For example, the ‘iterative closest points’ approach [1] efficiently ‘homes in’ on a match between a single model and part of the range data, but only if the initial alignment is already close to the correct one. Many methods have been devised to obtain match hypotheses, often by measuring local shape features and using these to look up possible matching objects. However, most still require considerable search effort.
Our approach aims for efficient ways to measure general surface-shape features that are invariant under viewpoint changes. Tripod operators (TOs: see Figure 1) extract small sets of N points from 3D surface data in a canonical way such that we can generate and compare coordinate-independent shape descriptions efficiently. Using TOs, a specific surface shape generates a signature that is a manifold of dimension at most 3 in a feature space of dimension d = N − 3. Applying a TO to surface data at run time generates a d-dimensional feature vector, whose distance from the manifold is related to the match likelihood.
Figure 1. (top) Particular type of nine-dimensional (12-point) tripod operator (TO) using ‘hinged’ equilateral triangles. (bottom) The invariant signature manifold resulting from applying the TO densely to a 90° dihedral edge shape. This inherently 2D manifold (umbrella shape) is shown in 2D projection within the first 3 TO-feature components. Generalizing to all planar dihedral angles would generate a 3D manifold.
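To make the idea of a coordinate-independent feature concrete, the following minimal sketch computes the hinge (dihedral) angle between two triangles that share an edge and checks that it is unchanged by a rigid motion of the points. It assumes that each point beyond the first three contributes one hinge angle, which is consistent with d = N − 3 but is not spelled out above. The function names (`hinge_angle`, `rigid_motion`, `feature_dimension`) are illustrative and do not come from the authors' implementation, and the sketch does not attempt the TO placement procedure itself.

```python
import math


def sub(u, v):
    return tuple(ui - vi for ui, vi in zip(u, v))


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def feature_dimension(n_points):
    """Feature-space dimension d = N - 3, as stated in the text."""
    return n_points - 3


def hinge_angle(a, b, c, p):
    """Signed angle (radians) between triangles (a, b, c) and (a, b, p) about edge a-b."""
    edge = sub(b, a)
    length = math.sqrt(dot(edge, edge))
    unit = tuple(x / length for x in edge)
    n1 = cross(edge, sub(c, a))  # normal of the first triangle
    n2 = cross(edge, sub(p, a))  # normal of the hinged triangle
    return math.atan2(dot(cross(n1, n2), unit), dot(n1, n2))


def rigid_motion(point, yaw, pitch, shift):
    """Rotate about z by yaw, then about x by pitch, then translate (illustrative)."""
    x, y, z = point
    x, y = (x * math.cos(yaw) - y * math.sin(yaw),
            x * math.sin(yaw) + y * math.cos(yaw))
    y, z = (y * math.cos(pitch) - z * math.sin(pitch),
            y * math.sin(pitch) + z * math.cos(pitch))
    return (x + shift[0], y + shift[1], z + shift[2])


# Two unit equilateral triangles hinged at a right angle along edge a-b.
a, b = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
c = (0.5, math.sqrt(3) / 2, 0.0)
p = (0.5, 0.0, math.sqrt(3) / 2)

before = hinge_angle(a, b, c, p)
moved = [rigid_motion(q, 0.7, -1.2, (3.0, -2.0, 5.0)) for q in (a, b, c, p)]
after = hinge_angle(*moved)

print(feature_dimension(12), before, after)
```

Because the angle depends only on the relative positions of the four points, it is the same before and after the motion, which is the property that lets feature vectors be compared without knowing the object's pose.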
The TO is essentially a general-purpose surface-shape feature detector that can be used in many ways in a vision system. The simplest is to densely apply TOs to a surface model during training, store the resulting feature vectors, and set up an indexing system so that the stored feature vector nearest to a newly measured one can be found quickly. For low-noise range images, this method [2] works surprisingly well. It can reliably locate a known shape in a cluttered scene in milliseconds, and a single TO on the correct object is often sufficient for identification (see Figure 2).
Figure 2. Rapid detection and recognition of a mortar shell in low-noise range data by a single TO placement (green). Red TOs indicate ‘not the object.’ Typical time is a few milliseconds on a PC. TOs are placed randomly until detection.
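The store-and-look-up scheme described above can be sketched as a nearest-neighbor search with a distance threshold. This is only an illustration under stated assumptions: a brute-force scan stands in for the indexing system, whose actual structure is not described here, and the labels, the nine-component feature values, and the threshold are made-up toy data rather than measured TO signatures.

```python
import math


def nearest_signature(query, library):
    """Return (label, distance) of the stored feature vector closest to query."""
    best_label, best_dist = None, math.inf
    for label, vector in library:
        d = math.dist(query, vector)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist


def classify(query, library, max_distance):
    """Return the matching label, or None when nothing stored is close enough."""
    label, d = nearest_signature(query, library)
    return label if d <= max_distance else None


# Toy training library: (label, 9-component feature vector). Values are invented.
library = [
    ("flat_plane", (0.0,) * 9),
    ("dihedral_90", (0.0, 0.0, 0.0, 1.2, 1.2, 1.2, 0.4, 0.4, 0.4)),
]

noisy = (0.02, -0.01, 0.0, 1.18, 1.21, 1.2, 0.41, 0.4, 0.39)  # near a stored vector
clutter = (3.0,) * 9                                           # far from everything

print(classify(noisy, library, max_distance=0.1))
print(classify(clutter, library, max_distance=0.1))
```

A placement whose feature vector is far from every stored vector is rejected, which corresponds to the ‘not the object’ outcome; for a large library, the linear scan would typically be replaced by a spatial index such as a k-d tree.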
For large numbers of known objects, noisy range images, families of shapes, and other boundary conditions, we have explored statistical treatments of the TO-feature and pose space distributions, analytic approximation of the underlying manifolds, and other techniques. Statistical treatment of pose space has led to successful research projects including automotive-parts picking (National Institute of Standards and Technology/Flexible Robotic Assembly for Powertrain Applications: see Figure 3) and spacecraft docking (Defense Advanced Research Projects Agency/Spacecraft for the Universal Modification of Orbits).
Figure 3. (top) TOs detect and locate Ford automotive parts. Green indicates confident detection and pose estimation. (bottom) The estimated pose enables the robot to accurately grasp and move the part.
We currently focus on using manifold-learning methods to enable recognition of a very large number of known objects through TO-signature compression and to handle generalizations of known shapes. We have used piecewise-linear manifold-learning techniques [3] for pose estimation and are now pursuing other methods [4,5] for achieving higher compression ratios and accuracy. Such compression would enable rapid detection and identification of a large library of known munitions types, including members of continuous families of shapes not previously encountered (such as artillery shells that are similar in shape but metrically different). Applications that we plan to pursue are detection and identification of ordnance from a large, known database and fast visual searches for multiple objects from an airborne or ground-based robot.
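As a rough illustration of why a piecewise-linear model can compress a signature, the sketch below measures the distance from a feature vector to a 1D piecewise-linear curve stored as a handful of vertices, and compares it with the distance to the nearest stored vertex alone. It is a toy under stated assumptions: the curve, the query point, and the function names are hypothetical, the example uses three dimensions for readability, and it is not the technique of the cited work.

```python
import math


def point_to_segment(q, a, b):
    """Distance from point q to the line segment a-b in any dimension."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    aq = [qi - ai for ai, qi in zip(a, q)]
    denom = sum(x * x for x in ab)
    if denom == 0.0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        t = max(0.0, min(1.0, sum(x * y for x, y in zip(aq, ab)) / denom))
    closest = [ai + t * x for ai, x in zip(a, ab)]
    return math.dist(q, closest)


def distance_to_polyline(q, vertices):
    """Distance from q to a piecewise-linear curve given by its vertices."""
    return min(point_to_segment(q, a, b) for a, b in zip(vertices, vertices[1:]))


# Hypothetical signature curve stored as three vertices instead of dense samples.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
query = (0.5, 0.2, 0.0)

to_curve = distance_to_polyline(query, vertices)
to_nearest_vertex = min(math.dist(query, v) for v in vertices)

print(to_curve, to_nearest_vertex)
```

Interpolating between the stored vertices gives a much tighter distance estimate than the vertices alone, so far fewer stored points are needed for a given accuracy. The same idea extends to 2D and 3D manifolds with linear patches in place of segments.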
Navy Center for Applied Research in Artificial Intelligence
US Naval Research Laboratory
Frank Pipitone obtained his PhD in electrical engineering from Rutgers University in 1982. His research includes development of novel range-imaging sensors and recognition and localization of objects using range images.