Integral imaging for 3D object detection
Detecting objects in a 3D scene is a challenging problem that is relevant to many computer-vision applications. For instance, detection of objects in 3D space has been proposed as a way to present a scene clearly in prosthetic vision devices.1 Such devices are intended to be used by blind people to aid their basic vision. Integral imaging techniques, with which confocal images of depth planes are computationally reconstructed, are a potentially inexpensive source of rich 3D data.
The computational reconstruction process in integral imaging uses multi-view images that are recorded at slightly different angles and are known as elemental images (EIs).2, 3 Within each reconstructed plane (RP) image, objects that are located at the depth of that plane appear sharp, whereas objects at other depths appear blurred, to a degree that depends on their distance from the reconstructed depth. Existing integral imaging-based methods can only determine an object's depth location by matching the RPs to an EI.4–6 These methods usually require prior information on the object's location in the EI and cannot be used for 3D object detection and isolation.
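As a rough illustration of how a single depth plane might be reconstructed, the sketch below uses a simple shift-and-average scheme: each EI is shifted in proportion to its camera position and the shifted images are averaged, so that only objects at the chosen depth line up and stay sharp. The article does not specify the reconstruction algorithm, so this is an assumption, as are the function name `reconstruct_plane`, the `positions` grid (camera positions in units of the camera pitch), the sign convention of the shift, and the `shift_per_unit` parameter (the pixel disparity between neighbouring cameras at the chosen depth).

```python
import numpy as np

def reconstruct_plane(elemental_images, positions, shift_per_unit):
    """Shift-and-average reconstruction of one depth plane (illustrative only).

    elemental_images: sequence of K grayscale images, each of shape (H, W)
    positions:        K (row, col) camera positions in units of camera pitch
    shift_per_unit:   assumed pixel disparity between neighbouring cameras
                      for the depth plane being reconstructed
    """
    acc = np.zeros(np.asarray(elemental_images[0]).shape, dtype=float)
    for img, (r, c) in zip(elemental_images, positions):
        dy = int(round(r * shift_per_unit))
        dx = int(round(c * shift_per_unit))
        # np.roll wraps pixels around the border; a real implementation
        # would more likely crop or pad instead.
        acc += np.roll(np.asarray(img, dtype=float), shift=(dy, dx), axis=(0, 1))
    return acc / len(elemental_images)
```

Sweeping `shift_per_unit` over a range of values would then produce a stack of RP images, one per candidate depth.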
We recently developed a new method for 3D localization of objects using computationally reconstructed integral imaging.1, 7 In our method for 3D object detection, we find the most focused regions in the reconstructed planes,7 and the occurrences of objects along the depth axis.1 We refer to these extracted features as ‘edges.’ We use edge features in the object detection process because they are informative and efficient descriptors of the objects. Their efficiency stems from the small number of edge pixels that we use in the detection process, relative to the number of pixels in the whole intensity image.
There are two main phases in our 3D object localization process. First, we find the depth locations of the objects.1 Second, for each depth location, we produce the object's features in its transverse plane.7 To find the object's depth location, we compare the EI edge detection results with the results of the same edge detection operation applied to different RPs at various distances along the depth axis.1 We assume that an object located at a specific depth plane appears similarly sharp in the RP for that depth and in the EI. The edges of the object therefore match in both images.
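The comparison described above can be sketched as a count of the edge pixels that coincide between the EI edge map and the edge map of each RP. This is a minimal illustration that assumes the edge maps are already available as binary arrays of equal size; the function name `edge_overlap_curve` and the `rp_edge_stack` argument are hypothetical.

```python
import numpy as np

def edge_overlap_curve(ei_edges, rp_edge_stack):
    """Count the edge pixels shared by the EI edge map and each RP edge map.

    ei_edges:      binary edge map of the elemental image, shape (H, W)
    rp_edge_stack: iterable of binary edge maps, one per reconstructed depth
    Returns one overlap count per depth.
    """
    ei_edges = np.asarray(ei_edges, dtype=bool)
    return np.array([
        np.count_nonzero(ei_edges & np.asarray(rp_edges, dtype=bool))
        for rp_edges in rp_edge_stack
    ])
```

Plotting the returned counts against the reconstruction depths gives an overlap curve in which depths that contain an object should stand out as high values.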
For our 3D object detection process we implement and compare three different edge detection techniques. We developed the first of these techniques with the aim of detecting the most focused regions in RP images.7 To do this, we use a Haar wavelet transform at the first scale and then apply a threshold to the wavelet coefficients in each band, so that only the 0.5% of pixels with the highest gradient values remain as edge pixels in each band. The second technique uses a first-order gradient, approximated by the Sobel operator,8 with a threshold equal to half of the absolute average gradient. The third technique uses a second-order gradient that is approximated by the Laplacian of Gaussian (LoG) operator.8 In this case, the zero-crossing threshold is equal to half of the absolute average of the second-order image derivative, and the Gaussian has a standard deviation of two. An EI of an example scene is shown in Figure 1(a). The results of the three edge detection techniques for this EI are shown in Figure 1(b–d).
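The three detectors can be approximated in a few lines of NumPy. The sketch below is not the authors' implementation. It assumes a grayscale image, an orthonormal first-scale Haar transform computed on non-overlapping 2×2 blocks, a threshold on the Sobel gradient magnitude equal to half of its mean, and a LoG zero-crossing test in which the jump across the crossing must exceed half of the mean absolute LoG response. All function and band names are hypothetical.

```python
import numpy as np

def haar_band_edges(img, keep_fraction=0.005):
    """First-scale Haar detail bands, each reduced to its strongest coefficients."""
    img = np.asarray(img, dtype=float)
    img = img[: img.shape[0] // 2 * 2, : img.shape[1] // 2 * 2]  # crop to even size
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    bands = {
        "row_diff": (a + b - c - d) / 2.0,  # responds to changes between rows
        "col_diff": (a - b + c - d) / 2.0,  # responds to changes between columns
        "diag": (a - b - c + d) / 2.0,
    }
    masks = {}
    for name, coeffs in bands.items():
        mag = np.abs(coeffs)
        thresh = np.quantile(mag, 1.0 - keep_fraction)
        masks[name] = (mag >= thresh) & (mag > 0)  # ties at the threshold are kept
    return masks

def sobel_edges(img):
    """Sobel gradient magnitude, thresholded at half of its mean value."""
    p = np.pad(np.asarray(img, dtype=float), 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
        - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
        - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    mag = np.hypot(gx, gy)
    return mag > 0.5 * mag.mean()

def log_edges(img, sigma=2.0):
    """Zero crossings of a Laplacian-of-Gaussian response (Gaussian std = sigma)."""
    img = np.asarray(img, dtype=float)
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    g /= g.sum()
    # Separable Gaussian smoothing with edge replication at the borders.
    p = np.pad(img, radius, mode="edge")
    smooth = np.apply_along_axis(np.convolve, 1, p, g, mode="valid")
    smooth = np.apply_along_axis(np.convolve, 0, smooth, g, mode="valid")
    # Five-point discrete Laplacian of the smoothed image.
    q = np.pad(smooth, 1, mode="edge")
    lap = q[:-2, 1:-1] + q[2:, 1:-1] + q[1:-1, :-2] + q[1:-1, 2:] - 4.0 * smooth
    thresh = 0.5 * np.abs(lap).mean()
    edges = np.zeros(lap.shape, dtype=bool)
    # A crossing is a sign change between neighbours with a large enough jump.
    edges[:, :-1] |= (lap[:, :-1] * lap[:, 1:] < 0) & \
        (np.abs(lap[:, :-1] - lap[:, 1:]) > thresh)
    edges[:-1, :] |= (lap[:-1, :] * lap[1:, :] < 0) & \
        (np.abs(lap[:-1, :] - lap[1:, :]) > thresh)
    return edges
```

Note that the Haar masks have half the resolution of the input in each dimension, whereas the Sobel and LoG masks have the same size as the input.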
The depth locations of the objects in Figure 1(a), as estimated by all three edge detection techniques, are plotted in Figure 2 in terms of the number of edge pixels that overlap between the RP and the EI, as a function of depth. Each local peak in these graphs represents convergence to an object's depth location. The graphs indicate that all three techniques perform similarly in this first phase of our localization process. Objects were found in two main depth planes, located about 0.6 and 2.2 m from the camera. In the second phase of our process, we find the sharp surfaces of the objects at each of the depth planes that were identified in the first phase. In this process, we remove the blurred regions that belong to objects at other depths. This second phase generally involves high-pass filtering and the application of adaptive thresholds to the RP images at the appropriate depths. We therefore apply the same edge detection techniques as in the first phase. The image reconstructed at 0.6 m from the camera is shown in Figure 3(a). The associated detections of the sharp edge regions are shown in Figure 3(b–d). The results for the second detected depth plane (at about 2.2 m from the camera) are presented in Figure 3(e–h). For this phase, we find that the results from the three edge detection techniques differ. The LoG-based technique, which represents a second-order derivative, produces a noisier result than the other techniques (which are variations of the first-order derivative approximation). Moreover, the wavelet-based technique yields a richer result than the Sobel-based method.
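The peak-picking step of the first phase can be illustrated with a simple local-maximum search over the overlap counts. This is only a sketch: the article does not describe how peaks are selected, so the neighbour comparison and the `min_count` floor are assumptions, and the function name `local_peaks` is hypothetical.

```python
def local_peaks(depths, counts, min_count=1):
    """Return the depths at which the overlap count has a local maximum.

    depths:    reconstruction depths, in increasing order
    counts:    overlapping-edge-pixel count for each depth
    min_count: assumed floor below which a peak is ignored
    """
    peaks = []
    for i in range(1, len(counts) - 1):
        rises = counts[i] > counts[i - 1]
        does_not_rise_after = counts[i] >= counts[i + 1]
        if counts[i] >= min_count and rises and does_not_rise_after:
            peaks.append(depths[i])
    return peaks
```

The depths returned by such a search would then select the RP images that are passed to the second phase, where the same edge detectors are applied to isolate the sharp regions.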
We have developed a method for the localization of objects in 3D space using information obtained by computational integral imaging. With our approach, we first find the locations of objects along the depth axis and then produce their edge surface features in the transverse 2D plane. We are currently working on further improvements to our technique, focusing on the efficiency of the method, better performance in object isolation, and operation in noisy conditions.
Ben Gurion University of the Negev
Yitzhak Yitzhaky received his PhD from Ben Gurion University, Israel, in 2000. He was subsequently a postdoctoral fellow at the Schepens Eye Research Institute of Harvard Medical School in Boston. His main research interests are image processing and vision.
Doron Aloni is currently a PhD student. He received his BSc in mechanical engineering in 2009 and his MSc in electro-optical engineering in 2011, both from Ben Gurion University, Israel. His research deals with 3D image processing under various conditions and applications.