
Electronic Imaging & Signal Processing

Detecting regions of interest in images

A hierarchical detection algorithm quickly finds and isolates critical regions in low-quality imagery.
31 November 2006, SPIE Newsroom. DOI: 10.1117/2.1200610.0414

Many intelligence, surveillance, and reconnaissance applications need to detect potential targets or regions of interest (ROIs) in digital imagery. These ROIs can then be used to automatically call for further sensing or other action, to control intelligent image-compression algorithms, or to direct further analysis for target identification and recognition. For example, satellite imagery could be used to cue a flight by an unmanned air vehicle for more-detailed surveillance. Another possible application could send an image containing only the region of interest to a warfighter equipped with a low-bandwidth link, over which transmission of the entire image would not be feasible. For any of these processes, we first need to analyze the image data at the source.

Psychological studies have revealed that humans detect regions of interest before doing any further processing. Privitera and Stark1 observed that during active viewing, there are approximately three eye fixations per second, each occurring between saccades (rapid eye jumps during which vision is suppressed). They showed that most people look at the same parts of a given picture, so that certain positions are fixated repeatedly with high probability. The researchers also showed that computer algorithms could be crafted to find ROIs similar to those determined by people.

ROI detection has been studied for many years. Most algorithms use either feature-based or object-based approaches. Feature-based methods find pixels that share significant optical features with the target and aggregate them to form ROIs.1,2 These methods can capture most of the target pixels on the basis of optical feature similarity. However, not all target pixels have strong optical features, so the detected ROI usually fails to encompass the entire target. In addition, feature-based methods cannot distinguish between targets, which can cause confusion in subsequent stages of processing.
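As a rough illustration of the feature-based idea, the following Python sketch thresholds a per-pixel feature-strength map and groups neighboring strong pixels into bounding boxes. The function name `detect_feature_rois`, the threshold value, and the toy feature map are assumptions made for this example; they are not taken from the cited methods.

```python
from collections import deque


def detect_feature_rois(feature_map, threshold):
    """Return bounding boxes (top, left, bottom, right) of 4-connected
    groups of pixels whose feature strength is at least `threshold`."""
    rows, cols = len(feature_map), len(feature_map[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or feature_map[r][c] < threshold:
                continue
            # Breadth-first flood fill over strong-feature neighbors.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and feature_map[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Hypothetical feature map: one target whose middle column has weak features.
feature_map = [
    [0, 0, 0, 0, 0],
    [0, 9, 2, 8, 0],
    [0, 9, 2, 8, 0],
    [0, 0, 0, 0, 0],
]
rois = detect_feature_rois(feature_map, threshold=5)
```

In this toy case the weak middle column falls below the threshold, so the single target is reported as two partial ROIs, which mirrors the limitation described above.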

Object-based methods, on the other hand, detect ROIs at a higher level than the pixel-by-pixel approach of feature-based systems, using information such as target shape and structure.3,4 Typical approaches include template matching and matched filters. Although these methods can assign a single ROI to one target, they are limited because they require many calculations, have difficulty detecting multiple target types, and are not reliable when applied to low-quality images.
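To make the computational cost of template matching concrete, here is a minimal Python sketch that slides one template over an image and scores each position by the sum of squared differences (SSD). The function name `match_template`, the SSD score, and the toy data are assumptions chosen for illustration; the cited object-based methods may use different similarity measures.

```python
def match_template(image, template):
    """Slide `template` over `image` and return (best_row, best_col, best_ssd),
    where SSD is the sum of squared differences (0 means a perfect match).
    Returns None if the template does not fit inside the image."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            # Every candidate position costs th * tw multiply-adds.
            ssd = sum(
                (image[r + dy][c + dx] - template[dy][dx]) ** 2
                for dy in range(th)
                for dx in range(tw)
            )
            if best is None or ssd < best[2]:
                best = (r, c, ssd)
    return best


# Hypothetical image containing one exact copy of the template.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 5, 5, 0],
    [0, 0, 5, 1, 0],
    [0, 0, 0, 0, 0],
]
template = [
    [5, 5],
    [5, 1],
]
match = match_template(image, template)
```

The nested loops visit every candidate position for every template pixel, and the whole search has to be repeated for each additional target type, which is one way to see why such methods become expensive.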

We address the shortcomings of these two approaches with a hierarchical region-of-interest detection (HROID) algorithm that employs multiresolution processing and takes advantage of both feature- and object-based methods.5,6 Our technique uses optical features to detect the main construct of each target, and prior information to include relevant pixels that would be missed by the feature-based method (see Figure 1). To reduce computational complexity and improve performance, we work at the lowest resolution that can still distinguish targets. Morphological processing ensures that a single ROI corresponds to a single target, which facilitates subsequent processing. Finally, a voting procedure allows effective processing of multiple shapes.


Figure 1. Combining images and prior knowledge of targets yields regions-of-interest within the image.
 

The HROID method includes five main processing steps (see Figure 2). First, we use prior information to divide the targets into groups and to determine the lowest detection resolution for each group. Second, we downsample the image to the required resolution using a wavelet transform. Third, we detect the pixels with strong features for each group and apply morphological processing to improve the chances that each resulting region, called a region-of-candidates (ROC), corresponds to only one target. Fourth, we combine the ROCs for all groups by a voting procedure that maximizes the number of pixels in these regions. Finally, we extend the surviving ROCs using prior information so that target pixels lacking strong optical features are included within the appropriate ROC. The extended ROCs are deemed to be ROIs.
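The downsampling and morphological steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say which wavelet or which morphological operator is used, so the sketch assumes a one-level Haar approximation (2x2 block averaging) and a binary opening with a 3x3 structuring element. All function names here are hypothetical.

```python
def haar_downsample(image):
    """One-level 2x downsampling by averaging 2x2 blocks (the Haar
    approximation band, up to a constant scale factor). Assumes even
    height and width."""
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, len(image[0]), 2)
        ]
        for r in range(0, len(image), 2)
    ]


def _morph(mask, combine):
    """Apply `combine` (all -> erosion, any -> dilation) over each pixel's
    3x3 neighborhood; pixels outside the image count as background."""
    rows, cols = len(mask), len(mask[0])

    def at(y, x):
        return 0 <= y < rows and 0 <= x < cols and mask[y][x]

    return [
        [
            combine(at(r + dy, c + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def opening(mask):
    """Erosion followed by dilation: removes thin bridges and specks."""
    return _morph(_morph(mask, all), any)


# Assumed toy image: each 2x2 block collapses to its mean.
low_res = haar_downsample(
    [[1, 3, 0, 0], [5, 7, 0, 0], [0, 0, 4, 4], [0, 0, 4, 4]]
)

# Assumed toy mask: two 3x3 blobs joined by a one-pixel bridge.
mask = [[False] * 9 for _ in range(5)]
for r in (1, 2, 3):
    for c in (1, 2, 3, 5, 6, 7):
        mask[r][c] = True
mask[2][4] = True
separated = opening(mask)
```

After the opening, the bridge pixel at row 2, column 4 is gone while both blobs keep their original 3x3 extent, so each blob can be treated as its own candidate region.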


Figure 2. Our hierarchical region-of-interest detection algorithm uses five steps: (1) a priori information processing; (2) image downsampling; (3) region-of-candidates (ROCs) detection for each prototype group; (4) ROC arbitration; and (5) ROC area extension to form regions of interest (ROIs).
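Steps (4) and (5) can be illustrated with a small Python sketch. The article does not spell out the voting rule or how prior information drives the extension, so everything below is an assumption: the arbitration is a simple greedy stand-in that prefers larger ROCs, and the extension grows each surviving ROC's bounding box to a per-group prior size. The names `arbitrate`, `extend`, and `priors` are hypothetical.

```python
def arbitrate(rocs):
    """Greedy stand-in for the voting step (an assumption, not the
    published rule): visit ROCs from largest to smallest and keep each
    one that does not overlap an ROC already kept.
    `rocs` is a list of (group_name, set_of_(row, col)_pixels)."""
    kept = []
    for group, pixels in sorted(rocs, key=lambda roc: len(roc[1]), reverse=True):
        if all(pixels.isdisjoint(other) for _, other in kept):
            kept.append((group, pixels))
    return kept


def extend(roc, prior_size, image_shape):
    """Grow an ROC's bounding box to at least the prior (height, width) of
    its group, centered on the ROC and clipped to the image. Returns an
    inclusive (top, left, bottom, right) box."""
    _, pixels = roc
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    center_y = (min(ys) + max(ys)) // 2
    center_x = (min(xs) + max(xs)) // 2
    height, width = prior_size
    top = center_y - height // 2
    left = center_x - width // 2
    # Union with the ROC's own box so the extension never shrinks it.
    box = (min(top, min(ys)), min(left, min(xs)),
           max(top + height - 1, max(ys)), max(left + width - 1, max(xs)))
    return (max(0, box[0]), max(0, box[1]),
            min(image_shape[0] - 1, box[2]), min(image_shape[1] - 1, box[3]))


# Hypothetical ROCs from two prototype groups; two of them overlap at (5, 5).
rocs = [
    ("vehicle", {(4, 4), (4, 5), (5, 4), (5, 5)}),
    ("building", {(5, 5), (5, 6)}),
    ("building", {(0, 0)}),
]
priors = {"vehicle": (4, 6), "building": (3, 3)}  # assumed (height, width)

kept = arbitrate(rocs)
rois = [extend(roc, priors[roc[0]], (10, 10)) for roc in kept]
```

Here the two-pixel "building" ROC loses to the larger "vehicle" ROC it overlaps, and the two surviving ROCs are then widened to their assumed prior sizes. A greedy pass like this is not guaranteed to maximize the total pixel count in every configuration; it is only meant to show the shape of the computation.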
 

Simulations on low-quality natural imagery show that our method offers a high probability that each ROI corresponds to one, and only one, target, and that it correctly incorporates target pixels that lack strong optical features, all while being computationally tractable. Our detection method generated Figure 3, which shows that HROID can detect a variety of targets of differing size in low-quality imagery.


Figure 3. An aerial image and the detected ROIs (boundaries shown in black).
 

Overall, our algorithm exhibits an excellent tradeoff between detection and false-alarm rates. It is ideally suited to ROI-based compression applications and to preliminary ROI detection that supports fast, efficient target recognition as well as sensor and asset cueing.

This work was supported by General Dynamics C4 Systems, and in part by the National Science Foundation under grant ECS-0002098.


Authors
Jennie Si, Huibao Lin
Electrical Engineering, Arizona State University
Tempe, AZ

Jennie Si received her BS and MS from Tsinghua University in Beijing, China, and her PhD from the University of Notre Dame. She is a professor in the Department of Electrical Engineering at Arizona State University. She has over 100 publications in journals and refereed conference proceedings.

Huibao Lin received his BS and MS from Peking University, Beijing, China, in 1997 and 2000, and his PhD from Arizona State University in 2005, all in Electrical Engineering. He was a graduate student at Arizona State University when this work was completed. Currently, he is a senior video development engineer at The MathWorks, Natick, Massachusetts.

Glen Abousleman
Compression, Communications & Intelligence Laboratory, General Dynamics C4 Systems
Scottsdale, AZ

Glen P. Abousleman received his BS and MS from The University of New Mexico in 1988 and 1990, and his PhD from The University of Arizona in 1994, all in Electrical Engineering. Currently, he is the director of the Compression, Communications & Intelligence Laboratory at General Dynamics C4 Systems. He is also an adjunct professor with the Electrical Engineering Department at Arizona State University.