The dramatic cost reduction in imaging detectors has generated considerable interest in sensor and image fusion. For decades, image sensors operating beyond the visible spectral region were typically confined to specialized instrumentation, such as military and remote-sensing systems, that could accommodate high cost and complexity. As the cost of IR sensors has dropped, these devices are finding applications in commercial areas such as transportation, security, and computer vision. Such systems often image over more than one spectral band. If more than one band is being considered for use in a system, it makes sense to consider a design that fuses images to display multiple bands simultaneously. The goal of sensor and image fusion is to display images from multiple spectral bands in an intuitive manner that improves the viewer's scene comprehension and emphasizes relevant information in ways a single image band cannot.
Image fusion is distinctly different from multispectral or hyperspectral imaging, in which a system captures distinct images of the same scene over anywhere from two to several hundred discrete spectral bands. Such systems often do, however, have an additional human-factors requirement to display some form of image fusion to confirm detections. Image fusion is also distinct from data fusion, which involves the combination of abstract objects (such as "plane" or "ship") identified by different sensors, including radar, signal intelligence, and electro-optical assets. Image fusion fundamentally operates on a lower level of interpretation, relying only on spatial data generated by a collection of imaging sensors.
In the simplest form of sensor fusion, the user boresights two or more separate cameras and digitally combines their outputs in software. The challenge is that such processing must additionally register the images. Despite tight mechanical tolerances, misalignment at the pixel level invariably occurs in these systems because of parallax and subtle differences between optics. An improved configuration uses a splitter mirror to eliminate the effects of parallax, but not the differences between lenses. A more integrated system, similar to the splitter system, features a common optic and two separate focal-plane arrays (FPAs). A fully integrated system uses a single optic and a multiband FPA, creating a single camera system.

varying the spectral band
The primary limitation of single-band thermal IR imaging systems is poor contrast between objects and backgrounds, which can make object detection in IR images problematic. Due to the underlying thermal processes, a target may at times appear to have positive contrast and at other times negative contrast relative to a background. At certain times of the day, the target may pass through a point of no contrast. Such variations introduce severe problems for even simple target detection.
A good choice of spectral bands in an image-fusion system is characterized by high atmospheric transmission and accessible sensor technology. The characteristics of the various bands differ in several important ways and affect subsequent fusion. The number of band/sub-band combinations grows with sensor choice and can vary from a common color-television camera to an exotic long-wave IR (LWIR) hyperspectral sensor. Between these extremes are systems that fuse a single IR band with a low-light-level visible imager.1 Such a configuration substantially improves system performance, hedging the risk of one band performing poorly by including a second band that may perform well. Such complementary situations occur with surprising frequency in nature and motivate the current placement of both sensor types in many imaging systems, albeit with separate monochrome displays for each band.
The photon flux incident on a ground-based sensor fusion system includes both solar-reflected and thermally self-emitted components. In the LWIR spectral region, the dominant mechanism is thermally self-emitted photons, best modeled as a temperature-driven blackbody spectrum modulated by the object's surface emissivity. In the visible, near-IR (NIR), and short-wave IR (SWIR) spectral regions, the dominant mechanism is reflected solar photons modulated by the surface reflectivity. (Reflected light can also be significant at night, as certain sensors operating in the visible and NIR bands are now sensitive to ambient starlight.) The mid-wave IR flux distribution, however, contains a combination of both thermally self-emitted and solar-reflected photons.2
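The relative strength of the self-emitted component in each band follows directly from Planck's law. The following sketch (function names and the graybody assumption are illustrative, not from the article) computes spectral photon radiance and confirms that a room-temperature surface emits far more strongly in the LWIR than in the mid-wave IR:

```python
import math

# Physical constants (SI units)
H = 6.62607015e-34   # Planck constant, J*s
C = 2.99792458e8     # speed of light, m/s
KB = 1.380649e-23    # Boltzmann constant, J/K

def photon_radiance(wavelength_m, temp_k, emissivity=1.0):
    """Spectral photon radiance of a graybody, in photons / (s * m^2 * sr * m):
    Planck's law divided by the photon energy hc/lambda, then scaled by the
    surface emissivity as described in the text."""
    x = H * C / (wavelength_m * KB * temp_k)
    return emissivity * (2.0 * C / wavelength_m**4) / math.expm1(x)

# A 300-K (room temperature) surface: LWIR (10 um) vs. mid-wave IR (4 um)
lwir = photon_radiance(10e-6, 300.0)
mwir = photon_radiance(4e-6, 300.0)
```

Here `lwir` exceeds `mwir` by more than an order of magnitude, which is why thermal self-emission dominates the LWIR band while the mid-wave band mixes emission with reflected sunlight.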
In the visible spectral region, it is usually quite challenging to differentiate among various types of vegetation and military targets because their spectral reflectivities are fundamentally similar; for example, plant chlorophyll and camouflage paint both look green. Traditionally, hyperspectral imaging systems have used many narrow bands to detect small spectral features or peaks. Discriminating between such materials then requires substantial measurement, usually under controlled settings with considerable exposure time. The conditions of many surveillance and situational-awareness scenarios do not allow for such measurements.
Enter image fusion. In wave bands longer than the visible, one often finds subtle differences in spectral reflectivity rather than sharp peaks. Materials can nevertheless be quickly differentiated by their broadband spectral signatures rather than by any single narrowband measurement.
The methodology for choosing spectral bands in an image-fusion system begins with an a priori determination of what targets need to be distinguished from anticipated backgrounds. A quick investigation of the pertinent spectral reflectivities from the visible to the LWIR often finds substantial spectral phenomena that can't be easily concealed. The instrument designer then chooses wavelength bands that maximize these spectral differences between the constituent sensors.

displaying data
Consider human color vision.3 For purposes of color vision, the eye actually captures three distinct images: one each from the red, green, and blue photoreceptor types called cones. From these separate images, the brain perceives a single color image responsive to each of the three constituent color cones. Given that the eye can discriminate roughly 100 simultaneous intensities in any single color band, this perceptual independence offers sensitivity to roughly one million colors and provides a method to powerfully discriminate objects from backgrounds, given constituent detectors with otherwise limited sensitivity.
An LWIR (a), mid-wave IR (b), and visible image (c) can be combined for a three-color composite (d) or a monochrome fusion (f). Using only the mid-wave and visible produces a two-color composite (e).
Human color vision, however, is psychologically perceived through three transformed channels: a black-and-white luminance sensation derived from a weighted average of the three color bands, along with two opponent-color chrominance sensations derived from weighted differences between those bands.
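The luminance/opponent decomposition above amounts to a linear transform of the band intensities. A minimal sketch follows; the weights are illustrative placeholders, not the eye's actual cone weightings:

```python
import numpy as np

# Hypothetical opponent-color transform: one luminance row (weighted average)
# and two chrominance rows (weighted band differences), per the text.
OPPONENT = np.array([
    [1/3, 1/3, 1/3],    # luminance: average of the three bands
    [1.0, -1.0, 0.0],   # chrominance 1: band1 - band2 (a red-green analogue)
    [0.5, 0.5, -1.0],   # chrominance 2: mean(band1, band2) - band3 (yellow-blue)
])

def to_opponent(pixels):
    """Map an (N, 3) array of band intensities to (luminance, c1, c2) triples."""
    return pixels @ OPPONENT.T

# A spectrally flat (gray) pixel carries luminance but zero chrominance:
lum, c1, c2 = to_opponent(np.array([[0.5, 0.5, 0.5]]))[0]
```

Because each chrominance row sums to zero, equal intensities in all bands map to zero in both opponent channels, which is exactly the behavior the luminance/chrominance separation relies on.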
Extending this process to IR multispectral sensor fusion, we begin with a multidimensional image cube, where each pixel has an n-tuple of intensities measured by the n separate, spatially registered detectors. We next seek an optimal projection of this dataset onto a luminance channel and two chrominance channels. Finally, these projections are rendered on some display medium, such as a high-end graphics monitor or a four-ink press such as the one on which this article is published.
A simple projection method for three-band sensor fusion is to consider the distribution of pixel values in 3-D space to be represented by a prolate spheroid extending along the principal component direction. This principal component direction is taken to be the luminance direction. The plane orthogonal to this direction is then spanned by two chrominance projections. This transform space allows luminance and chrominance to be processed separately. For example, if the three bands are highly correlated, then pixel values tend to lie along the principal coordinate axis; uncorrelated data will have a more spherical shape. Projected data can be renormalized in the transform space to achieve effective luminance-dynamic range management, as well as color contrast enhancement suitable for final visualization (see figure).
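The projection just described can be sketched with a principal-component analysis of the pixel distribution. In this minimal version (function name, normalization constant, and the unit-variance renormalization are illustrative assumptions, not the authors' exact procedure), the largest-variance eigenvector supplies the luminance axis and the two remaining eigenvectors span the chrominance plane:

```python
import numpy as np

def pca_fuse(cube):
    """cube: (H, W, 3) array of registered band intensities.
    Returns an (H, W, 3) array of [luminance, chroma1, chroma2] channels."""
    h, w, n = cube.shape
    pixels = cube.reshape(-1, n).astype(float)
    centered = pixels - pixels.mean(axis=0)
    # Eigen-decomposition of the band covariance; columns are principal axes.
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    axes = eigvecs[:, np.argsort(eigvals)[::-1]]   # largest variance first
    projected = centered @ axes                    # luminance, then chrominance
    # Renormalize each channel to unit variance: a simple stand-in for the
    # dynamic-range management and color-contrast enhancement in the text.
    projected /= projected.std(axis=0) + 1e-12
    return projected.reshape(h, w, n)
```

For highly correlated bands the first eigenvalue dominates (the prolate-spheroid case), so most of the scene energy lands in the luminance channel and the chrominance channels carry only the between-band differences.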
Many fusion applications involve only two sensors. The resultant 2-D color space can be represented with the same luminance projection but with only one opponent-color projection, such as yellow-blue or red-cyan. One common alternative is to reduce color fusion to monochrome fusion: suppress the projected color information entirely and retain only the luminance projection.
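A two-band fusion of this kind reduces to a shared luminance channel plus a single difference channel. The sketch below (the yellow-blue assignment of the two bands is an illustrative choice, not prescribed by the article) renders the result as RGB for display, and shows how monochrome fusion simply drops the chrominance term:

```python
import numpy as np

def fuse_two_band(band_a, band_b):
    """band_a, band_b: (H, W) arrays scaled to [0, 1].
    Returns an (H, W, 3) RGB image: shared luminance plus one
    yellow-blue opponent channel."""
    lum = 0.5 * (band_a + band_b)          # shared luminance channel
    chroma = 0.5 * (band_a - band_b)       # single opponent-color channel
    r = np.clip(lum + chroma, 0.0, 1.0)    # band_a excess pushes toward yellow
    g = np.clip(lum + chroma, 0.0, 1.0)
    b = np.clip(lum - chroma, 0.0, 1.0)    # band_b excess pushes toward blue
    return np.dstack([r, g, b])

def fuse_monochrome(band_a, band_b):
    """Monochrome fusion: keep only the luminance projection."""
    return 0.5 * (band_a + band_b)

# Where the two bands agree, the chrominance term vanishes and the
# rendered pixel is neutral gray:
gray = fuse_two_band(np.full((2, 2), 0.5), np.full((2, 2), 0.5))
```

Regions where the bands agree thus render as gray, while disagreements stand out in color, which is the discrimination benefit the opponent-color representation is meant to deliver.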
A broader goal of color fusion is to track diurnal and seasonal variations of the constituent sensor imagery, rendering a color image responsive to the underlying emissivity or reflectivity rather than to any particular intensity level due to solar illumination or scene temperature. The ability to determine surface emissivity and reflectivity regardless of changes in the environment is called color constancy.4
1. A. Toet et al., Opt. Eng. 28, 789-792 (1989).
2. D. Scribner, P. Warren, and J. Schuler, Journal of Machine and Computer Vision 11, 306-312 (Springer-Verlag, 2000).
3. G. Healey, S. Shafer, and L. Wolff, Physics-Based Vision: Principles and Practice, Color (Jones and Bartlett, Boston, 1992).
4. L. T. Maloney and B. A. Wandell, J. Opt. Soc. Am. A 3, 1673-1683 (1986).
Dean Scribner, Jonathan Schuler, Penny Warren, Grant Howard, and Richard Klein are research physicists with the Naval Research Laboratory, Code 5636, Washington, D.C.