With extraordinary advances in sensor technology, numerous imaging devices and techniques, such as visible-light cameras, IR cameras, and magnetic resonance (MR) imaging, have been developed for military and civilian applications. For a given scene, different devices often provide complementary information. For example, MR provides detailed information about soft tissue but not dense structures. By contrast, computed tomography (CT) reveals dense structures, such as bones, but not soft tissue. Image fusion integrates complementary information from different devices into a single image to provide a more complete and accurate description of a scene. If an MR image is combined with a CT image, both soft tissue and bones are well visualized, which helps in making diagnoses.

Because natural scenes exhibit a multiscale structure, multiscale-transform methods, such as the discrete-wavelet transform (DWT), are widely used.^{1} These techniques decompose the source images through a multiscale transform and then combine the decomposed coefficients at corresponding positions according to specific rules. The fused image is reconstructed by applying the inverse transform to the combined coefficients. The coefficients characterize salient features, i.e., complementary information, such as edges and lines. However, each multiscale transform is generally suited to describing only certain features. Because the goal is to integrate salient features from multiple images, it is important to extract them effectively and completely. We previously^{2, 3} proposed an approach that uses sparse coefficients as salient features, calculating them by means of an overcomplete dictionary. Unlike a multiscale transform, the overcomplete dictionary can be trained from a set that effectively represents all features, such as edges, lines, corners, and ridges.
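The decompose, combine, and invert pipeline of multiscale-transform fusion can be illustrated with a minimal sketch. It assumes a one-level Haar wavelet, the 'average' rule for the approximation band, and the 'max-abs' rule for the detail bands; these choices and the function names `haar2`, `ihaar2`, and `fuse_dwt` are illustrative assumptions, not the configuration used in the experiments reported here.

```python
import numpy as np

R2 = np.sqrt(2.0)

def haar2(img):
    """One-level 2-D Haar transform (image height and width must be even)."""
    a = np.asarray(img, dtype=float)
    lo = (a[:, 0::2] + a[:, 1::2]) / R2   # low-pass along each row
    hi = (a[:, 0::2] - a[:, 1::2]) / R2   # high-pass along each row
    LL = (lo[0::2] + lo[1::2]) / R2       # approximation band
    LH = (lo[0::2] - lo[1::2]) / R2       # detail bands
    HL = (hi[0::2] + hi[1::2]) / R2
    HH = (hi[0::2] - hi[1::2]) / R2
    return LL, (LH, HL, HH)

def ihaar2(LL, details):
    """Inverse of haar2."""
    LH, HL, HH = details
    h, w = LL.shape
    lo = np.empty((2 * h, w))
    hi = np.empty((2 * h, w))
    lo[0::2], lo[1::2] = (LL + LH) / R2, (LL - LH) / R2
    hi[0::2], hi[1::2] = (HL + HH) / R2, (HL - HH) / R2
    out = np.empty((2 * h, 2 * w))
    out[:, 0::2], out[:, 1::2] = (lo + hi) / R2, (lo - hi) / R2
    return out

def fuse_dwt(img1, img2):
    """Fuse two registered grayscale images of equal, even size."""
    LL1, d1 = haar2(img1)
    LL2, d2 = haar2(img2)
    LL = 0.5 * (LL1 + LL2)  # 'average' rule (assumed) for the approximation band
    # 'max-abs' rule (assumed): keep the detail coefficient of larger magnitude
    details = tuple(np.where(np.abs(a) >= np.abs(b), a, b) for a, b in zip(d1, d2))
    return ihaar2(LL, details)
```

Practical systems typically use several decomposition levels and longer filters; the single level here only keeps the example short.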

The joint sparsity model (JSM) is useful for recovering sparse signals.^{4} In the JSM, multiple signals provided by different sensors for the same scene form an ensemble. All signals in an ensemble have a common sparse component, and each signal has its own unique sparse component.^{4} The common and unique sparse components can be determined by

where Θ = [*α*_{c}^{T}; *α*_{1}^{T}; *α*_{2}^{T}; …; *α*_{Λ}^{T}]^{T} consists of the common sparse component (*α*_{c}) and the unique sparse components (*α*_{1}; *α*_{2}; …; *α*_{Λ}), *X* = [*x*_{1}^{T}; *x*_{2}^{T}; …; *x*_{Λ}^{T}]^{T} denotes the ensemble, Λ is the number of source images, and the matrix **D** is built from an overcomplete dictionary **Φ**. Because the JSM can effectively separate the common and unique components of multiple source images, we propose using it as the basis for a method of multimodal image fusion (see Figure 1). The sparse coefficients are the salient features, and the unique components represent the complementary information.
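For two source images (Λ = 2), one common way to write this model is sketched below. It assumes the standard joint-sparsity formulation from the distributed compressed sensing literature,^{4} in which each signal is the sum of a common part and a unique part, *x*_{i} = **Φ***α*_{c} + **Φ***α*_{i}. The block layout of **D**, the ℓ0 objective, and the tolerance ε are assumptions made for illustration.

```latex
\begin{aligned}
X = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
  &= \underbrace{\begin{bmatrix} \Phi & \Phi & 0 \\ \Phi & 0 & \Phi \end{bmatrix}}_{\mathbf{D}}
     \begin{bmatrix} \alpha_c \\ \alpha_1 \\ \alpha_2 \end{bmatrix}
   = \mathbf{D}\,\Theta, \\
\hat{\Theta} &= \arg\min_{\Theta} \|\Theta\|_0
  \quad \text{subject to} \quad \|X - \mathbf{D}\,\Theta\|_2 \le \varepsilon .
\end{aligned}
```

The first block column of **D** places the same atoms in every signal, so coefficients assigned to it describe content shared by both images, while the remaining block columns affect one image each.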

**Figure 1.** Diagram of the proposed method. JSM: Joint sparsity model. *I*_{1}, *I*_{2}: Source images. (*α*_{1}; *α*_{2}): Unique sparse components. Φ: Overcomplete dictionary. *I*_{F}: Fused image. *x*_{1}^{j}, *x*_{2}^{j}: Column vectors. *x*_{1}^{j}, *x*_{2}^{j}: Column vectors minus corresponding mean value. *x*_{F}^{j}: The *j*th estimated patch corresponding to the *j*th patches extracted from the two source images.

To represent local information, the new method operates on patches. We extract the *j*th *√n* × *√n* patch from each of the source images *I*_{1} and *I*_{2} and order the two patches as the column vectors *x*_{1}^{j} and *x*_{2}^{j}. Subtracting the mean values *m*_{1}^{j} and *m*_{2}^{j} yields the mean-removed vectors *x*_{i}^{j} (*i* = 1, 2). Using the JSM, we obtain the common sparse component *α*_{c}^{j} and the unique sparse components *α*_{1}^{j} and *α*_{2}^{j}. Then, summing the common component Φ*α*_{c}^{j}, the unique components Φ*α*_{1}^{j} and Φ*α*_{2}^{j}, and the average of the mean values, we obtain the estimated *j*th patch *x*_{F}^{j} of the fused image. Finally, we calculate the fused image *I*_{F} by averaging the estimated patches. Two rules, ‘max-abs’ and ‘average,’ are commonly used to combine coefficients in multiscale-transform- and sparse-representation-based methods. In the proposed method, however, we use the JSM to accurately calculate the complementary information. Because our approach determines the fused image directly from the separated complementary information, it avoids the problems associated with rule design. In addition, by accurately separating the complementary information, we can completely merge the information from different source images into a single image.
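The patch-wise procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sparse solver is a plain orthogonal matching pursuit (the text does not name the solver), the joint dictionary uses the two-image block layout [Φ Φ 0; Φ 0 Φ], the dictionary `Phi` is assumed to be given with nonzero columns, and the names `omp`, `fuse_patch_jsm`, and `fuse_images` are hypothetical.

```python
import numpy as np

def omp(D, x, max_atoms, tol=1e-6):
    """Greedy orthogonal matching pursuit: approximate x with few columns of D."""
    norms = np.linalg.norm(D, axis=0)
    theta = np.zeros(D.shape[1])
    support = []
    residual = x.copy()
    for _ in range(max_atoms):
        if np.linalg.norm(residual) <= tol:
            break
        corr = np.abs(D.T @ residual) / norms
        if support:
            corr[support] = 0.0  # never select the same atom twice
        support.append(int(np.argmax(corr)))
        # Re-fit all selected atoms by least squares, then update the residual
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
        theta[:] = 0.0
        theta[support] = coef
    return theta

def fuse_patch_jsm(x1, x2, Phi, max_atoms=8):
    """Estimate one fused patch from two vectorized source patches."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    m1, m2 = x1.mean(), x2.mean()
    X = np.concatenate([x1 - m1, x2 - m2])         # mean-removed ensemble
    Z = np.zeros_like(Phi)
    D = np.block([[Phi, Phi, Z], [Phi, Z, Phi]])   # assumed two-image JSM layout
    theta = omp(D, X, max_atoms)
    K = Phi.shape[1]
    a_c, a_1, a_2 = theta[:K], theta[K:2 * K], theta[2 * K:]
    # Common part + both unique parts + average of the two patch means
    return Phi @ (a_c + a_1 + a_2) + 0.5 * (m1 + m2)

def fuse_images(I1, I2, Phi, p, step=1, max_atoms=8):
    """Slide a p-by-p window over both images and average overlapping estimates."""
    H, W = I1.shape
    acc = np.zeros((H, W))
    cnt = np.zeros((H, W))
    for r in range(0, H - p + 1, step):
        for c in range(0, W - p + 1, step):
            x1 = I1[r:r + p, c:c + p].reshape(-1)
            x2 = I2[r:r + p, c:c + p].reshape(-1)
            xf = fuse_patch_jsm(x1, x2, Phi, max_atoms)
            acc[r:r + p, c:c + p] += xf.reshape(p, p)
            cnt[r:r + p, c:c + p] += 1
    # With step > 1 some border pixels may be uncovered; they stay at zero
    return acc / np.maximum(cnt, 1)
```

Here `Phi` must have p·p rows, one per pixel of a vectorized patch. Note that no coefficient-selection rule appears anywhere: the fused patch is the direct sum of the separated components.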

We compared our JSM-based approach with four multiscale-transform-based methods: DWT, stationary-wavelet transform, dual-tree complex-wavelet transform, and nonsubsampled-contourlet transform. In addition, we compared the proposed method with a sparse-representation-based technique known as simultaneous orthogonal matching pursuit.^{3} Figure 2 shows the visible and IR source images^{5} and the fused images produced by the different methods. The visible image clearly shows the road and signboard in the background, while the IR image depicts the outlines of the pedestrian and cars: see Figure 2(a) and (b). On visual inspection of Figure 2(c–h), we observe that the proposed JSM-based method produces fewer black shadows over the signboard than the other techniques. It also produces a smaller region of black artifacts around the pedestrian, effectively preserving complementary information. The Q^{AB/F} index,^{6} which is used to objectively assess fused images, also shows that the proposed method generates the best results among the compared methods.

**Figure 2.** Visible and IR source images and the results of different fusion methods. DWT: Discrete-wavelet transform. NSCT: Nonsubsampled-contourlet transform. SWT: Stationary-wavelet transform. DTCWT: Dual-tree complex-wavelet transform. SOMP: Simultaneous orthogonal matching pursuit. JSM: Joint sparsity model.

Figure 3 presents additional results for fluorodeoxyglucose positron emission tomography (PET-FDG) and T1-weighted MR. Treating the PET-FDG image as a color image, we transform it into intensity-hue-saturation (IHS) space. We then fuse the intensity component of the PET-FDG image with the T1-weighted MR image to create a new intensity component. Finally, we apply the inverse IHS transform to the new intensity component together with the hue and saturation components of the PET-FDG image to obtain the result in red-green-blue space. As shown in Figure 3, our proposed method preserves details from T1-weighted MR more effectively than the other approaches.
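The color-handling steps can be sketched as below. The sketch assumes a linear IHS model in which intensity is the mean of the red, green, and blue channels and the two remaining axes encode hue (as an angle) and saturation (as a radius); the text does not state which IHS variant was used. The function name `fuse_color_with_gray` and the `fuse_intensity` callback, which stands in for whatever grayscale fusion method is applied, are hypothetical.

```python
import numpy as np

# Assumed linear IHS model: row 0 gives intensity = (R + G + B) / 3;
# rows 1 and 2 give two chromatic axes (v1, v2) that carry hue and saturation.
RGB_TO_IHS = np.array([
    [1 / 3, 1 / 3, 1 / 3],
    [-np.sqrt(2) / 6, -np.sqrt(2) / 6, 2 * np.sqrt(2) / 6],
    [1 / np.sqrt(2), -1 / np.sqrt(2), 0.0],
])
IHS_TO_RGB = np.linalg.inv(RGB_TO_IHS)

def fuse_color_with_gray(color_rgb, gray, fuse_intensity):
    """Fuse an (H, W, 3) color image with an (H, W) grayscale image.

    fuse_intensity(intensity, gray) -> new intensity, all of shape (H, W).
    """
    ihs = np.asarray(color_rgb, dtype=float) @ RGB_TO_IHS.T  # channels: I, v1, v2
    ihs[..., 0] = fuse_intensity(ihs[..., 0], np.asarray(gray, dtype=float))
    # v1 and v2 are untouched, so hue and saturation are preserved
    return ihs @ IHS_TO_RGB.T
```

The returned values can fall outside the input range when the new intensity differs strongly from the original, so clipping before display may be needed.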

**Figure 3.** Fluorodeoxyglucose positron emission tomography (PET-FDG) and T1-weighted magnetic resonance (MR) source images and the fused results of different methods.

In summary, we have described a JSM-based method that outperformed the multiscale-transform- and sparse-representation-based methods we compared it with. The new technique preserves details more effectively by accurately extracting and integrating complementary information. The experimental results suggest that our approach is applicable to monitoring safety in cities and making medical diagnoses. Future work will focus on developing a technique based on the sparse-representation structure.

Haitao Yin, Shutao Li

Hunan University

Changsha, China

Haitao Yin received a BS and MS in applied mathematics from the College of Mathematics and Econometrics (2007 and 2009, respectively). He is currently pursuing a PhD in the College of Electrical and Information Engineering. His research interests include image processing and sparse representation.

Shutao Li received a BS, MS, and PhD in electrical engineering (1995, 1997, and 2001, respectively). He is currently a full professor at the College of Electrical and Information Engineering. His professional interests include information fusion, pattern recognition, and image processing.

References:

1. G. Pajares, J. M. de la Cruz, A wavelet-based image fusion tutorial, *Pattern Recognit. 37*, no. 9, pp. 1855-1872, 2004.

2. B. Yang, S. Li, Multifocus image fusion and restoration with sparse representation, *IEEE Trans. Instrum. Meas. 59*, no. 4, pp. 884-892, 2010.

3. B. Yang, S. Li, Pixel-level image fusion with simultaneous orthogonal matching pursuit, *Inf. Fusion 13*, pp. 10-19, 2012. doi:10.1016/j.inffus.2010.04.001

4. M. F. Duarte, S. Sarvotham, D. Baron, M. B. Wakin, R. G. Baraniuk, Distributed compressed sensing of jointly sparse signals, *Proc. 39th Asilomar Conf. Signals Syst. Comput.*, pp. 1537-1541, 2005.

5. J. J. Lewis, S. G. Nikolov, C. N. Canagarajah, D. R. Bull, A. Toet, Uni-modal versus joint segmentation for region-based image fusion, *Proc. 9th Int'l Conf. Inf. Fusion*, pp. 10-13, 2006.

6. C. S. Xydeas, V. Petrovic, Objective image fusion performance measure, *Electron. Lett. 36*, no. 4, pp. 308-309, 2000.