Supervised image classification based on statistical machine learning

Ryuei Nishii

Combining statistics with machine learning is an effective and efficient approach to contextual image classification.

6 March 2007, SPIE Newsroom. DOI: 10.1117/2.1200612.0449

Classification of land covers (for example, trees, soil, and water) is an important problem in remote sensing of the Earth's surface. Contextual image classification is a form of image processing in which pixels are classified by learning the feature vectors (a numerical means of representing a feature) and the adjacency relationships of the pixels. Paradigms of supervised learning for classification, based on ‘training data,’ have traditionally relied mainly on statistical methods. Currently, the trend is toward machine learning, using approaches such as artificial neural networks (ANNs) and support vector machines (SVMs). Adaptive boosting (AdaBoost),^{1} a machine-learning technique developed in 1997 by Freund and Schapire, and its variants have been successfully applied to various situations in pattern recognition. Here, we apply AdaBoost to geostatistical image classification.

For contextual image classification, Markov random fields (MRFs) are known to approximate distributions of land-cover categories well, and MRF-based classifiers outperform others. But this statistical method requires substantial computational effort. An alternative is Spatial AdaBoost,^{2} which represents a fusion of statistics and machine learning. Empirical studies have shown that Spatial AdaBoost consumes much less CPU time than MRF-based methods^{3} without sacrificing performance.

AdaBoost and Spatial AdaBoost

For binary class problems, AdaBoost assigns labels {+1,-1}. Suppose that a multivariate feature vector x is categorized by classifier F(x) according to its sign. Let ℱ be a set of weak classifiers f(x) taking the values f(x) = ±1 only, and let {(x_{i},y_{i})|i ∈ D} be training data observed at a set of n numbered pixels D = {1,2,…,n} on Earth. Then, classifier F(x) is evaluated by the empirical exponential risk:

R_{emp}(F) = (1/n) Σ_{i ∈ D} exp{-y_{i}F(x_{i})}.   (1)

Risk (1) is sequentially minimized by a linear combination of classifiers in ℱ, and the classifier F(x) = β_{1}f_{1}(x) + … + β_{T}f_{T}(x), with f_{t} ∈ ℱ, is finally derived. A simple procedure for tuning β_{t} is known.^{4}
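The sequential minimization of the exponential risk can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the work cited here: it assumes single-feature threshold classifiers ("stumps") as the weak classifiers, uses the standard closed-form coefficient for ±1-valued weak classifiers (half the log odds of the weighted error), and runs on made-up toy data. The names `stump`, `exp_risk`, and `fit_adaboost` are hypothetical.

```python
import math
import random

def stump(j, thr, sign):
    """Weak classifier with values in {+1, -1}: sign if x[j] > thr, else -sign."""
    return lambda x: sign if x[j] > thr else -sign

def exp_risk(F_vals, y):
    """Empirical exponential risk: (1/n) * sum_i exp(-y_i * F(x_i))."""
    return sum(math.exp(-yi * Fi) for yi, Fi in zip(y, F_vals)) / len(y)

def fit_adaboost(X, y, n_rounds=5):
    """Add one weak classifier per round, each time lowering the exponential risk."""
    n, d = len(X), len(X[0])
    F_vals = [0.0] * n          # current values F(x_i) on the training data
    model, risks = [], []
    # Assumed weak-classifier set: every observed threshold, both orientations.
    candidates = [(j, x[j], s) for j in range(d) for x in X for s in (1, -1)]
    for _ in range(n_rounds):
        # Pixels that F currently gets wrong receive larger weights.
        w = [math.exp(-yi * Fi) for yi, Fi in zip(y, F_vals)]
        total = sum(w)
        best_err, best = None, None
        for j, thr, s in candidates:
            f = stump(j, thr, s)
            err = sum(wi for wi, xi, yi in zip(w, X, y) if f(xi) != yi) / total
            if best_err is None or err < best_err:
                best_err, best = err, (j, thr, s)
        # Clip to avoid log(0) when a stump is perfect on the weighted data.
        err = min(max(best_err, 1e-10), 1 - 1e-10)
        beta = 0.5 * math.log((1 - err) / err)
        f = stump(*best)
        F_vals = [Fi + beta * f(xi) for Fi, xi in zip(F_vals, X)]
        model.append((beta, best))
        risks.append(exp_risk(F_vals, y))
    return model, F_vals, risks

# Toy data (an assumption for illustration): two features, label = sign of their sum.
rng = random.Random(0)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]
y = [1 if a + b > 0 else -1 for a, b in X]
model, F_vals, risks = fit_adaboost(X, y, n_rounds=5)
train_err = sum(1 for Fi, yi in zip(F_vals, y) if Fi * yi <= 0) / len(y)
```

Because the 0-1 loss is bounded above by the exponential loss, `train_err` can never exceed the last entry of `risks`, which is one reason the exponential risk is a convenient surrogate.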

In the case of contextual classification, let U_{r}(i) = {j ∈ D|d(i,j) = r} be a neighborhood of pixel i with radius r, where d(i,j) denotes the distance between the centers of pixels i and j (see Figure 1). Let p(k|x) be the posterior probability, given x, that the true label y of x is k = ±1. Define the average of the logarithm of posterior probabilities in neighborhood U_{r}(i) by

f_{r}(k|i) = (1/|U_{r}(i)|) Σ_{j ∈ U_{r}(i)} log p(k|x_{j}).

Figure 1. Neighborhoods of pixel i with radius r.
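As a rough sketch of how the neighborhood average of log posteriors might be computed on a pixel grid, the Python below collects, for each pixel, the pixels at exactly distance r and averages their log posteriors for one fixed label k. Several choices here are assumptions rather than details from the article: the radius is passed as its square `r2` (so that r = 1, √2, 2, √5 become the integers 1, 2, 4, 5 and can be compared exactly), neighbors falling outside the image are skipped, and the 3×3 probabilities are invented. The function names are hypothetical.

```python
import math

def ring_offsets(r2):
    """Integer offsets (di, dj) with di^2 + dj^2 == r2, i.e. at distance sqrt(r2)."""
    m = math.isqrt(r2)
    return [(di, dj)
            for di in range(-m, m + 1)
            for dj in range(-m, m + 1)
            if di * di + dj * dj == r2]

def avg_log_posterior(log_post, r2):
    """Average log posterior over the ring of squared radius r2 around each pixel.

    log_post[i][j] holds log p(k | x) at pixel (i, j) for one fixed label k.
    Off-image neighbors are skipped; an empty ring yields 0.0 (an assumption).
    """
    H, W = len(log_post), len(log_post[0])
    offsets = ring_offsets(r2)
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            vals = [log_post[i + di][j + dj]
                    for di, dj in offsets
                    if 0 <= i + di < H and 0 <= j + dj < W]
            if vals:
                out[i][j] = sum(vals) / len(vals)
    return out

# Invented posterior probabilities for one label on a 3x3 image.
probs = [[0.9, 0.8, 0.7],
         [0.6, 0.5, 0.4],
         [0.3, 0.2, 0.1]]
log_post = [[math.log(p) for p in row] for row in probs]
f0 = avg_log_posterior(log_post, 0)   # radius 0: the pixel itself
f1 = avg_log_posterior(log_post, 1)   # radius 1: the four edge-adjacent pixels
```

With radius 0 the ring contains only the pixel itself, so `f0` reproduces `log_post`, which matches the role of f_{0} as the non-contextual log posterior.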

Spatial AdaBoost then uses a contextual classifier F(x) = β_{0}f_{0}(k|i) + β_{1}f_{1}(k|i) + … + β_{r}f_{r}(k|i), where the β_{t} are constants determined by sequentially minimizing risk (1).
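The sequential choice of the coefficients can be illustrated as follows. This sketch makes several assumptions that are not taken from the article: in the binary case each contextual feature is reduced to a single real number per pixel (for instance the difference f_{r}(+1|i) - f_{r}(-1|i)), each coefficient is found by bisection on the derivative of the exponential risk (which is convex in the coefficient), the search is limited to a hypothetical interval [-10, 10], and the feature values are invented. The names `fit_beta` and `fit_spatial_adaboost` are hypothetical.

```python
import math

def fit_beta(margins, weights, bound=10.0, iters=60):
    """Minimize sum_i weights[i] * exp(-beta * margins[i]) over [-bound, bound].

    The objective is convex in beta, so bisection on its slope is enough.
    """
    def slope(beta):
        return sum(-m * w * math.exp(-beta * m) for m, w in zip(margins, weights))
    lo, hi = -bound, bound
    if slope(lo) >= 0:
        return lo
    if slope(hi) <= 0:
        return hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if slope(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fit_spatial_adaboost(features, y):
    """Fit one coefficient per radius, in order, holding earlier ones fixed.

    features[r][i] is the contextual feature for ring r at training pixel i;
    y[i] is the label in {+1, -1}.
    """
    n = len(y)
    F = [0.0] * n
    betas, risks = [], []
    for f_r in features:
        weights = [math.exp(-yi * Fi) for yi, Fi in zip(y, F)]
        margins = [yi * fi for yi, fi in zip(y, f_r)]
        beta = fit_beta(margins, weights)
        F = [Fi + beta * fi for Fi, fi in zip(F, f_r)]
        betas.append(beta)
        risks.append(sum(math.exp(-yi * Fi) for yi, Fi in zip(y, F)) / n)
    return betas, F, risks

# Invented features for six training pixels and two radii.
y = [1, 1, 1, -1, -1, -1]
features = [[2.0, 1.0, -0.5, -1.5, -1.0, 0.5],
            [1.0, 0.8, 0.6, -0.9, -0.7, 0.3]]
betas, F, risks = fit_spatial_adaboost(features, y)
```

Since a coefficient of zero is always available, each added radius can only keep the training risk the same or lower it; whether it helps on unseen pixels is a separate question that this sketch does not address.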

The proposed method can easily be extended to the multiclass case. The extended method can in turn be applied to multispectral imaging, as in Figure 2(a), using three categories. The data are simulated by four-variate independent Gaussian distributions with three mean vectors, one per category, and a common unit variance-covariance matrix. Figure 2(b) is the image classified by log posterior f_{0}, or, equivalently, by the linear discriminant function (LDF). It is clearly very poor. As radius r becomes large, the classified image approaches the true image. Figure 2(f) shows the best result.
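The non-contextual baseline in this experiment can be mimicked with a short simulation. The sketch below draws four-variate Gaussian feature vectors with unit covariance around three class means and assigns each vector to the class with the largest log posterior, which under a shared covariance matrix is the LDF rule. The article does not report the mean vectors, priors, or sample size, so the values of `MEANS`, the equal priors, the 300 samples, and the random seed are all placeholders chosen for illustration.

```python
import math
import random

# Hypothetical class means; the article does not give the actual vectors.
MEANS = [(0.0, 0.0, 0.0, 0.0),
         (1.0, 1.0, 0.0, 0.0),
         (0.0, 1.0, 1.0, 1.0)]

def log_posteriors(x, means=MEANS):
    """log p(k | x) for unit-covariance Gaussians with equal priors (assumed)."""
    scores = [-0.5 * sum((xi - mi) ** 2 for xi, mi in zip(x, mu)) for mu in means]
    top = max(scores)
    # Log-sum-exp normalization so the posteriors sum to one.
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_norm for s in scores]

def classify_ldf(x, means=MEANS):
    """Pick the class with the largest log posterior (the LDF decision)."""
    lp = log_posteriors(x, means)
    return max(range(len(lp)), key=lp.__getitem__)

# Simulate labeled pixels and measure the non-contextual error rate.
rng = random.Random(0)
labels = [rng.randrange(3) for _ in range(300)]
pixels = [[rng.gauss(m, 1.0) for m in MEANS[k]] for k in labels]
predicted = [classify_ldf(x) for x in pixels]
error_rate = sum(p != k for p, k in zip(predicted, labels)) / len(labels)
```

With class means this close together relative to the unit variance, a sizable share of pixels is misclassified when each pixel is judged on its own, which is the situation that averaging log posteriors over neighborhoods is meant to improve.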

Figure 2. True and estimated labels by LDF and Spatial AdaBoost with radius r. LDF: Linear discriminant function.

Conclusion and outlook

Spatial AdaBoost is based on posterior probabilities, taking prior experience into account. Normally, posteriors are estimated by statistically modeling feature vectors. Other methods, for example probabilistic SVM,^{5} and different types of pixel neighborhoods could also be used, which shows the versatility of the method. Our research group has investigated Spatial AdaBoost from several vantage points and compared it with MRF-based methods.^{6–10} In future, we plan to test the performance of Spatial AdaBoost on problems in which land cover is represented by feature vectors of 200 or more dimensions. Other challenges include applying AdaBoost to estimating the proportion of land cover and to unsupervised classification, that is, classification without the need for training data.

Ryuei Nishii

Faculty of Mathematics, Kyushu University

Fukuoka, Japan

Ryuei Nishii is a professor of the Faculty of Mathematics, Kyushu University, Japan. He is an associate editor of IEEE Transactions on Geoscience and Remote Sensing, and editor of the Japanese Journal of Applied Statistics. In addition, for the last two years he has been a program committee member for, chaired, and contributed papers to the SPIE Remote Sensing conference.