Image classification is an active area of research with many applications, including content-based image retrieval, object recognition, and surveillance. Images can be classified based on the presence of a specific object or on certain global properties of object patterns present in the image. Size and shape descriptors are computed from the contour or the area of the objects. Often, these descriptors require considerable computing time. Even worse, many of them are not scale and rotation invariant, which further complicates processing.
We present an efficient method for classifying gray-scale images, based on both size and shape properties of a single object, or on patterns of objects present in the image. Area (defined as the number of pixels) and elongation are used as size and shape descriptors, respectively. The invariance of these elements to translation, scale, and rotation means that all of the desired objects are recognized in one single pass of our algorithm. We compare the classification performance of our method with existing methods.Pattern spectra
Mathematical morphology is a field of research where mathematical concepts are used to describe images and define operators on images.1 Commonly, operators using so-called structuring elements (SEs) are used—small images containing part of the object of interest. The image is then ‘probed’ with an SE such that image areas where the SE ‘fits’ are preserved, while areas where it does not are removed. For each given basic shape, the input image is filtered with SEs of that shape at increasing scales. Such a set of operators, called a granulometry, is used to compute a pattern spectrum—a histogram in which each bin contains the amount of image detail that is associated with the scale and shape of the corresponding SE.2,3 The amount of detail that was removed by each operation at a certain scale is stored in a separate (increasing) bin of a table. The pattern spectrum is then obtained by taking the derivative of that table by size. Finally, the spectra are combined into a single feature vector. The decision-tree classifier C4.54 is then used to evaluate the vectors.
Our approach is based on another technique from mathematical morphology called connected filtering. We demonstrate that its classification performance is equal to or better than the best existing method using SEs. Moreover, the computation time of our method does not depend on the number of scales and shapes used. Our approach has also been compared with several other methods for image classification,5,6 such as a combination of gray-level co-occurrence matrix and Gabor wavelets. For a set of diatom (algae) images, it was found that our method also outperforms these.
First we define the basic concept of connected filter. To that end, the notion of a flat zone is required. A flat zone Lh(f) of a gray-scale image f is defined as a connected region of constant gray-level h, of which no neighbors have h. It is a connected image region of constant gray value which has maximal size. At each h there may be multiple flat zones, which are denoted by Lkh(f), with k being some index variable. Similarly, a peak component Pkh(f), where k runs over some index set Ifh, is defined as the kth connected component or ‘grain’ of the threshold set of image f, defined as
A connected filter (or operator) merges flat zones, or changes their gray level, but does not split or deform them. (A precise mathematical definition, as well as a detailed discussion of the algorithm and the experiments, is available for the interested reader.7) Connected filters are shape preserving because they never introduce new edges in images. In our approach, we use a type of connected filter known as an attribute filter.8 To decide which flat zones should be removed or preserved, a criterion is used that is defined in terms of attributes of flat zones. An example is the area opening, which removes components with an area smaller than a certain threshold λ.
The Max-tree representation is an efficient algorithm to perform attribute filtering. In a Max-tree, the root node represents the set of pixels belonging to the background—those with the lowest intensity in the image. The size and shape attributes of the peak components in the image are stored in corresponding nodes of the tree. Each node has a pointer to its parent, so that the nodes corresponding to the components with the highest intensity are the leaves (hence the name Max-tree). Conversely, a tree in which the leaves correspond to the minima is called a Min-tree.
The attributes we use are area and elongation, describing a size and shape property of each peak component, respectively. A discussion of the computation of the area and the construction of the Max-tree is available elsewhere.9 We use the first moment invariant of Hu10 as the elongation attribute. Once the Max-tree with its attributes has been constructed, our method computes the contribution for each node and stores that at location (i,j) in the pattern spectrum. The values i and j are binned versions of the elongation and area attribute values, respectively. The contribution is the product of the area and the gray-level difference between the current node and its parent node. This 2D pattern spectrum is then mapped lexicographically into a vector.
Similarly, a second vector is computed from the 2D pattern spectrum derived from the Min-tree. These two vectors are then concatenated to a single feature vector. The decision-tree classifier C4.5 with bagging and cross-validation was used to evaluate the classification performance of both our method and the existing methods using structuring elements.11,12
Table 1 shows the classification performance of our method and the best competitor for four image libraries: the Brodatz texture set; the Columbia Object Image Libraries COIL-20 and COIL-100; and the set of 781 diatom images from the Automatic Diatom Identification and Classification (ADIAC) project.5,13
Table 1. Classification performances (%) for the best two methods on all data sets. The corresponding standard deviations are included in parentheses.
The influence of orientation on the classification performance of some existing methods using SEs—SE univariate (UV), SE Euclidean, and SE bivariate (BV)—and of our Max-tree–based method is shown in Figure 1. We used two versions for our approach: with normalization (Max-tree norm) and without normalization (Max-tree). As shown, the methods using structuring elements perform poorly with objects at different orientations due to their lack of rotation invariance. The computation time needed for these methods on a 2.8GHz Pentium 4 PC is shown in Figure 2 for the ADIAC and Brodatz image sets. It is clear that the computation time of the SE methods, contrary to our method, depends on the number of size and shape classes used, i.e., on the resulting length of the feature vector.
Figure 1. Performance on rotated diatom images, trained using an aligned image set. Left: SE-based methods. Right: Max-tree method.
Figure 2. Computation time per pixel vs. length of feature vector for the binned methods.
Methods using pattern spectra for the computation of feature vectors are highly suitable for image classification. In this paper we presented our recently developed multiscale and multishape morphological method for pattern-based analysis and classification of gray-scale images using connected operators. An efficient implementation was achieved by using the Max-tree algorithm. Our method performs a global analysis of the patterns in an image. This type of information can be of use in many other applications, such as content-based image retrieval, soil analysis,14 and the classification of pollen grains,15 where all grains have a more-or-less round shape, but with patterns that are not always symmetric.
We compared our new method with existing methods using structuring elements. For image sets where the images are symmetrical or manually aligned to have the same orientation, the best SE competitor was never significantly better than our method, and in one case was significantly worse (Brodatz). When rotation invariance is desired, as in the case of images with different orientations, our method is preferable over those based on structuring elements, since the latter are not invariant to rotation. Although rotation invariance can be approximated using linear structuring elements at many orientations, this leads to a sharp increase in computation time, which is not desirable when large sets of images are used.
A further advantage of our method is the separation of size and shape information, which allows a class of image components with similar shape but varying size to be treated as one class for filtering or analysis. Furthermore, when when computing a large number of size or shape classes, SE methods need to filter an image for every class used, while the time needed to process a pattern spectrum using our method is independent of the size of the spectrum. We obtained a speed gain between five- and ninefold per image set in the current setting. Our method is significantly less sensitive to noise than any of the compared SE methods.
In future work we will investigate other attributes such as the moment invariants of Hu16 and the so-called affine moment invariants of Flusser and Suk,17 which may further improve the performance of our method. Texture segmentation of images requires local texture measures and adaptations of our approach.
Institute of Mathematics and Computing Science
University of Groningen (RUG)
Groningen, The Netherlands
Lunar and Planetary Institute
Erik R. Urbach obtained an MSc degree in computer science from the Institute of Mathematics and Computing Science, RUG, in 2002, where he now works on the Connected Morphological Operators for Scale and Shape Spaces project toward his PhD. The prime areas of his research are connected filters, multiscale and multishape analysis, image classification, and texture analysis.
Jos Roerdink, Michael Wilkinson
Institute of Mathematics and Computing Science
Groningen, The Netherlands
Jos B. T. M. Roerdink received his MSc in theoretical physics from the University of Nijmegen, the Netherlands, in 1979, and has written several papers for SPIE conferences. He holds a chair in scientific visualization and computer graphics at the RUG, where he is a full professor of computing science. His current research interests include biomedical visualization, neuroimaging, and bioinformatics.
Michael H. F. Wilkinson obtained an MSc in astronomy from the Kapteyn Laboratory, RUG, in 1993, and a PhD in computer science at the Institute of Mathematics and Computing Science (IWI), RUG, in 1995. He is currently an assistant professor working on computer vision, and in particular multiscale mathematical morphology and connected filters, at the IWI.
13. H. du Buf, M. Bayer, S. Droop, R. Head, S. Juggins, S. Fischer, H. Bunke, M. Wilkinson, J. Roerdink, J. Pech-Pacheco, G. Christobal, H. Shahbazkia, A. Ciobanu, Diatom identification: a double challenge called ADIAC, Proc. 10th Int'l Conf. Image Anal. Process., pp. 734-739, Venice, 1999.