# Human visual perception and dissimilarity

Quantification of spatial structures, in particular from a geometric and morphological (structural) point of view, has applications in fields such as materials science and engineering, pharmaceuticals and cosmetics, medicine, and biology. For more than 10 years, we have been developing image processing methods and algorithms dedicated to the analysis of images of the corneal endothelium of the eye. Our current goal is to bring together all of the methods and algorithms we have previously tested successfully into packages for the community of optometrists and ophthalmologists who examine the cornea in vivo in human patients, as well as for the community of ophthalmologists and cornea banks that carry out surgical corneal transplants. The most common request we receive from our ophthalmologist colleagues is for a means of measuring corneal cell density as accurately as possible: these cells are responsible for the transparency of the cornea, and below a certain density, blindness ensues. A second request concerns how to evaluate the shapes and sizes of these cells.

Accordingly, we proposed a method, based on so-called specular microscopy imaging (see Figure 1), that consists of segmenting the cells (i.e., detecting their contours). To our mind, this technique represented an improvement over earlier methods, but to publish our work we still had to evaluate our segmentation results. We asked two experts to manually draw the contours of the cells, as we wished to compare our results against their ground truth. To our surprise, the automatic comparison functions classically used for this purpose returned anomalous results, even when comparing the two experts' drawings with each other.

A digital image is a matrix of elementary objects called pixels (picture elements) that carry color or grayscale intensity information. A segmentation process partitions an image into coherent sets of pixels; it essentially detects objects in an image by creating a binary image in which each object is isolated from the background. In the case of corneal endothelial cells, the algorithm detects the borders of the cells (see Figure 1). To compare our results with those of other methods, we must then use a specific function that takes two segmentation results and outputs a numerical value: 0 if the segmentations are identical, and a positive value if they differ. Mathematicians call this kind of function a distance function, or metric.
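As a minimal illustration (on toy masks of our own making, not the article's data), a simple pixel-counting distance between two binary segmentations, the size of their symmetric difference, behaves exactly as described: zero for identical images, positive otherwise.

```python
import numpy as np

def symmetric_difference_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Count pixels labeled differently in two binary segmentations.

    Returns 0 iff the segmentations are identical, and a positive
    integer otherwise -- the behavior expected of a distance function.
    """
    return int(np.count_nonzero(a.astype(bool) ^ b.astype(bool)))

# Two toy 5x5 segmentations differing in a single pixel
a = np.zeros((5, 5), dtype=bool)
a[1:4, 1:4] = True
b = a.copy()
b[0, 0] = True

print(symmetric_difference_distance(a, a))  # 0: identical segmentations
print(symmetric_difference_distance(a, b))  # 1: one differing pixel
```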

For a mathematician, a distance function (also called a distance for short) *d* of two variables obeys four axioms. To illustrate them, suppose we wish to measure distances between cities: the distance *d* from Paris to Paris is 0 (axiom of identity); if we travel a distance of 0 from Paris, we are still in Paris (axiom of separation); the distance from Paris to London is the same as the distance from London to Paris (axiom of symmetry); and the path from Paris to London via Brussels is longer than the direct route across the Channel (axiom of triangle inequality).
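These four axioms can be checked numerically. The following sketch verifies them exhaustively for the ordinary Euclidean distance over a set of random 2D points (standing in for city coordinates); the points and tolerances are our own choices.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(20, 2))  # random 2D "cities"

def d(p, q):
    """Ordinary Euclidean distance between two points."""
    return float(np.linalg.norm(p - q))

for p, q, r in itertools.product(points, repeat=3):
    assert d(p, p) == 0.0                             # identity
    assert (d(p, q) == 0.0) == np.array_equal(p, q)   # separation
    assert np.isclose(d(p, q), d(q, p))               # symmetry
    assert d(p, r) <= d(p, q) + d(q, r) + 1e-12       # triangle inequality

print("all four axioms hold for the Euclidean distance")
```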

Based on these axioms, image processing specialists have proposed many different distances^{2} based either on detected contours (as with the endothelial cells) or on detected regions (two different approaches with the same goal). These distances try to classify pixels as detected or not detected when compared with a reference segmentation result. Among them, Pratt's figure of merit^{3} is probably the most widely used, but we might also cite the Dice coefficient^{4} or the Hausdorff distance.^{5, 6} To evaluate a specific segmentation method, we generally employ a reference image (the ground truth, usually manually drawn and supplied by experts in the domain) and use the distance to score the method.^{7–10}
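Two of these classical criteria are easy to sketch with NumPy and SciPy: the Dice coefficient (a region-overlap score) and the Hausdorff distance (a contour-mismatch score). The masks below are illustrative toys of our own, not the paper's corneal data.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice(a, b):
    """Dice coefficient between two binary masks (1.0 = perfect overlap)."""
    a, b = a.astype(bool), b.astype(bool)
    return 2 * np.count_nonzero(a & b) / (np.count_nonzero(a) + np.count_nonzero(b))

def hausdorff(a, b):
    """Symmetric Hausdorff distance between the pixel coordinates of two masks."""
    pa, pb = np.argwhere(a), np.argwhere(b)
    return max(directed_hausdorff(pa, pb)[0], directed_hausdorff(pb, pa)[0])

a = np.zeros((32, 32), dtype=bool); a[8:24, 8:24] = True
b = np.zeros((32, 32), dtype=bool); b[9:25, 9:25] = True  # shifted by one pixel

print(dice(a, b))       # high overlap, but no longer 1.0
print(hausdorff(a, b))  # worst contour mismatch, in pixels
```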

We now know that these distance functions compare two images and return a numerical value. That is, they compare pixels at exactly the same locations in the two images. But what happens if the content of one image is translated by a single pixel? The numerical value changes completely. Do our experts agree with this numerical value? Do they consider a one-pixel translation an important modification? In general, the answer is no: they want to consider such segmentations identical. In fact, if we think about the four axioms of distance functions, we see that the separation axiom implicitly depends on a notion of scale. But does the human visual perception system itself work like a distance function? Evidence suggests that, in terms of comparison, it does not (see Figure 2).
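The fragility of pixelwise comparison is easy to demonstrate. In the sketch below (our own toy example, assuming a one-pixel-wide contour such as a cell border), a Dice-style agreement score collapses under a one-pixel diagonal translation, even though a human observer would call the two contours identical.

```python
import numpy as np

def pixelwise_agreement(a, b):
    """Fraction of 'on' pixels that coincide exactly (a Dice-style overlap score)."""
    a, b = a.astype(bool), b.astype(bool)
    return 2 * np.count_nonzero(a & b) / (np.count_nonzero(a) + np.count_nonzero(b))

# A one-pixel-wide square contour, as produced by a border-detection step
contour = np.zeros((64, 64), dtype=bool)
contour[16, 16:48] = contour[47, 16:48] = True
contour[16:48, 16] = contour[16:48, 47] = True

# The same contour, translated by one pixel diagonally
shifted = np.roll(contour, shift=(1, 1), axis=(0, 1))

print(pixelwise_agreement(contour, contour))  # 1.0 -- identical images
print(pixelwise_agreement(contour, shifted))  # near zero, despite identical shapes
```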

The human visual perception system does obey the identity axiom, since two identical objects cannot be perceived as different. In contrast, two different objects can be perceived as similar; thus the human visual system violates the separation axiom (see Figure 2). The symmetry axiom does not hold, either: depending on the order of observation, the result of a single comparison might differ. In fact, the interpretation is strongly influenced by context. Finally, the triangle inequality definitely does not apply to the human visual system (see Figure 3). In mathematics, a function lacking these axioms is called a pseudo-quasi-semi-metric, each prefix marking the loss of one axiom: ‘pseudo’ means that the separation axiom does not hold, ‘quasi’ that symmetry is violated, and ‘semi’ that the triangle inequality fails. This ‘dissimilarity criterion’ is based on a similar notion defined by psychologists; details are provided by Tversky.^{7, 8}

Here, we propose a new dissimilarity criterion, denoted ∊, that is tolerant to small spatial variations (see Figure 4). This tolerance is obtained through mathematical morphology (a branch of mathematics, used in image analysis, that is based on the geometry of structures). Moreover, we have proved^{9} that ∊ is more tolerant than other classical criteria (e.g., the figure of merit and the Jaccard index^{10}). It also has elegant mathematical properties, in particular convergence, that validate its definition in both the continuous and discrete settings.

The precise definition of the ∊ dissimilarity criterion with tolerance *ρ*, applied to a segmented image *X* and a reference image (i.e., ground-truth image) *R*, is:

∊^{ρ}(X, R) = [ #(X ∖ (R ⊕ ρN)) + #(R ∖ (X ⊕ ρN)) ] / #(R ⊕ ρN)

where *N* is the disk of radius 1 (used as a structuring element), # denotes the number of pixels in a set, and ⊕ denotes Minkowski addition. Although it appears complex, this criterion simply means that every pixel lying within a tube of tolerance around the reference is considered correct (see Figure 5).
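A minimal sketch of such a tolerant criterion can be written with SciPy's morphological dilation. The function name, the toy contour, and the normalization by the size of the dilated reference are our own illustrative choices, following the description above: pixels inside the tolerance tube around the other segmentation are counted as correct, and only the remaining pixels count as errors.

```python
import numpy as np
from scipy import ndimage

def disk(radius):
    """Binary disk structuring element of the given radius."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x**2 + y**2 <= radius**2

def epsilon_dissimilarity(X, R, rho=1):
    """Tolerant dissimilarity between segmentation X and reference R:
    pixels within a tube of radius rho around the other set are correct;
    only pixels outside both tolerance tubes are counted as errors."""
    X, R = X.astype(bool), R.astype(bool)
    R_tube = ndimage.binary_dilation(R, structure=disk(rho))
    X_tube = ndimage.binary_dilation(X, structure=disk(rho))
    errors = np.count_nonzero(X & ~R_tube) + np.count_nonzero(R & ~X_tube)
    return errors / np.count_nonzero(R_tube)

# A square contour and the same contour shifted one pixel to the right
R = np.zeros((64, 64), dtype=bool)
R[16, 16:48] = R[47, 16:48] = True
R[16:48, 16] = R[16:48, 47] = True
X = np.roll(R, shift=1, axis=1)

print(epsilon_dissimilarity(X, R, rho=0))  # strict: every shifted pixel is an error
print(epsilon_dissimilarity(X, R, rho=1))  # 0.0 -- judged identical, as an expert would
```

With `rho=1`, the one-pixel translation falls entirely inside the tolerance tube, so the criterion agrees with the expert's judgment that the two segmentations are the same.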

With the ∊ dissimilarity criterion, we can now compare the results of our segmentation method with those of other methods, vis-à-vis the ground truth provided by the experts. Thanks to its tolerance, the criterion evaluates two experts' segmentations as identical. Moreover, we can tune our own method (which has a number of parameters, such as filter sizes) to select the best parameter values.

Another application of this criterion is the evaluation of registration processes. Registering two images consists of superposing one on the other (e.g., stitching them into a panorama). Such a process employs a transformation that converts one image into the coordinate system of the other. The transformation inevitably distorts the structures present in the images, and a human observer will visually notice whether these structures superpose. One way of evaluating registration processes is therefore to compare the registered segmentation images with each other^{11} (see Figure 6).
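As a simplified stand-in for that approach (the score, function name, and toy data below are our own, not the cited method), one can measure the fraction of a registered segmentation's pixels that land inside a tolerance tube around the fixed segmentation: a small residual misalignment then scores perfectly, while a strict pixelwise count penalizes it.

```python
import numpy as np
from scipy import ndimage

def tube_coverage(moving_seg, fixed_seg, rho=1):
    """Fraction of the registered (moving) segmentation's pixels falling
    within a tube of radius rho around the fixed segmentation -- a simple,
    tolerant score for how well structures superpose after registration."""
    y, x = np.ogrid[-rho:rho + 1, -rho:rho + 1]
    tube = ndimage.binary_dilation(fixed_seg.astype(bool),
                                   structure=x**2 + y**2 <= rho**2)
    moving = moving_seg.astype(bool)
    return np.count_nonzero(moving & tube) / np.count_nonzero(moving)

# Fixed segmentation: a filled blob; "registered" copy with a one-pixel residual shift
fixed = np.zeros((64, 64), dtype=bool)
fixed[20:40, 20:40] = True
registered = np.roll(fixed, shift=1, axis=0)

print(tube_coverage(registered, fixed, rho=1))  # 1.0: residual within tolerance
print(tube_coverage(registered, fixed, rho=0))  # strict pixelwise fraction, below 1.0
```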

In summary, we wished to compare the results of segmentation processes, but we found that the numerical results did not really agree with what experts expected. We therefore proposed a criterion, tolerant to small variations (translations, rotations, and some kinds of noise), that behaves similarly to the human visual system. It can be used both to compare image segmentation results and to tune the algorithms. We believe the ∊ dissimilarity criterion to be a practical tool for evaluating segmentation: it is mathematically well founded, operates in both continuous and discrete settings, and accords with the principles of human visual perception, thus yielding results closer to expert judgment than other criteria. In the future, we intend to define a tolerant measure for comparing grayscale images. We are also working on adapting the same kind of criterion to classifiers and ROC (receiver operating characteristic) measures.

Yann Gavet received his PhD in computer science in 2008. He works in the field of image processing and analysis, particularly on mosaic structures. He is the coordinator for the ANR (French National Research Agency) TecSan project CorImMo 3D on microscopic imaging and morphology of the human corneal endothelium.

Jean-Charles Pinoli received his DSc in mathematics and is a full professor. His research interests and teaching focus on image and pattern processing, analysis, and modeling.

## References

1. *Image Anal. Stereol.* 27(1), pp. 53-61, 2008. doi:10.5566/ias.v27.p53-61
2. *Comput. Vis. Image Understand.* 110(2), pp. 260-280, 2008.
3. *Proc. IEEE* 67(5), pp. 753-763, 1979.
4. *Ecology* 26(3), pp. 297-302, 1945.
5. *Grundzüge der Mengenlehre*, Veit, Leipzig, 1914.
6. *IEEE Trans. Pattern Anal. Mach. Intell.* 15(9), pp. 850-863, 1993.
7. *Psychol. Rev.* 89, pp. 123-154, 1982.
8. *Psychol. Rev.* 84(4), pp. 327-352, 1977.
9. *J. Math. Imag. Vis.* 42(1), pp. 25-49, 2012. doi:10.1007/s10851-011-0272-4
10. *Bull. Soc. Vaudoise Sci. Nat.* 37, pp. 547-579, 1901.
11. *J. Electron. Imag.* 21(2), 021118, 2012. doi:10.1117/1.JEI.21.2.021118