
SPIE Press Book

Computational Models for Predicting Visual Target Distinctness
Author(s): Jose Antonio Garcia; Joaquin Fdez-Valdivia; Xose R. Fernandez-Vidal; Rosa Rodriguez-Sanchez

Book Description

The more a target stands out from its background, the easier it is to detect and the quicker it will be found. This book looks at two situations for predicting visual target distinctness by means of a computer vision model.

Book Details

Date Published: 8 February 2001
Pages: 212
ISBN: 9780819439963
Volume: PM95

Table of Contents
Preface / xi
1 Models of feature perception in distortion measure guidance / 1
1.1 Introduction / 1
1.2 Computational models for feature detection / 3
1.2.1 Image features from Laplacian zero-crossings / 3
1.2.2 Image features from phase congruency / 3
1.2.3 Image features from active sensors / 6
1.2.3.1 A data-driven multisensor scheme / 10
1.2.3.2 The activated sensors in the multisensor scheme / 11
1.3 Error measure guidance / 14
1.4 Experimental results / 17
1.4.1 Distinctness of targets and their immediate surroundings / 17
1.5 The role of integral features for perceiving image discriminability / 18
1.5.1 An original image quality model for predicting the visibility of the difference between a pair of images / 19
1.5.1.1 The spatial sensitivity function / 21
1.5.2 Applications / 22
1.5.2.1 Distinctness of targets and their immediate surroundings / 23
1.6 Conclusions / 24
2 Computational measures based on space-frequency analysis / 27
2.1 Introduction / 27
2.2 The multichannel organization of images / 29
2.2.1 Overview of approach / 29
2.2.2 Clumps of energy in the amplitude spectrum / 31
2.2.2.1 A spatial to 2D spatial-frequency / 32
2.2.2.2 A data-driven multichannel design / 35
2.2.2.2.1 The data-driven selection of bands of orientation / 35
2.2.2.2.2 The data-driven selection of radial frequency channels / 36
2.2.2.3 The selection of the most activated sensors / 39
2.2.3 Bank of LoG-Gabor filters / 39
2.2.4 Activated filters in the bank / 41
2.3 Filtered-response based distinctness measure / 42
2.3.1 Selection of fixation points / 43
2.3.2 "Filtered Response" (FR) Distinctness Measure / 43
2.4 Integral-features based distinctness measure / 46
2.4.1 Preattentive stage / 46
2.4.2 Integral feature (IF) distinctness measure / 46
2.5 Experimental results / 49
2.5.1 Images, apparatus, subjects, and laboratory viewing conditions / 50
2.5.2 Predicting visual target distinctness / 51
2.5.2.1 Psychophysical target distinctness / 52
2.5.2.2 Search experiment / 52
2.5.2.3 Results / 53
2.5.2.3.1 Psychophysical target distinctness / 58
2.5.2.3.2 Computational target distinctness / 58
2.5.2.3.3 Experiment 1 / 60
2.5.2.3.4 Experiment 2 / 60
2.5.2.3.5 Experiment 3 / 62
2.5.2.3.6 Experiment 4 / 63
2.6 Conclusions / 65
3 Defining the notion of visual pattern / 69
3.1 Introduction / 69
3.2 Material and methods / 70
3.2.1 Images / 70
3.2.2 The RGFF image representational model / 70
3.2.2.1 Selection of strongly responding filters / 73
3.2.2.2 Distance between filtered responses / 73
3.2.2.2.1 The best definition of integral feature for segregating visual patterns / 74
3.2.2.2.2 Congruence in integral features between two filtered responses / 75
3.2.2.3 Decomposition of the original reference image into its "visual patterns" / 77
3.2.2.3.1 Clustering of activated filters / 78
3.2.3 Evaluation function / 84
3.2.3.1 Datasets / 85
3.2.3.2 Psychophysical target distinctness / 86
3.2.3.3 Computational target distinctness metric / 93
3.3 Results and discussion / 93
3.3.1 Experiment 1 / 95
3.3.2 Experiment 2 / 98
3.3.3 Experiment 3 / 99
3.3.4 Experiment 4 / 102
3.3.5 Experiment 5 / 102
3.4 Conclusions / 110
4 Information theoretic measures / 111
4.1 Introduction / 111
4.2 Basic axiomatic characterization / 114
4.3 Information conservation constraint / 118
4.3.1 Selective information gain / 119
4.3.2 Properties of the selective information gain / 121
4.4 Significance conservation constraint / 123
4.4.1 Compound information gain / 124
4.4.2 Properties of the compound gain / 125
4.5 Comparative study / 128
4.5.1 Images and datasets / 129
4.5.2 Psychophysical target distinctness / 130
4.5.3 Results and discussion / 130
4.5.3.1 Experiment 1 / 130
4.5.3.2 Experiment 2 / 138
4.5.3.3 Experiment 3 / 138
4.5.3.4 Statistical accuracy / 141
4.6 Conclusion / 144
Epilogue / 147
A Comparison with other saliency models / 155
B Integral opponent-colors features / 159
B.1 Introduction / 159
B.2 Preattentive stage / 160
B.2.1 RGB to opponent-color encoding transform / 162
B.2.2 2D bank of LoG-Gabor filters design / 163
B.2.3 Activated filters from the bank / 164
B.2.4 Fixation points on each filter response / 165
B.3 Integration stage / 165
B.3.1 Integral opponent-colors features / 171
B.3.2 Target distinctness on each activated filter / 172
B.4 Decision stage / 172
B.5 Applications / 173
B.5.1 Distinctness of targets and their immediate surroundings / 173
B.5.2 Distinctness of targets in noisy environments / 176
B.6 Conclusions / 178
C Forms of gain and divergence / 181
D Calculating derivatives / 183
References / 185
Index / 198

Preface

Measuring target acquisition performance in field situations is usually impractical, and often very costly or even dangerous. Thus it is of great benefit to have advance knowledge of human visual acquisition performance for targets or other relevant objects. However, search performance inherently shows a large variance and depends strongly on prior knowledge of the perceived scene. A typical search experiment therefore requires a large number of observers to obtain statistically reliable data.

Visual target acquisition is a complex process, and many of the factors involved are not yet fully understood. One thing is evident: the more a target stands out from its background, the easier it is to detect and the quicker it will be found. It is therefore likely that visual target distinctness is an important determinant of visual search performance.

Target saliency for humans performing visual search and detection tasks can be estimated from the difference between the image of the target-and-background scene and the image of the same background with no target. Thus, relevant computational models of early human vision typically process an input image through various spatial and temporal bandpass filters and analyze first-order statistical properties of the filtered images to compute a target distinctness metric. If these models yield good predictors of target saliency for humans performing visual search and detection tasks, they may be used to compute the visual distinctness of image subregions (target areas) from digital imagery.
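As a rough illustration of this class of model (a minimal sketch, not any particular measure from this book), the Python fragment below band-pass filters the two scenes with difference-of-Gaussians channels and pools the shift in first-order statistics across channels; the filter scales and the pooling rule are assumptions chosen only for illustration.

```python
# Minimal sketch: estimate target distinctness by band-pass filtering both
# images and comparing first-order statistics of the filtered responses.
# The filter scales and the pooling rule are illustrative assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

def bandpass(img, sigma):
    """Difference-of-Gaussians band-pass response at scale sigma."""
    return gaussian_filter(img, sigma) - gaussian_filter(img, 2 * sigma)

def distinctness(target_scene, background_scene, sigmas=(1, 2, 4, 8)):
    """Pool, over scales, the shift in first-order statistics that the
    target induces in each band-pass channel."""
    score = 0.0
    for s in sigmas:
        rt = bandpass(target_scene.astype(float), s)
        rb = bandpass(background_scene.astype(float), s)
        # First-order statistics per channel: mean and standard deviation.
        score += abs(rt.mean() - rb.mean()) + abs(rt.std() - rb.std())
    return score
```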

This book deals with two different situations in predicting visual target distinctness by means of a computer vision model. In the first, it is assumed that the structure of the target-and-background scene and of the image without a target can be determined exactly. Here the main problem is how to select the relevant information that passes through a limited attentional bottleneck; Chapters 1, 2, and 3 introduce various computational vision models for selecting the information that is significant for perceiving target distinctness in this case. In the second situation, the structure of the target and nontarget scenes cannot be determined exactly, so the structure of the images must be characterized statistically by discrete probability distributions. Given the large number of available measures, we need to know what postulates and properties an information measure should satisfy, and then what the amount of relative information is between the respective distributions of the target image and the image with no target. Chapter 4 analyzes exactly these points.

Chapter 1 analyzes the relation between two different problems in computer vision: what constitutes a proper model for identifying significant stimulus locations in an image, and the comparative performance of selective measures and pixel-by-pixel error metrics. The natural relation between the two problems arises from the fact that a proper selection of significant locations in the target image can be used to guide its comparison with another image through any reasonable metric.

In this chapter, we study an approach to improving the correlation between the subjective rating and the mean square error (MSE), in which the differences between the images to be compared are computed at locations where humans might perceive features in the reference image--for example, line features or step discontinuities. A visual model for feature perception is used to measure distortion between the target image and the image without the target. The actual success of the resulting distortion measure then depends on both the validity of the vision model and the error metric used in the perceptual domain. The problem is then to select a metric for image discriminability that corresponds to the human observer's evaluation. How conjunctions of features can be incorporated into such a metric is the next subject of Chapter 1.
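A minimal sketch of this idea follows, with a plain gradient-magnitude threshold standing in for the chapter's feature-perception model; the Sobel stand-in and the threshold quantile are our assumptions, not the book's detectors.

```python
# Hedged sketch: restrict the squared-error comparison to locations where a
# feature model responds in the reference image. A gradient-magnitude
# threshold stands in here for the perceptual feature-detection model.
import numpy as np
from scipy.ndimage import sobel

def feature_guided_mse(reference, test, quantile=0.9):
    gx = sobel(reference.astype(float), axis=0)
    gy = sobel(reference.astype(float), axis=1)
    magnitude = np.hypot(gx, gy)
    # Keep only the most feature-like locations of the reference image.
    mask = magnitude >= np.quantile(magnitude, quantile)
    diff = reference.astype(float) - test.astype(float)
    return np.mean(diff[mask] ** 2)
```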

Chapter 2 studies a different approach to improving the correlation between the subjective rating and the MSE. In this approach, the nontarget image to be compared with the reference is passed through an operator designed to compare the excitation levels of the nontarget image to those of the target picture, with excitation levels given by a set of active units tuned to particular orientation and spatial-frequency components. This chapter investigates the relationship between visual target distinctness in complex natural scenes as measured by human observers and two different computational visual distinctness measures computed from image representational models based on selectively filtered images and statistical features.

The first measure computes the structural dissimilarity between two related images filtered by a bank of spatial-frequency and orientation-selective (LoG-Gabor) filters. The second measure may be described in terms of two different stages: a "preattentive" stage, in which the image is selectively filtered by a bank of 2D LoG-Gabor filters, and an "integration" stage, in which we integrate and compare the separable representations (i.e., statistical structure) at attentional locations.
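The sketch below is one hedged reading of such a two-stage measure: it builds a small 2D log-Gabor bank in the frequency domain, short-lists the most activated filters (the "preattentive" stage), and compares first-order statistics of the corresponding responses (the "integration" stage). The bank parameters, the activation criterion, and the comparison rule are illustrative assumptions, not the book's IF measure.

```python
# Illustrative two-stage sketch; bank parameters are assumptions.
import numpy as np

def log_gabor(shape, f0, theta0, sigma_f=0.55, sigma_theta=0.4):
    """Frequency-domain 2D log-Gabor transfer function."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    r = np.hypot(fx, fy)
    r[0, 0] = 1.0  # avoid log(0) at DC; the DC gain is zeroed below
    radial = np.exp(-(np.log(r / f0) ** 2) / (2 * np.log(sigma_f) ** 2))
    radial[0, 0] = 0.0
    angle = np.arctan2(fy, fx)
    d = np.arctan2(np.sin(angle - theta0), np.cos(angle - theta0))
    angular = np.exp(-(d ** 2) / (2 * sigma_theta ** 2))
    return radial * angular

def if_distinctness(target_img, notarget_img, n_keep=4):
    shape = target_img.shape
    bank = [log_gabor(shape, f0, th)
            for f0 in (0.05, 0.1, 0.2)
            for th in np.linspace(0, np.pi, 4, endpoint=False)]
    T, N = np.fft.fft2(target_img), np.fft.fft2(notarget_img)
    responses = [(np.abs(np.fft.ifft2(T * g)), np.abs(np.fft.ifft2(N * g)))
                 for g in bank]
    # "Preattentive" stage: keep the filters most activated by the target image.
    responses.sort(key=lambda pair: pair[0].std(), reverse=True)
    # "Integration" stage: compare first-order statistics on the kept channels.
    return sum(abs(rt.mean() - rn.mean()) + abs(rt.std() - rn.std())
               for rt, rn in responses[:n_keep])
```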

Up to this point, the interpretation of visual search tasks has rested on the assumption that the detection of targets is determined by the feature-coding properties of low-level visual processing. Chapter 3 presents a new distinctness measure that is applied at a much higher level of image representation than feature detection: the level of perceived shapes or surfaces. Instead of assuming that such forms are simple or integral features (i.e., statistical structure at a particular scale), we think it more appropriate to regard them as "visual patterns" distinguished at an object or surface level. To make this distinction, a system for the automatically learned partitioning of "visual patterns" in the original reference image is given, based on a sophisticated band-pass filtering operation with fixed scale and orientation sensitivity. In this scheme, the "visual patterns" are defined as the features that have the highest degree of alignment in the statistical structure across different frequency bands.

The analysis reorganizes the reference image according to a constraint of invariance in the statistical structure and consists of three stages: (i) a preattentive stage, (ii) an integration stage, and (iii) a learning stage. The first stage takes the reference image and filters it with a set of LoG-Gabor filters. Based on their responses, activated filters, which are selectively sensitive to patterns in the image, are short-listed. In the integration stage, common ground between several activated sensors is explored: the filtered responses are analyzed through a family of statistics, and for any two activated filters, the distance between them is derived from the distances between their statistics. The third stage, the learning stage, performs cluster partitioning as a mechanism for learning the subspace of LoG-Gabor filters needed to partition the image data. A computational distinctness measure can then be computed from the images after they have been transformed into a new perceptual domain in which they are decomposed into their "visual patterns." The resulting model has perceptual access to "visual patterns," but not to filtered images or statistical features at a particular level of resolution. A main result is the finding that this computational measure, which applies a simple decision rule between segregated visual patterns, relates to visual target distinctness as perceived by human observers.
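As a hedged sketch of the learning stage, the fragment below clusters activated filters by distances between a small family of response statistics, so filters whose statistical structure is congruent end up in one "visual pattern". The particular statistics and the clustering cut-off are illustrative assumptions, not the book's scheme.

```python
# Sketch: cluster activated filters by the distance between the statistics
# of their filtered responses; the statistics chosen and the number of
# clusters are illustrative assumptions.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def cluster_filters(filtered_responses, n_patterns=3):
    """filtered_responses: list of 2D arrays, one per activated filter."""
    # Describe each response by a small family of first-order statistics
    # (mean, spread, mean magnitude, skewness).
    stats = np.array([[r.mean(), r.std(), np.abs(r).mean(),
                       ((r - r.mean()) ** 3).mean() / (r.std() ** 3 + 1e-12)]
                      for r in filtered_responses])
    # Distances between filters are distances between their statistics.
    labels = fcluster(linkage(pdist(stats), method="average"),
                      t=n_patterns, criterion="maxclust")
    return labels  # filters sharing a label define one visual pattern
```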

It often happens that the structure of a certain scene cannot be determined exactly for various reasons. Under such circumstances, the structure of the reference image and the input image can be characterized statistically by discrete probability distributions. Here we ask the following question: What is the amount of relative information between discrete probability distributions? Given the large number of measures available for this purpose, a question naturally arises about the criteria for choosing the measure to be used in a particular investigation. To this end, we have to know what postulates and properties the information-theoretic measure should satisfy. It is therefore of great value to develop an axiomatic characterization of relative information for predicting visual target distinctness from 2D digital images. Chapter 4 addresses exactly this point.
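As one concrete instance of such a relative-information measure (not the gain measures developed in Chapter 4), the sketch below computes the Kullback-Leibler divergence between gray-level histograms of the target and no-target images; the bin count and the 8-bit intensity range are assumptions.

```python
# Sketch: measure target distinctness as relative information between the
# gray-level distributions of the target and no-target images, here with
# Kullback-Leibler divergence as one concrete instance.
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    p, q = p + eps, q + eps          # keep the logarithms finite
    p, q = p / p.sum(), q / q.sum()  # renormalize to proper distributions
    return float(np.sum(p * np.log(p / q)))

def histogram_distinctness(target_img, notarget_img, bins=64):
    # Assumes 8-bit imagery; adjust the range for other bit depths.
    p, _ = np.histogram(target_img, bins=bins, range=(0, 255))
    q, _ = np.histogram(notarget_img, bins=bins, range=(0, 255))
    return kl_divergence(p.astype(float), q.astype(float))
```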

Several experiments are performed to investigate the relationship between the different target distinctness measures and the visual target distinctness measured by human observers. First, a psychophysical experiment is performed in which human observers estimate the visual distinctness of targets in a database. The subjective ranking induced by the psychophysical target distinctness is adopted as the reference rank order. Second, the visual target distinctness is estimated by using the different measures between the original scene and an image of the same scene in which the target support has been artificially filled in with the local background. A relation is then established between the computational and the psychophysical target distinctness estimates, for each of the measures.
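In outline, this evaluation reduces to comparing two rankings of the same targets. A minimal sketch, using hypothetical placeholder data and a standard rank correlation (the book's own statistical analysis may differ), follows.

```python
# Sketch of the evaluation protocol: rank targets by the psychophysical
# estimates, rank them again by a computational measure, and quantify the
# agreement with a rank correlation. The arrays are hypothetical data.
from scipy.stats import spearmanr

psychophysical = [3.1, 1.4, 2.2, 4.0, 2.9]   # observer distinctness estimates
computational  = [0.8, 0.2, 0.5, 1.1, 0.6]   # one of the model measures
rho, p_value = spearmanr(psychophysical, computational)
print(f"Spearman rank correlation: {rho:.2f} (p = {p_value:.3f})")
```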

Acknowledgements: It is a pleasure to acknowledge the significant and pervasive contribution of Dr. Alexander Toet to the discipline of target distinctness. His papers and technical reports at the TNO Human Factors Research Institute in Soesterberg formed the basis of many of the experimental designs used in this book. We are enormously grateful for the many hours Dr. Toet spent reading our manuscripts and for his suggestions for improvements. There is no way to thank him enough for such generous help. Thanks are also due to Dr. Javier Martínez-Baena for his computational assistance. Rick Hermann of SPIE Press provided the much-needed focus and guidance required to complete a text. We are deeply indebted to an anonymous referee for suggesting several good ways to improve the quality of the initial manuscript, which we managed to implement only in part.

We are grateful for permission to reproduce figures and text from some of our papers to Elsevier Science Ltd., the IEEE Computer Society, the Pattern Recognition Society, and SPIE--The International Society for Optical Engineering. Thanks are due to the TNO Human Factors Research Institute for providing us with the image data, search times, and cumulative detection probabilities from search experiments made during the DISSTAF field test. Figures 3.1 and 4.1-4.6 are © 1999 (2000) IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes, for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. The book was typeset using LaTeX software; the implementation used was running under Linux. Camera-ready copy was produced in 600-dpi PostScript using dvips(k) v. 5.86 from Radical Eye Software. gv v3.5.8 (an interface to Ghostscript) proved invaluable in checking and correcting the figures and the layout of the text. Our thanks are due to the many authors and contributors who have made their software freely available.

Finally, the first author would like to express his special thanks to his favorite n-dimensional image, the fourth author, for her patience, constant support and love. The second author would like to offer his deepest and most heartfelt thanks to his wife, Mercedes, for her love, understanding, support, encouragement and patience.

Part of the results in the different chapters has been previously published in [34, 36, 37, 38, 40, 41, 42, 47, 48, 49, 104, 105, 125, 126]. Part of the research was sponsored by the Spanish Board for Science and Technology (CICYT) and the Dirección General de Enseñanza Superior (DGES) under grants TIC97-1150 and PB98-1374, respectively.

The authors, September 2000

