Proceedings Volume 2054

Computational Vision Based on Neurobiology

View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 17 March 1994
Contents: 1 Session, 19 Papers, 0 Presentations
Conference: Computational Vision Based on Neurobiology 1993
Volume Number: 2054

Table of Contents

Coding of inhibition in visual cortical spike streams
A. B. Bonds
Modulation of cortical firing rate is a major factor in defining cortical filter properties. Active response suppression (inhibition) is seen whenever cortical cells are exposed to grating stimuli that are non-optimal, in either the domain of orientation or spatial frequency. Responses are also reduced by pre-exposure to gratings of high contrast. The first phenomenon is termed spatially dependent inhibition, the second contrast gain control. We have explored the physiological basis for these two phenomena in striate cortical cells of the anesthetized cat. Sequences of spikes in responses show bursts characterized by interspike intervals of 8 msec or less. Both burst frequency and burst length depend on average firing rate, but at a given firing rate burst length is lower for non-optimal orientations. Burst length is also shortened by local injection of GABA. Burst length modulation is not seen in the case of contrast gain control. These results support the existence of two independent mechanisms for modulating cortical responsiveness. A GABA-ergic mechanism that shortens spike bursts is invoked by presentation of spatially non-optimal stimuli. Response normalization after presentation of high contrasts does not affect burst length and is not affected by GABA.
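The burst criterion used in this abstract (interspike intervals of 8 msec or less) lends itself to a short sketch. The function below is one illustrative reading of that criterion, not the authors' analysis code; the only value taken from the abstract is the 8 msec threshold.

```python
def burst_lengths(spike_times, max_isi=0.008):
    """Group a sorted spike train (in seconds) into bursts: successive
    spikes separated by max_isi (8 msec) or less share a burst.
    Returns the number of spikes in each burst; a length of 1 marks an
    isolated spike. Illustrative sketch only."""
    if not spike_times:
        return []
    lengths = []
    current = 1
    for prev, cur in zip(spike_times, spike_times[1:]):
        if cur - prev <= max_isi:
            current += 1
        else:
            lengths.append(current)
            current = 1
    lengths.append(current)
    return lengths
```

From such a grouping, both burst frequency and mean burst length can be computed at a given average firing rate, which is the comparison the abstract describes.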
Visual cortex neurons in monkey and cat: contrast response nonlinearities and stimulus selectivity
Duane G. Albrecht, Wilson S. Geisler
The contrast response functions of cat and monkey visual cortex neurons reveal two important nonlinearities: expansive response exponents and contrast gain control. These two nonlinearities (when combined with a linear spatiotemporal receptive field) can have beneficial consequences on stimulus selectivity. Expansive response exponents enhance stimulus selectivity introduced by previous neural interactions, thereby relaxing the structural requirements for establishing highly selective neurons. Contrast gain control maintains stimulus selectivity, over a wide range of contrasts, in spite of the limited dynamic response range and the steep slopes of the contrast response function.
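The two nonlinearities named here can be captured in a standard hyperbolic-ratio (Naka-Rushton) form, with an expansive exponent n and a semi-saturation contrast c50 implementing divisive gain control. The parameter values below are invented for illustration, not fitted values from the paper.

```python
import numpy as np

def contrast_response(c, r_max=50.0, c50=0.2, n=2.5):
    """Hyperbolic-ratio contrast response function: an expansive
    exponent n combined with divisive saturation governed by the
    semi-saturation contrast c50. Parameter values are illustrative."""
    c = np.asarray(c, dtype=float)
    return r_max * c**n / (c**n + c50**n)
```

Below saturation the response grows roughly as c**n, so a 2:1 difference in linear drive becomes about a 2**2.5, i.e. 5.7:1, response difference, which is one way an expansive exponent can sharpen selectivity established by earlier linear stages.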
Monocular and binocular mechanisms of contrast gain control
Izumi Ohzawa, Ralph D. Freeman
Prolonged stimulation by temporally modulated sinusoidal gratings causes a decrease in the contrast sensitivity and response of neurons in the visual cortex. We have studied the dynamic aspects of this contrast gain control mechanism and how its temporal properties affect the determination of neural contrast response functions. In addition, we have considered the possibility that a single mechanism is sufficient to explain monocular and binocular properties of contrast gain control. We find that neural contrast response functions are highly susceptible to the measurement procedure itself, so that the data obtained in some studies seriously underestimate the slope of the function and overestimate the threshold. Therefore, careful selection of the experimental data is required for general use and for constructing models of visual cortical function. Comparisons of monocular and binocular properties of contrast gain control provide insights concerning the neural origin of the mechanism. Monocularly induced gain reductions are transferable to the other eye, suggesting that gain control originates in part at a site following binocular convergence. However, binocular experiments conducted with interocular contrast mismatches indicate that the gain of the monocular pathways for each eye may be controlled independently. These results suggest that a single gain control mechanism is not sufficient to account for the properties exhibited by cortical neurons.
New model of human luminance pattern vision mechanisms: analysis of the effects of pattern orientation, spatial phase and temporal frequency
John M. Foley, Geoffrey M. Boynton
Models of human pattern vision mechanisms are examined in light of new results in psychophysics and single-cell recording. Four experiments on simultaneous masking of Gabor patterns by sinewave gratings are described. In these experiments target contrast thresholds are measured as functions of masker contrast, orientation, spatial phase, and temporal frequency. The results are used to test the theory of simultaneous masking proposed by Legge and Foley that is based on mechanisms that sum excitation linearly over a receptive field and produce a response that is an s-shaped transform of this sum. The theory is shown to be inadequate. Recent single-cell-recording results from simple cells in the cat show that these cells receive a broadband divisive input as well as an input that is summed linearly over their receptive fields. A new theory of simultaneous masking based on mechanisms with similar properties is shown to describe the psychophysical results well. Target threshold vs masker contrast (TvC) functions for a set of target-masker pairs are used to estimate the parameters of the theory including the excitatory and inhibitory sensitivities of the mechanisms along the various pattern dimensions. The human luminance pattern vision mechanisms, unlike most of the cells, do not saturate at high contrast.
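The divisive structure described above can be sketched as follows. This is a generic excitation-over-inhibition masking mechanism in the spirit of the model tested here; the sensitivities and exponents are invented for illustration and are not the paper's fitted parameters.

```python
def masked_response(c_target, c_mask, se=100.0, si_t=100.0, si_m=30.0,
                    z=5.0, p=2.4, q=2.0):
    """Generic divisive-inhibition mechanism: excitation driven by the
    target is raised to the power p and divided by a constant z plus a
    broadband inhibitory pool that includes the mask. All parameter
    values are invented for illustration."""
    excitation = se * c_target
    inhibition = (si_t * c_target) ** q + (si_m * c_mask) ** q
    return excitation ** p / (z + inhibition)
```

In this form the response rises with target contrast but is suppressed by a mask that contributes only to the divisive pool, which is the qualitative behavior a TvC (target threshold vs masker contrast) experiment probes.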
Dynamic object-based 3D scene analysis using multiple cues
A computational visual system (CVS) has been developed that segments objects in natural scenes using algorithms and filtering elements similar to those used by people. The filtering elements of the CVS are based on neural networks elucidated by physiological and anatomical studies. The algorithms of the CVS are based on data from psychophysical studies. This CVS classifies different types of patterns, based on object shape, texture, position in the visual field, and amount of motion parallax in subsequent scenes, without any a priori models. Psychophysical and physiological evidence indicates that, when analyzing 3D scenes, people construct an object-based perception, one that is event-driven. The object-based representation being modeled focuses on the object formation found in the dorsal cortical pathway, used to locate an object in 3D space. Therefore, the interaction between the eye-head movement system and the pattern recognition system is modeled. Global scene attributes, used to reveal objects masked by shadows and improve object segmentation, are modeled, as are local object attributes defined by the boundary of contrast differences between an object and its background. The importance of using paired odd and even symmetric detectors to form the boundary and analyze the texture of an object is emphasized. This information is used to construct a viewer-centered, object-based map of the scene based on multiple object attributes. Algorithms that incorporate the relative weighting of the different object attributes used to discriminate objects then instantiate networks combining competitive and cooperative interactions.
Large-scale organization of the primate cortical visual system
Malcolm P. Young
The primate cortical visual system is composed of many structurally and functionally distinct areas or processing compartments, each of which receives on average about ten afferent inputs from other cortical areas and sends about the same number of output projections. The visual cortex is thus served by a very large number of cortico-cortical connections, so that the areas and their interconnections form a network of remarkable complexity. The gross organization of this cortical processing system hence represents a formidable topological problem: while the spatial positions of the areas in the brain are becoming fairly well established, the gross `processing architecture,' defined by the connections, is much less well understood. I have applied optimization analysis to connectional data on the cortical visual system to address this topological problem. This approach gives qualitative and quantitative insight into the connectional topology of the primate cortical visual system and provides new evidence supporting suggestions that the system is divided into a dorsal `stream' and a ventral `stream' with limited cross-talk, that these two streams reconverge in the region of the principal sulcus (area 46) and in the superior temporal polysensory areas, that the system is hierarchically organized, and that the majority of the connections are from nearest-neighbor and next-door-but-one areas. The robustness of the results is shown by reanalyzing the connection data after various manipulations that simulate gross changes to the neuroanatomical database.
Topography of excitatory and inhibitory connectional anatomy in monkey visual cortex
Jennifer S. Lund, J. B. Levitt, Quanfeng Wu
It is chiefly within the superficial layers 1 - 3 of the cerebral cortex that new properties are developed from relayed afferent information. The intrinsic circuitry of these layers is uniquely structured compared to the deeper layers; each pyramidal neuron connects laterally to other pyramids at a series of offset points spaced at regular intervals around it. As seen in tangential sections of layers 1 - 3, the pyramidal neuron axon terminal fields are roughly circular in cross section, forming a `polka dot' overall pattern of terminal distribution. In regions of peak density, the diameter of the circular fields matches the width of the uninnervated regions between the terminal fields. This dimension is also that of the average lateral spread of the dendrites of single pyramidal neurons making up the connections in each visual cortical area, a dimension which varies considerably between different cortical regions. Since every point across each cortical area shows similar laterally spreading patterns of connectivity, the overall array is believed to be a continuum of offset connectional lattices. It is also presumed that each pyramidal neuron, as well as projecting to separate points, receives convergent inputs from similar arrays of offset neurons. The geometry of local circuit inhibitory neurons matches elements of these lattices; basket neuron axons in these layers spread three times the diameter of the local pyramidal neuron dendritic fields while the basket neuron dendritic field matches that of the pyramidal cell. If both basket cell and pyramidal neuron at single points are coactivated by afferent relays, the basket axon might create a surround zone of inhibition preventing other pyramidal cells in the surrounding region from being active simultaneously.
As the pyramid develops its connections, this inhibitory field may force each pyramidal neuron to send its axon out beyond the local inhibitory zone to find other pyramidal cells activated by the same stimulus. Since the basket neuron also contacts other basket neurons, by disinhibition through offset basket neurons, it will simultaneously encourage activity in pyramidal cells in a zone outside the limit of its axon field. This scaling of basket neuron axons is present in early postnatal cortex and it could lead to the punctate patterns of pyramidal neuron connectivity which also appear to develop postnatally. This anatomy might also produce the regular spacing of different functional attributes that is typical of visual cortical organization. Models that explore spatial geometries of excitation and inhibition resembling those described above are urgently needed to test current biological hypotheses underlying investigations of cerebral cortex.
Influence of figural interpretation on the selective integration of visual motion signals
Thomas D. Albright, Gene R. Stoner
The solution to the computational problem of reconstructing object motion from retinal image motion is underconstrained. In an effort to converge on a solution to this problem, the primate visual system appears to rely upon image cues that lead to an interpretation of the spatial relationships between objects in a visual scene. Psychophysical experiments illustrate this phenomenon through the apparent dependence of motion signal integration on luminance-based cues for occlusion and perceptual transparency. Neurophysiological studies of the cell populations thought to underlie motion signal integration reveal a change in directional selectivity that precisely parallels the perceptual phenomenon. Among the obstacles faced in attempts to understand the neural bases of primate vision, the integration of motion signals holds a unique position: the computational problem is well-defined, a specific neural substrate has been identified, and the solution to the integration problem is absolutely critical for visually guided behavior. As such, it stands as a model system for exploring the relationships between neuronal phenomena, perception, and behavior.
Neuronal mechanism for signaling the direction of self-motion
Jean-Pierre Roy
Movement of an observer through the environment generates motion on the retina. This optic flow contains information about the direction of self-motion. To accurately signal the direction of self-motion, however, the optic flow has to carry some depth information: there has to be differential motion of elements at different depths. One depth cue that is available to an organism with frontal eyes is binocular disparity. Cells in the dorsal subdivision of the Medial Superior Temporal area (area MSTd) have been proposed to play a role in the analysis of optic flow. We have examined the disparity sensitivity of neurons from MSTd in awake behaving monkeys in an attempt to understand the possible contribution of disparity to the computation of the direction of self-motion. Cells with a response to fronto-parallel motion were examined. While the monkey looked at a fixation spot on a screen in front of it, random dot stimuli moved in the preferred direction of the cell under study, and the disparity of the dots made the stimuli appear to move in a fronto-parallel plane in front of, on, or behind the screen. Over 90% of the neurons studied were sensitive to the disparity of the visual stimulus. Of those disparity-sensitive cells, 95% responded best either to near stimuli (stimuli with crossed disparities appearing to move in front of the screen) or to far stimuli (stimuli with uncrossed disparities appearing to move behind the screen). In 40% of the disparity-sensitive cells, the preferred disparity reversed as the direction of stimulus motion was reversed. For example, a cell that responded best to crossed disparities (foreground) for rightward motion responded best to uncrossed disparities (background) for leftward motion. Such an opposite motion of foreground and background occurs when an organism tracks a stationary object while translating in a direction different from the line of gaze.
We propose that the reversal of disparity selectivity with a reversal in direction selectivity indicates one way in which these neurons could signal the direction of self-motion of the organism in its environment.
Neural processing of biological motion in the macaque temporal cortex
Michael W. Oram, David I. Perrett
Cells have been found in the superior temporal polysensory area (STPa) of the macaque temporal cortex which are selectively responsive to the sight of particular whole body movements (e.g., walking) under normal lighting. These cells typically discriminate the direction of walking and the view of the body (e.g., left profile walking left). We investigated the extent to which these cells are responsive under `biological motion' conditions where the form of the body is defined only by the movement of light patches attached to the points of limb articulation. One third of the cells (25/72) selective for the form and motion of walking bodies showed sensitivity to the moving light displays. Seven of these cells showed only partial sensitivity to form from motion, in so far as the cells responded more to moving light displays than to moving controls but failed to discriminate body view. These seven cells exhibited directional selectivity. Eighteen cells showed statistical discrimination for both direction of movement and body view under biological motion conditions. Most of these cells showed reduced responses to the impoverished moving light stimuli compared to full light conditions. The 18 cells were thus sensitive to detailed form information (body view) from the pattern of articulating motion. Cellular processing of the global pattern of articulation was indicated by the observations that none of the cells were found sensitive to movement of individual limbs and that jumbling the pattern of moving limbs reduced response magnitude. The cell responses thus provide direct evidence for neural mechanisms computing form from non-rigid motion. The selectivity of the cells was for body view, specific direction and specific type of body motion presented by moving light displays and is not predicted by many current computational approaches to the extraction of form from motion.
Neurobiological mechanisms of cortical direction selectivity
Curtis L. Baker Jr., Jane C. Boulton
From consideration of a number of types of apparently linear and nonlinear behavior of direction selectivity of visual cortex neurons, it will be argued that there are at least two fundamentally different types of motion computation. The first, designated quasi-linear, entails a summation of afferent signals which are in approximate quadrature phase, both spatially and temporally (e.g., lagged and nonlagged LGN afferents, in the cat); the summation may be of a linear or a partially nonlinear nature, but is carried out on specific signals falling within a relatively restricted spatial frequency passband and confined receptive field. The second, referred to as nonlinear, involves a highly nonlinear integration of additional, nonspecific afferent signals, generally outside the conventional spatiotemporal frequency passband of a neuron, and also outside of the `classical' receptive field. Some novel aspects of this formulation are: the same neuron may exhibit both quasi-linear and nonlinear behavior; quasi-linear mechanisms may display substantial nonlinearities, possibly accounting for detection of some non-Fourier stimuli. Data are presented to illustrate the idea that white noise analysis methods are well-suited to characterize the spatiotemporal nonlinearities of quasi-linear mechanisms, but fail to provide insight into the processing of nonlinear mechanisms.
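The quasi-linear scheme (summation of afferent signals in approximate spatial and temporal quadrature, followed by a nonlinearity) is closely related to the standard motion-energy computation. A toy space-time sketch follows; the preferred frequencies are arbitrary, and this illustrates the idea rather than the authors' model.

```python
import numpy as np

def motion_energy(stimulus, fx=0.1, ft=0.1):
    """Motion energy for a 2D (space x time) stimulus: project onto a
    quadrature pair of space-time filters tuned to (fx, ft) in
    cycles/sample (a rightward-preferring pair, given the sign
    convention below) and sum the squared outputs. Toy model."""
    nx, nt = stimulus.shape
    x = np.arange(nx)[:, None]
    t = np.arange(nt)[None, :]
    phase = 2 * np.pi * (fx * x + ft * t)
    even = np.sum(stimulus * np.cos(phase))  # cosine-phase filter
    odd = np.sum(stimulus * np.sin(phase))   # quadrature (sine) filter
    return even ** 2 + odd ** 2
```

A drifting grating matched to the preferred direction yields large energy; the same grating drifting in the opposite direction yields essentially none, which is the phase-insensitive direction selectivity the quadrature pair provides.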
Psychophysical evidence for both a 'quasi-linear' and a 'nonlinear' mechanism for the detection of motion
Jane C. Boulton, Curtis L. Baker Jr.
A random Gabor kinematogram stimulus provides the opportunity to demonstrate Fourier and non-Fourier motion perception, and discontinuities of performance from one to the other, in a way that supports the existence of categorically distinct underlying mechanisms. Two-frame apparent motion was used with a stimulus comprised of micropatterns randomly distributed across the visual field. The micropatterns were Gabor functions that contain a narrow band of spatial frequencies and orientations while maintaining a local nature in space. Psychophysical techniques were used to assess the detection of motion of this stimulus; two underlying processes were identified and characterized. For short temporal intervals and spatially dense stimuli, the response of the visual system can be predicted from the direction information in the spatio-temporal Fourier power spectrum of the stimulus: a quasi-linear mechanism. For longer temporal intervals and spatially sparse stimuli, detection of motion is NOT predictable from the information in the spatio-temporal Fourier power spectrum. Performance is independent of the spatial frequency content and orientation of the micropatterns, but is limited by the `density' of stimulus elements along the axis of motion: a nonlinear mechanism. It is proposed that the nonlinear mechanism is mediated by the parvocellular retino-cortical pathway, and the quasi-linear by the magnocellular pathway.
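A Gabor micropattern of the kind described (a sinusoidal carrier windowed by a Gaussian envelope, narrowband in frequency and orientation yet local in space) can be generated as follows; the sizes and frequencies are illustrative, not the stimulus parameters of the study.

```python
import numpy as np

def gabor_micropattern(size=32, sf=0.15, theta=0.0, sigma=6.0, phase=0.0):
    """Gabor function: a sinusoidal carrier at spatial frequency sf
    (cycles/pixel) and orientation theta, windowed by an isotropic
    Gaussian envelope of width sigma pixels. Values are illustrative."""
    half = size // 2
    y, x = np.mgrid[-half:half, -half:half]
    xr = x * np.cos(theta) + y * np.sin(theta)   # carrier axis
    carrier = np.cos(2 * np.pi * sf * xr + phase)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return carrier * envelope
```

Scattering many such patches at random positions, then redrawing them displaced (or re-randomized) in a second frame, gives the two-frame kinematogram whose Fourier and non-Fourier motion content the experiments manipulate.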
Motion mechanisms based on nonlinear spatial filters have lower temporal resolution
Andrew M. Derrington
Analysis of the motion of spatial patterns may be accomplished by analyzing the spatiotemporal variations caused when a spatially varying luminance waveform moves over the detector surface. Nonlinear transformations (such as squaring) of the input signal may give rise to a signal (a `distortion product') that varies on a different spatial scale from that of the original, and can thus give rise to a motion signal that is processed by a different set of spatiotemporal filters. Experiments with patterns made by adding together two sinusoidal gratings, differing in spatial frequency or orientation and in temporal frequency, show that the human visual system can analyze the motion of the `difference-frequency' distortion products that would be introduced by squaring, and thus must contain mechanisms that use some nonlinear transformation of this sort. This raises a question: is the nonlinearity simply an inherent part of the transduction process, or do separate linear and nonlinear motion analyzers exist? We find that performance in motion discrimination tasks that require nonlinear analyzers declines rapidly for stimulus durations less than about 200 msecs and for temporal frequencies greater than about 1 Hz, whereas discriminations based on linear analyses are reliable and correct at durations down to 20 msecs and at temporal frequencies over 10 Hz. This suggests that the linear and nonlinear motion analyzers are different.
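The difference-frequency distortion product introduced by squaring is easy to demonstrate numerically; the grating frequencies below are arbitrary illustration values, not the stimuli of the experiments.

```python
import numpy as np

# Sum of two gratings at 10 and 12 cycles/unit. Squaring expands the
# cross term 2*cos(a)*cos(b) into cos(a + b) + cos(a - b), creating a
# component at the difference frequency, 2 cycles/unit.
x = np.linspace(0.0, 1.0, 1000, endpoint=False)
f1, f2 = 10.0, 12.0
pattern = np.cos(2 * np.pi * f1 * x) + np.cos(2 * np.pi * f2 * x)
squared = pattern ** 2

original_spectrum = np.abs(np.fft.rfft(pattern))
squared_spectrum = np.abs(np.fft.rfft(squared))
# original_spectrum has no energy at 2 cycles/unit (index 2 here);
# squared_spectrum does.
```

The distortion product lives at a much coarser spatial scale (2 cycles/unit) than either component grating, so it is picked up by a different set of spatiotemporal filters, which is exactly the handle the experiments exploit.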
Fidelity metrics and the test-pedestal approach to spatial vision
This paper has three parts. Part 1 contains musings on the title of this conference, 'Computational Vision Based on Neurobiology.' Progress has been slow in computational vision because very difficult problems are being tackled before the simpler problems have been solved. Part 2 is about one of these simpler problems in computational vision that is largely neglected by computational vision researchers: the development of a fidelity metric. This is an enterprise perfectly suited for computational vision with the side benefit of having spectacular practical implications. Part 3 discusses the research my colleagues and I have been pursuing for the past several years on the Test-Pedestal approach to spatial vision. This approach can be helpful as a guide for the development of a fidelity metric. A number of experiments using this approach are discussed. These examples demonstrate both the power and the pitfalls of the Test-Pedestal approach.
Computational reconstruction of the mechanisms of human stereopsis
The properties of human stereoscopic mechanisms may be derived from dichoptic interaction and masking effects on stereoscopic detection thresholds in any relevant stimulus domain (spatial frequency, temporal frequency, disparity, orientation, etc.). The present study focuses on the spatial properties of mechanisms underlying stereoscopic depth detection. The computational approach is based on the full exploration of plausible model structures to characterize their idiosyncrasies, which often allows exclusion of proposed mechanisms by comparison with data obtained under conditions in which the idiosyncrasies should be expressed. For example, we conducted a detailed analysis of threshold elevation functions (TEFs) under plausible channel shapes, combination rules and masking behavior derived from previous studies. The analysis reveals that TEFs may be much narrower than and differ in shape from the underlying mechanisms. For example, only two discrete channels are required to produce TEFs peaking close to each fixed test frequency, with no relation to channel peaks. We apply this analysis to the stereospatial masking functions collected by Yang and Blake (1991) to determine the likely channel structure underlying the empirical masking performance. The analysis generally supports the two-mechanism model that they propose but shows that the assumptions underlying their estimates of the unmasked sensitivity function are incorrect. The analysis excludes stereospatial channels tuned below 2.5 c/deg, a region in which Schor, Wood, and Ogawa (1984) obtained evidence for many narrowly tuned channels by measuring disparity thresholds for targets with different peak tunings in the two eyes. Our computational model for the latter data is consistent with the lowest tuned channel being at 2.5 c/deg, this channel being narrowly tuned to dichoptic contrast differences, as described by Legge and Gu (1989) and Halpern and Blake (1988). 
Thus, all such stereo tuning data can be explained in a model in which all stereoscopic channels are tuned above 2.5 c/deg.
Reconstruction of 3D structure from two central projections by a fully linear algorithm
Maxim L. Kontsevich, Leonid L. Kontsevich
A completely linear method for the reconstruction of 3D structure from two central projections is proposed. We show that, in its combination of speed, robustness, and generality, this method performs better than other algorithms.
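For flavour, a standard fully linear reconstruction step, direct linear transformation (DLT) triangulation of a point from two central projections, is sketched below. This is a generic textbook method shown only to illustrate what "fully linear" means here; it is not the authors' algorithm.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation: each image point (u, v) observed
    under a 3x4 projection matrix P contributes two linear constraints
    on the homogeneous 3D point X; the least-squares solution is the
    right singular vector for the smallest singular value of the
    stacked constraint matrix. Generic method, not the paper's."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]          # dehomogenize
```

Because every step is a linear solve (no iterative refinement), such methods are fast and their failure modes are easy to characterize, which is the appeal of fully linear reconstruction.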
Disparity tuning of cyclopean visual mechanisms
Scott B. Stevenson, Clifton M. Schor, Lawrence K. Cormack
Tuned mechanisms or `channels' have been demonstrated in many aspects of human vision, and their characteristics span a continuum from a small set of broadly tuned channels (as in the spectral tuning of cone mechanisms) to a large array of narrow channels (as in the spatial tuning of cone mechanisms). The optimal number and tuning widths of channels for a given dimension depends on a trade-off between an economy of processor resources and the avoidance of metamerism. A small number of broad channels requires a small investment in processor resources and can support fine discriminations but is subject to metameric confusions. A large number of narrow channels requires a greater investment in processor resources but allows for the representation of multiple values on the tuning dimension (e.g., transparency). In the context of stereopsis and vergence control, single unit recordings have provided evidence that disparity tuned mechanisms cover the range from closely spaced, narrow channels (`tuned' cells) to widely spaced, broad channels (`near/far' cells). In principle, near/far mechanisms should be sufficient to control vergence and allow for fine stereoacuity right around the horopter. Tuned mechanisms might be required for fine disparity discriminations off the horopter and for the perception of stereo transparency. We have investigated the disparity tuning characteristics of binocular visual mechanisms which mediate (1) the psychophysical detection of surfaces in dynamic noise stimuli and (2) the involuntary oculomotor vergence responses to such surfaces. We have found evidence that both perceptual and oculomotor systems involve a large set of narrowly tuned mechanisms with inhibition between neighboring channels. A model is developed which clarifies the nonobvious relationship between measured tuning functions and characteristics of underlying channels.
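The broad-versus-narrow trade-off can be illustrated numerically: two broad channels respond almost identically to a transparent stimulus (two disparities) and to a suitably weighted single disparity (a metameric confusion), whereas a dense array of narrow channels separates the two cases. The channel centers and widths below are arbitrary illustration values.

```python
import numpy as np

def channel_responses(disparities, centers, sigma):
    """Summed responses of Gaussian-tuned disparity channels to a
    stimulus containing one or more disparities (arbitrary units)."""
    d = np.asarray(disparities, dtype=float)[:, None]
    c = np.asarray(centers, dtype=float)[None, :]
    return np.exp(-(d - c) ** 2 / (2 * sigma ** 2)).sum(axis=0)

broad_centers = np.array([-1.0, 1.0])         # two broad `near/far' channels
narrow_centers = np.linspace(-1.0, 1.0, 21)   # many narrow `tuned' channels

# Transparency (planes at -0.5 and +0.5) vs a single doubled plane at 0:
transparent_broad = channel_responses([-0.5, 0.5], broad_centers, 1.0)
single_broad = 2 * channel_responses([0.0], broad_centers, 1.0)
transparent_narrow = channel_responses([-0.5, 0.5], narrow_centers, 0.1)
single_narrow = 2 * channel_responses([0.0], narrow_centers, 0.1)
```

The broad-channel response vectors are nearly indistinguishable (a metamer), while the narrow-channel vectors differ strongly, so only the narrow array can represent transparency, which is the argument the abstract makes for tuned mechanisms.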
Spatial vision based upon color differences
To understand the role of color in spatial vision, it is necessary to examine both the extent to which spatial discriminations can be based solely upon color differences and the interaction between color and luminance variations when they are simultaneously present. The well- known differences in the spatial and temporal contrast sensitivity functions for color and luminance and the apparently impoverished input from the color mechanisms to certain higher functions obscure the fact that spatial discriminations based solely upon color differences are quite good. For example, spatial frequency discriminations between high-contrast patterns at isoluminance are only slightly poorer than for comparable luminance patterns, averaging about 5% to 6% of the base frequency. Similarly, orientation differences of about 1 deg between isoluminant patterns can be reliably discriminated at high contrasts, even for stimuli that lie along a tritanopic confusion axis. Similar comparisons from several tasks are reviewed, as are tasks involving color-luminance interactions. These provide information about the target behavior that must ultimately be explained if the physiological basis of color vision is to be understood.
Standard model of color vision: problems and an alternative
Russell L. De Valois
The standard model of early color processing postulates an achromatic magno LGN non-opponent pathway summing the outputs of the L and M cones; a red-green parvo LGN opponent cell system differencing L and M cones; and a yellow-blue or tritan parvo LGN opponent cell system differencing S from the (L + M) cones. A number of psychophysical and perceptual findings, however, do not agree with this standard model, and we have suggested an alternative. Our model diverges from that above in three fundamental ways: (1) L-M and M-L cells do not constitute the `red-green' system, but serve as the principal inputs to both the red-green and yellow-blue systems; (2) S-LM and LM-S opponent cells do not constitute the yellow-blue system, but rather combine at a third stage with the LM opponent cells in different ways to produce both the red-green and the yellow-blue systems, serving a modulatory role to break the one effective LGN response axis into separate red-green and yellow-blue perceptual color axes at some cortical site; (3) in addition to chromatic information, the parvo opponent cells (as well as the magno cells) carry intensity information, the chromatic and intensity components being separated at the third stage.
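The third-stage recombination in point (2) can be caricatured as follows; the signs and the weight are invented for illustration and are not the model's actual combination rules.

```python
def third_stage(lm, s_lm, k=0.5):
    """Caricature of a third-stage recombination: the S-opponent
    signal (S-LM) is added to or subtracted from the L-M opponent
    signal, splitting the single effective LGN response axis into
    separate red-green and yellow-blue perceptual axes. The signs and
    the weight k are illustrative only."""
    red_green = lm + k * s_lm
    yellow_blue = lm - k * s_lm
    return red_green, yellow_blue
```

The point of the caricature is only that both perceptual axes draw on the L-M signal, with the S-opponent signal acting as the modulatory input that rotates the axes apart.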