Computational Visual Attention Systems
Published on Feb 21, 2020
Computational visual attention systems (CVAS) have attracted considerable interest in recent years. Similar to the human visual system, CVAS detect regions of interest in images: by “directing attention” to these regions, they restrict further processing to sub-regions of the image.
Such guiding mechanisms are urgently needed, since the amount of information available in an image is so large that even the most powerful computer cannot carry out an exhaustive search on the data. Psychologists, neurobiologists, and computer scientists have investigated visual attention thoroughly over the last decades and have profited considerably from each other's work. However, the interdisciplinary nature of the topic brings not only benefits but also difficulties.
This seminar provides an extensive survey of the psychological and biological research underlying visual attention, as well as the current state of the art in computational systems. It covers basic theories and models such as the Feature Integration Theory (FIT) and the Guided Search Model (GSM). A real-time computational visual attention system, VOCUS (Visual Object detection with a CompUtational attention System), is also included. Furthermore, the seminar presents a broad range of applications of computational attention systems in fields such as computer vision, cognitive systems, and mobile robotics.
Introduction to Computational Visual Attention Systems
Perhaps the most prominent outcome of neurophysiological research on visual attention is that no single brain area guides attention; instead, neural correlates of visual selection appear in nearly all brain areas associated with visual processing. Additionally, recent findings indicate that many brain areas share the processing of information from different senses, and there is growing evidence that large parts of the cortex are multisensory. Attentional mechanisms are carried out by a network of anatomical areas. Important areas of this network are the posterior parietal cortex (PP), the superior colliculus (SC), the lateral intraparietal area (LIP), the frontal eye field (FEF), and the pulvinar. Attention serves three major functions: orienting of attention, target detection, and alertness.
The first function, orienting attention to a salient stimulus, is carried out by the interaction of three areas: the PP, the SC, and the pulvinar. The PP is responsible for disengaging the focus of attention from its present location (inhibition of return), the SC shifts attention to a new location, and the pulvinar is specialized in reading out the data from the indexed location. This combination of areas is called the posterior attention system.
The second attentional function, detecting a target, is carried out by the anterior attention system; the anterior cingulate gyrus in the frontal part of the brain has been proposed to be involved in this task. Finally, alertness to high-priority signals depends on activity in the norepinephrine (NE) system arising in the locus coeruleus. The brain areas involved in guiding eye movements are the FEF and the SC. There is also evidence that top-down biasing signals may originate from a network of areas in parietal and frontal cortex.
In summary, attention is controlled not by a single brain area but by a network of areas. Several areas have been shown to be involved in attentional processes, but the exact role of each area, as well as the interplay among them, remains an open question.
Feature Integration Theory:
The Feature Integration Theory (FIT) of Treisman has been one of the most influential theories in the field of visual attention. It was first introduced in 1980 and has since been steadily modified and adapted to new research findings.
The theory claims that “different features are registered early, automatically and in parallel across the visual field, while objects are identified separately and only at a later stage, which requires focused attention”. Information from the resulting feature maps (topographical maps that highlight conspicuities according to the respective feature) is collected in a master map of location. This map specifies where in the display things are, but not what they are. Serially scanning this map focuses attention on the selected scene regions and makes their data available for higher perception tasks. Treisman noted that the search for a target is easier the more features differentiate the target from the distracters.
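The feature-map and master-map idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Treisman's model or any published implementation: feature maps are assumed to be equally sized 2D lists of conspicuity values, the master map is assumed to be their pointwise sum, and the serial scan picks the most active location and then suppresses a square neighborhood around it (an assumed inhibition-of-return rule with a hypothetical `radius` parameter). The functions `master_map` and `serial_scan` and the example maps are illustrative only.

```python
from typing import Dict, List, Tuple

Map = List[List[float]]  # a topographical map: rows x columns of activation values


def master_map(feature_maps: Dict[str, Map]) -> Map:
    """Collect all feature maps into one map of locations.

    Assumption: maps are combined by pointwise summation. The result says
    where something is conspicuous, but not which feature caused it.
    """
    maps = list(feature_maps.values())
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) for c in range(cols)] for r in range(rows)]


def serial_scan(location_map: Map, n_fixations: int, radius: int = 1) -> List[Tuple[int, int]]:
    """Attend to the most active locations one after another.

    Assumption: after each fixation, a square neighborhood of the given
    radius is suppressed (inhibition of return) so the scan moves on.
    """
    work = [row[:] for row in location_map]  # copy so the input is not modified
    rows, cols = len(work), len(work[0])
    fixations = []
    for _ in range(n_fixations):
        # Winner-take-all: the location with the highest remaining activation.
        r, c = max(((i, j) for i in range(rows) for j in range(cols)),
                   key=lambda rc: work[rc[0]][rc[1]])
        fixations.append((r, c))
        # Inhibition of return: suppress the attended neighborhood.
        for i in range(max(0, r - radius), min(rows, r + radius + 1)):
            for j in range(max(0, c - radius), min(cols, c + radius + 1)):
                work[i][j] = float("-inf")
    return fixations


# Hypothetical feature maps for a 3 x 4 display.
color = [[0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.9]]
orientation = [[0.8, 0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0, 0.0]]
locations = master_map({"color": color, "orientation": orientation})
print(serial_scan(locations, 2))  # [(2, 3), (0, 0)]
```

Because the master map in this sketch is a plain sum, it records where conspicuous locations are but not which feature made them conspicuous, which mirrors the “where, not what” description of the master map.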
If the target has no unique feature and differs from the distracters only in how its features are combined, the search is more difficult and often requires focused attention (conjunction search). This usually results in longer search times. However, if the features of the target are known in advance, conjunction search can sometimes be accomplished rapidly. Treisman proposed that this is done by inhibiting the feature maps that code nontarget features. Additionally, she introduced so-called object files as “temporary episodic representations of objects.” An object file “collects the sensory information that has so far been received about the object. This information can be matched to stored descriptions to identify or classify the object.”
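The effect of inhibiting nontarget feature maps in conjunction search can be illustrated with a toy display. This sketch is one simple reading of Treisman's proposal, not her formal model: each feature value (for example “red” or “vertical”) is assumed to get a binary map, the location map is assumed to be the sum of all maps that are not inhibited, and the display as well as the functions `feature_maps` and `location_map` are hypothetical.

```python
from typing import AbstractSet, Dict, List, Tuple

Item = Tuple[str, ...]  # feature values of one display item, e.g. ("red", "vertical")


def feature_maps(display: List[Item]) -> Dict[str, List[float]]:
    """One binary map per feature value: 1.0 where an item carries that value."""
    values = sorted({value for item in display for value in item})
    return {v: [1.0 if v in item else 0.0 for item in display] for v in values}


def location_map(maps: Dict[str, List[float]],
                 inhibited: AbstractSet[str] = frozenset()) -> List[float]:
    """Sum the feature maps at each position, leaving out inhibited maps.

    Assumption: inhibiting a map simply removes it from the sum.
    """
    n = len(next(iter(maps.values())))
    return [sum(m[i] for v, m in maps.items() if v not in inhibited) for i in range(n)]


# Hypothetical conjunction display: the target is the red vertical item (index 2).
display = [("red", "horizontal"), ("green", "vertical"),
           ("red", "vertical"), ("green", "horizontal")]
maps = feature_maps(display)
print(location_map(maps))                           # [2.0, 2.0, 2.0, 2.0]
print(location_map(maps, {"green", "horizontal"}))  # [1.0, 1.0, 2.0, 0.0]
```

Without inhibition, every item activates exactly two maps, so no location stands out, which corresponds to the difficulty of conjunction search. Once the maps for “green” and “horizontal” are inhibited, only the red vertical target reaches the highest activation.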
Guided Search Model:
The Guided Search Model (GSM) was developed by Jeremy M. Wolfe. Its basic goal is to explain and predict the results of visual search experiments, and the model has also been implemented as a computer simulation. Wolfe has denoted successive versions of his model as Guided Search 1.0, 2.0, 3.0, and 4.0. The model shares many concepts with FIT but is more detailed in several aspects that are necessary for computer implementations. Notably, in addition to bottom-up saliency, it also considers the influence of top-down information by selecting the feature type that best distinguishes the target from its distracters.
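The interplay of bottom-up saliency and top-down feature selection can be sketched as follows. This is a toy illustration under simplifying assumptions, not Wolfe's Guided Search equations: feature values are assumed to be numbers in [0, 1], bottom-up saliency is assumed to be an item's mean absolute difference from all other items summed over dimensions, the “best” feature type is assumed to be the dimension with the largest mean difference between the target and the known distractor types, and top-down activation is assumed to be similarity to the target on that one dimension. The names `best_dimension`, `bottom_up`, `search_order`, and `top_down_weight` are hypothetical.

```python
from typing import Dict, List

Item = Dict[str, float]  # feature dimension -> value in [0, 1], e.g. {"color": 1.0}


def best_dimension(target: Item, distractor_types: List[Item]) -> str:
    """Top-down step: pick the dimension on which the target differs most,
    on average, from the known distractor types (an assumed measure)."""
    def mean_diff(dim: str) -> float:
        return sum(abs(target[dim] - d[dim]) for d in distractor_types) / len(distractor_types)
    return max(target, key=mean_diff)


def bottom_up(display: List[Item]) -> List[float]:
    """Bottom-up saliency: mean absolute difference to all other items,
    summed over feature dimensions (an assumed contrast measure)."""
    n = len(display)
    saliency = []
    for i, a in enumerate(display):
        total = 0.0
        for dim in a:
            total += sum(abs(a[dim] - b[dim]) for j, b in enumerate(display) if j != i) / (n - 1)
        saliency.append(total)
    return saliency


def search_order(display: List[Item], target: Item, distractor_types: List[Item],
                 top_down_weight: float = 1.0) -> List[int]:
    """Rank display positions by bottom-up saliency plus weighted top-down
    similarity to the target on the selected dimension."""
    dim = best_dimension(target, distractor_types)
    saliency = bottom_up(display)
    activation = [saliency[i] + top_down_weight * (1.0 - abs(item[dim] - target[dim]))
                  for i, item in enumerate(display)]
    # sorted() is stable, so equally active items keep their display order.
    return sorted(range(len(display)), key=lambda i: activation[i], reverse=True)


# Hypothetical display: a red (color 1.0) horizontal (orientation 0.0) target
# among green horizontal and green vertical distractors.
target = {"color": 1.0, "orientation": 0.0}
green_horizontal = {"color": 0.0, "orientation": 0.0}
green_vertical = {"color": 0.0, "orientation": 1.0}
types = [green_horizontal, green_vertical]
display = [green_vertical, green_horizontal, target, green_horizontal]
print(best_dimension(target, types))              # color
print(search_order(display, target, types))       # [2, 0, 1, 3]
print(search_order(display, target, types, 0.0))  # [0, 2, 1, 3]
```

In this display, the green vertical distractor and the target are equally salient bottom-up, so with `top_down_weight` set to 0.0 the distractor is inspected first. Selecting color as the guiding feature type moves the target to the front of the search order.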