Selective dorsal and ventral processing: Evidence for a common attentional mechanism in reaching and perception
Heiner Deubel, Werner X. Schneider and Ingo Paprotta
Institut für Psychologie
Allgemeine und Experimentelle Psychologie
Ludwig-Maximilians-Universität München
Germany
Running head: Reaching and attention
Correspondence to: Heiner Deubel,
Institut für Psychologie, Ludwig-Maximilians-Universität, Leopoldstrasse 13,
D-80802 München, Germany. Fax: +49-89-2180-5211. E-mail:
Deubel@mip.paed.uni-muenchen.de.
Acknowledgement: This research was
supported by the Deutsche Forschungsgemeinschaft, SFB 462
("Sensomotorik").
Abstract
We recently demonstrated that visual attention before saccadic eye movements is
focussed on the saccade target, allowing for spatially selective object
recognition (Deubel & Schneider, 1996). Here we investigate the role of
visual selective attention in the preparation of manual reaching movements. A
dual-task paradigm required the preparation of a reaching movement to a cued
item in a letter string. Simultaneously, the ability to discriminate between
the symbols "E" and "mirror-E" presented tachistoscopically
within the surrounding distractors was taken as a measure of perceptual
performance. The data demonstrate that discrimination performance is superior
when the discrimination stimulus is also the target for manual aiming; when
discrimination stimulus and pointing target refer to different objects,
performance deteriorates. Thus, it is not possible to maintain attention on a
stimulus for the purpose of discrimination while directing a movement to a
spatially separate object. The results argue for an obligatory coupling of
selection-for-perception and selection-for-action. The findings are discussed
in relation to dorsal and ventral visual processing streams.
Introduction
Our knowledge about the architecture
of the visual system of primates has increased enormously during the last two
decades. There is growing consensus that visual processing occurs in parallel
and interacting streams at different, quasi-hierarchical levels (see, e.g.,
DeYoe & Van Essen, 1988; Hubel & Livingstone, 1988; Zeki, 1993; Milner
& Goodale, 1995). Several suggestions have been made as to how this parallel and
distributed processing of visual information might be functionally organized. One
suggestion, by Mishkin, Ungerleider, & Macko (1983) and based on lesion work
in monkeys, claims that the visual system consists of two main pathways, namely
the dorsal "where"-pathway, and the ventral "what"-pathway.
The suggested function of the "what"-pathway is to recognize objects based on their visual
appearance. The "where"-pathway, on the other hand, computes spatial
information about objects. At the cortical level, the segregation of both
pathways can be traced back to the primary visual cortex, area V1. From there,
the "where"-pathway runs dorsally into the posterior parietal lobe
while the "what"-pathway leads ventrally to the inferior temporal lobe.
Since this proposal, a large body of research has supported this distinction of two
main pathways (see, however, Zeki, 1993). For instance, patients with brain
lesions restricted to the inferior temporal cortex have problems recognizing
objects by sight, a symptom called visual agnosia (see, e.g., Farah, 1990; Kolb
& Whishaw, 1990). At the same time, spatial abilities, such as pointing to
an object, are left intact. When agnosia is purely visual, recognition by other
senses, such as touch, is still intact. Lesions restricted to the superior
parietal areas of the dorsal "where"-pathway, on the other hand, can
cause a symptom called optic ataxia (see, e.g., Milner & Goodale, 1995). These
patients are able to identify objects by their visual appearance, but they
exhibit misreaching (mislocalization) towards the same objects.
The labeling of ventral and dorsal
pathway as a "what"- and a "where"-pathway was recently
criticized by Goodale and Milner (1992; Milner & Goodale, 1995). These
authors still ascribe the computation of "what"-aspects,
that is, the identification of objects, to the ventral pathway. They disagree,
however, about the function of the dorsal pathway. In their view, its main task
is not the perception of the spatial layout of the external world but the
computation of spatial information for motor actions such as a saccade or a reach towards
an object. In other words, Goodale & Milner (1992) suggest a shift in
emphasis from spatial perception to spatial information for action. Their view
of dorsal processing is supported by human neuropsychological studies and
neurophysiological work in macaques, especially by single cell recordings (see
Milner & Goodale, 1995). The reviewed data indicate that the idea of a
single representation of external space is probably wrong, and that instead
several spatial-motor representations (sometimes also called processing
streams) exist in parallel for different kinds of motor actions (see, e.g.,
Stein, 1992; Graziano & Gross, 1994; Rizzolatti, Riggio, & Sheliga, 1994;
Milner & Goodale, 1995). For instance, information about saccade landing
points is probably computed and coded in the lateral intraparietal area (LIP),
while endpoints for grasping movements are computed in area 7b; both areas are part
of the parietal lobe. Thus, the brain seems to code spatial information for
different effectors, that is, for different action classes, in different parts
of the brain. In summary, Milner & Goodale (1995) suggest that the ventral
stream is involved in visual perception and identification, while the dorsal
stream computes information for spatial-motor actions. A related distinction
was recently suggested by Jeannerod (1994), who differentiated a "semantic
mode" of processing, located in the temporal lobe (ventral stream) and a
"pragmatic mode", located in the parietal cortex (dorsal stream).
Visual processing in both streams
does not occur in a purely automatic, "bottom-up" driven manner. Rather,
control of processing is task-dependent; this type of selectivity of visual
processing has often been called endogenous visual attention (e.g. Posner,
1980). A tremendous amount of research in experimental psychology and the
neurosciences has investigated the properties of these selection processes in
vision (see, Treisman, 1988; Bundesen, 1990; Posner & Petersen, 1990; van
der Heijden, 1992; Schneider, 1993; Duncan & Desimone, 1995; for
overviews). Traditional experimental psychology focussed on the function of
visual attention in the ventral stream, that is, on
"selection-for-visual-perception". For instance, experiments on
visual search (see, Treisman & Gormican, 1988; Wolfe, 1994; for overviews)
attempted to determine how fast and how accurately certain visual attributes and
their conjunctions can be "perceived" and signalled. In most of these
investigations, "ventral" attributes such as color, orientation,
etc., served as the properties that defined the search target. Therefore,
selection-for-visual-perception (in contrast to
selection-for-spatial-motor-control, the dorsal processing domain) has been the main
topic of search tasks. Another research line in which the effects of visual
attention were mainly investigated for ventral processing is the
spatial precueing paradigm (e.g. Eriksen & Hoffman, 1973; Posner, 1980; van
der Heijden, 1992). The experiments show that preknowledge about the possible
location of a target leads to faster and more accurate responses to visual
aspects such as alphanumeric identity or simple shape features such as curved
vs. straight (see, van der Heijden, 1992; Posner & Raichle, 1994; for
overviews).
This bias to measure the effect of
visual attention mainly for ventral visual processing can be traced back to the
suggested functions of attention. Attention is assumed to "facilitate
detection" (e.g. Posner, 1980), to allow "feature integration"
(Treisman & Gelade, 1980), "object recognition" (e.g. LaBerge
& Brown, 1989; Schneider, 1995), and "entry to visual short-term
memory" (Duncan & Humphreys, 1989; Bundesen, 1990). These assumptions
do not imply that the selection mechanism itself is located only in the ventral
stream. Instead, several theories suggested a central role of the dorsal stream
in controlling the attentional mechanism, sometimes called spatial attention
mechanism (e.g. LaBerge & Brown, 1989; Posner & Petersen, 1990; van der
Heijden, 1992; Schneider, 1995).
As compared to the large body of
theoretical work on the relation of attention and (ventral) perceptual
processing, there exist only rather few suggestions about the role of visual
attention in dorsal processing, more precisely, about the role of attention in
spatial-motor control. Allport (1987) and Neumann (1987) suggested that
spatial-motor actions such as grasping an object among other objects also imply
a selection process, which is what Allport (1987) called
"selection-for-action". Natural environments usually contain several
objects, and only one of them should be used as the target for an individual
action. For instance, grasping a pen among other pens requires the motor system
to receive spatial information (probably in arm-centered coordinates, see,
e.g., Graziano & Gross, 1994) of the intended pen only. Information from
other pens has to be excluded from controlling the grasping action. In other
words, an attentional mechanism is needed that selects the spatial information
of the movement target. Because spatial information is provided by the visual
system (the dorsal pathway), Allport (1987, 1989) and Neumann (1987, 1990) have
suggested that visual attention is involved in this selection process. Another
example of selection-for-spatial-motor-action refers to the control of saccadic
eye movements. Before each saccade, the next fixation point has to be selected
among many potential candidates in the environment.
Unfortunately, there is not much
experimental work on selection-for-spatial-motor-action. Tipper, Lortie &
Baylis (1992) investigated the role of visual attention for manual reaching in
an interference paradigm. The question was whether interference effects found
for ventral visual processing (e.g. Eriksen & Eriksen, 1974) can also be
obtained for spatial-motor actions. The degree of interference is usually
considered as a measure of the efficiency of attentional processes. In these
experiments, subjects had to reach, as fast and as precisely as possible, from
a starting position to one of nine locations indicated by a red light (the
target). In some trials, a yellow light (the distractor) appeared,
simultaneously with the red target light, at a different location. Substantial
interference effects were obtained; response latencies were prolonged compared
to trials where no distractor appeared. This interference effect was only
observed when the distractor was located between the starting position and the
target. The results show that interference effects of nearby objects can also
be obtained for spatial-motor action such as reaching, suggesting that visual
attention processes are also involved in selection-for-spatial-motor-action. A
similar conclusion was reached by Castiello (1996). In one of his experiments,
subjects had to perform a grasp to a target as a primary task. A secondary,
non-spatial task was required for a different object located close to the
target. The author observed interference effects of the secondary task on the
kinematics of the primary grasping movement.
Another research line that deals
with dorsal selection refers to the relation of eye movement control and visual
attention. The leading question has been whether visual attention for
perceptual processing on the one hand, and the selection of a target for a
saccade on the other are independent or not. The results of early experiments
on this issue were controversial (e.g. Klein, 1980; Posner, 1980; Remington,
1980), partly due to methodological problems (see, Shepherd et al., 1986). More
recent work (Hoffman & Subramaniam, 1995; Kowler et al., 1995; Deubel &
Schneider, 1995; Schneider & Deubel, 1995) clearly demonstrated a strict
link between ventral selection-for-perception and dorsal selection-for-a-saccade.
In the experiments of Deubel &
Schneider (1996), subjects had to saccade to locations within horizontal letter
strings left or right from a central fixation cross. The performance in
discriminating between the symbols "E" and mirror-"E" presented
tachistoscopically before the saccade within the surrounding distractors was
taken as a measure of visual attention. The data showed that discrimination
performance is best when discrimination target and saccade target referred to
the same object. This result held no matter whether the saccade was directed
by a peripheral cue or by a central cue. The findings strongly argue for an
obligatory and selective coupling of saccade programming and visual attention;
this coupling between dorsal and ventral processing is restricted to one common
target object at a time.
Based on these data and on
computational considerations, Schneider (1995) postulated a Visual Attention
Model (VAM) that suggests a common selection mechanism for both processing
streams. In line with two-stage models of perception and attention (Neisser,
1967), a first stage of low-level visual processing computes, in early visual
areas of the brain (e.g. V1, V2), elementary visual information in the form of
"primitive" object structures (visual units). Higher-level visual
processing in the dorsal and ventral stream occurs only for one visual unit
(one "object"). In the model, visual attention is the mechanism that
determines the unit, carries out the selection, and gates the information flow
from low- to high-level vision in a way that only information from one object
is further processed. VAM claims that visual attention selects one low-level
visual object at a time, leading to prioritized perceptual processing in the
ventral stream (e.g., the object is recognized). Simultaneously, possible
spatial-motor actions (saccade, pointing, reaching, grasping etc.) towards this
object are programmed in the dorsal stream. Only the (effector-specific)
"Go"-signal is necessary to convert the programs into overt action.
As described before, such
attention-mediated and object-specific coupling of dorsal and ventral
processing has already been demonstrated for eye movement control and
perceptual selection (Deubel & Schneider, 1996). Beyond
saccades, however, VAM predicts that the same coupling should also hold for
aiming, reaching, and grasping (Schneider, 1995; p. 363). In the present study
we analyzed the coupling of reaching movements and visual discrimination. For
this purpose, a dual-task paradigm similar to that used in our previous studies
was developed. The primary task was to make a goal-directed reaching movement
to a cued object, measuring selection-for-spatial-motor-action in the dorsal
stream. Prior to the movement, a secondary task required the subject to discriminate
between the characters "E" and mirror-"E", measuring
selection-for-perception ("traditional" visual attention) in the
ventral stream. It is hypothesized that the programming of the reaching
movement yokes the visual attention mechanism, so that during this selection
process no other object can be processed in high-level ventral vision. Consequently,
discrimination performance should be best when discrimination target and
reaching target refer to the same object. For non-corresponding reaching and
discrimination targets, better-than-chance performance is only possible when
visual attention shifts first to the discrimination target and then to the
reaching target. In this case, longer initiation latencies for the movement
should be expected.
Methods
Subjects
Initially, 7 subjects participated
in the experiments. Two of them were excluded from further analysis since they
were not able, even after some training, to produce mean hand movement
latencies shorter than 700 msec. The age of the five remaining subjects ranged
from 22 to 28 years. They had normal vision and normal motor behavior. All
subjects had experience in a variety of experiments in oculomotor research. One
subject was one of the authors of the study, the others were naive with respect
to the aim of the experiments.
Experimental set-up
Figure 1 shows a sketch of the
experimental situation. The subject was seated in a dimly lit room. The visual
stimuli were presented on a fast 21-inch color monitor (CONRAC 7550 C21),
visible through a one-way mirror. The monitor provided a frame frequency of 100
Hz at a spatial resolution of 64 pixels per inch. Active screen size was 40 x
30 cm; the viewing distance was 57.7 cm. The video signals were generated by a
freely programmable graphics board (Kontron KONTRAST 8000), controlled by a PC
via the TIGA (Texas Instruments Graphics Adapter) interface. The stimuli
appeared on a grey background which was adjusted to a mean luminance of 2.2
cd/m². The luminance of the stimuli was 23 cd/m². The
relatively high background brightness is essential in order to avoid the
effects of phosphor persistence (Wolf & Deubel, 1996).
The application of a one-way mirror
allowed free hand movements to the stimuli without visual feedback about the
hand position. Reaching movements were recorded with a Fastrak electro-magnetic
position and orientation measuring system (Polhemus Inc. 1993) and sampled at
400 Hz. The sender device was fixed at 60 cm in front of the subject. The
sender emits time-multiplexed, orthogonal electro-magnetic fields of 10 kHz
frequency. From the signals induced in the receiver, which was mounted on the fingertip of
the subject's right hand, a central processing unit calculates the receiver's orientation
relative to the sender device. From the intensity of the
electro-magnetic field, the distance between sender and receiver is determined.
The position in space is calculated from distance and orientation by use of a
specific digital signal processor (TI320C30). The device allows for a maximum
translation range of 10 feet, with an accuracy of 0.03 inches RMS. The frequency
response is 120 Hz; without further filtering the phase lag response is only 4
msec. A red LED (5 mm diameter), controlled by the PC, was attached to the
receiver. The LED made it possible to provide controlled visual feedback about the spatial
position of the fingertip.
Eye fixation was controlled by an
infrared eyetracker (IRIS, Skalar Medical) with a temporal bandwidth of 240 Hz.
This device measures the reflection difference between sclera and iris by
infrared LEDs and phototransistors that are mounted next to the subject's
eyes. Head movements were restricted by an adjustable chin rest. The
experiments were completely under the control of a 486 personal computer. The
PC was also used for the automatic off-line analysis of the pointing movement
data in which movement latencies and start and end positions of the manual
responses were determined.
Calibration and data analysis
Each session started with a
calibration procedure of the eyetracker in which the subject had to
sequentially fixate 3 positions arranged on a horizontal line at distances of
8.5 deg. Further, the origin and coordinate alignment frame of the position
sensor were set relative to the projected position of the monitor center. The
position sensor behaved linearly within 30 cm around the center position. The
overall accuracy was better than 2 mm. In order to determine latency,
amplitude, and duration of the reaching movements, an off-line program for the
evaluation of movement trajectory parameters searched the movement record for
the points at which the vectorial velocity first exceeded and then fell below a
threshold of 10 mm/s (which is equivalent to about 1 deg/sec). The beginning and
the end of the reaching movement were then calculated by linear regression in a
200 msec time window around these points. To exclude drift after the reach, the
end position had to stay within a 2 mm interval for 62.5 msec after the end of
the initial movement.
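The threshold-based movement detection described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the original analysis program: it takes an (N, 3) array of fingertip positions in mm sampled at 400 Hz, estimates the vectorial speed by central differences (the text does not specify the differentiation method), finds the first upward and the subsequent downward crossing of the 10 mm/s threshold, and applies the 2 mm / 62.5 msec stability check, interpreted here as the distance from the position at movement end. The regression-based refinement in the 200 msec window is omitted, and the function name `detect_movement` is hypothetical.

```python
import numpy as np

FS_HZ = 400.0          # Fastrak sampling rate given in the text
VEL_THRESHOLD = 10.0   # mm/s, vectorial velocity threshold
STABLE_MM = 2.0        # end position must stay within this interval ...
STABLE_S = 0.0625      # ... for 62.5 msec (25 samples at 400 Hz)


def detect_movement(pos_mm, fs=FS_HZ):
    """Return (onset_index, offset_index) of the first reach in an (N, 3)
    array of fingertip positions in mm, or None if no valid reach is found.

    Hypothetical sketch: the regression refinement in the 200 msec window
    around each threshold crossing is not implemented here.
    """
    pos_mm = np.asarray(pos_mm, dtype=float)
    # vectorial speed from a central-difference derivative (assumption)
    speed = np.linalg.norm(np.gradient(pos_mm, 1.0 / fs, axis=0), axis=1)
    above = speed > VEL_THRESHOLD
    if not above.any():
        return None
    onset = int(np.argmax(above))                 # first upward crossing
    back_below = np.flatnonzero(~above[onset:])
    if back_below.size == 0:
        return None                               # movement never came to rest
    offset = onset + int(back_below[0])           # first downward crossing
    n_stable = int(round(STABLE_S * fs))          # 25 samples
    window = pos_mm[offset:offset + n_stable]
    if len(window) < n_stable:
        return None                               # record too short for drift check
    # drift check, interpreted as distance from the position at movement end
    drift = np.linalg.norm(window - pos_mm[offset], axis=1)
    if drift.max() > STABLE_MM:
        return None
    return onset, offset
```

Movement latency would then follow from the onset index relative to the sample at cue onset, and duration from the difference between offset and onset, each divided by the sampling rate.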
Experimental paradigm
After an initial training block that
was not included in the data analysis, each subject underwent six blocks (3
blocks per day) of each of the experiments; each block consisted of 120 single
trials. The subject performed a dual task. In each experimental trial, the
reaching movement was guided by a central, symbolic cue that indicated the
movement target (MT) within a string of letters. Moreover, the subject had to
report the identity of a discrimination target (DT) presented
tachistoscopically in the string. Two different experiments were performed. In Experiment
1, DT appeared before the hand movement. For each experimental
block, the position of DT was constant: it always occupied the central position
of the string, either on the right or on the left. Experiment 2 was similar to
Experiment 1 except that here, DT was presented only at the onset of the
reaching movement.
Figure 2 shows an example of the
sequence of stimuli in a single trial of Experiment 1. Each trial
started with the presentation of a small fixation cross in the center of the
screen, with a size of 0.25 deg. Simultaneously, two strings of premask
characters appeared left and right of the central fixation, each consisting of
five pre-mask items resembling the number "8". The width of each item
was 0.9 deg of visual angle, its height was 1.4 deg. The distance between the
items was 2.4 deg, with the central item of each five-item string presented at
an eccentricity of 7.65 deg. The three central items of each letter string
appeared on ellipses of red (r), green (g) and blue (b) color, as indicated in
the figure. Color intensities of the ellipses were adjusted by
flicker-photometry to appear about equally salient.
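The stimulus geometry can be checked numerically. This sketch assumes that the 2.4 deg spacing and the 7.65 deg eccentricity are horizontal visual angles measured from fixation, and that angles map onto the flat screen as x = d * tan(theta) with d = 57.7 cm and 64 pixels per inch; the function names are hypothetical.

```python
import math

VIEW_DIST_CM = 57.7         # viewing distance
PX_PER_CM = 64 / 2.54       # 64 pixels per inch
CENTER_ECC_DEG = 7.65       # eccentricity of the central item of each string
SPACING_DEG = 2.4           # item-to-item distance
N_ITEMS = 5                 # items per string


def item_eccentricities(side):
    """Signed eccentricities (deg) of the five items in one string.
    side = +1 for the right string, -1 for the left string."""
    half = N_ITEMS // 2
    return [side * (CENTER_ECC_DEG + k * SPACING_DEG) for k in range(-half, half + 1)]


def movement_target_eccentricities(side):
    """Only the three central items could be cued as movement targets."""
    return item_eccentricities(side)[1:4]


def deg_to_px(ecc_deg):
    """Horizontal offset from screen centre in pixels, assuming a flat screen
    viewed perpendicularly: x = d * tan(theta)."""
    return VIEW_DIST_CM * math.tan(math.radians(ecc_deg)) * PX_PER_CM


for ecc in movement_target_eccentricities(+1):
    print(f"{ecc:5.2f} deg -> {deg_to_px(ecc):6.1f} px")
```

Under these assumptions the three candidate movement targets on each side lie at 5.25, 7.65 and 10.05 deg, the small, medium and large target eccentricities referred to in the Results.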
The subject was asked to keep strict
fixation during the whole trial at the center of the screen, initially
indicated by a central fixation cross. Maintenance of fixation was controlled
by the IRIS oculometer. At the beginning of the trial, the subject had to
position his/her fingertip on the location of the central cross. The position
of the fingertip is indicated by the arrowhead in Figure 2. In this phase, the
LED was switched on, aiding the precise positioning. After a delay of 1000 to
1600 msec, a symbolic cue in the form of a red, green or blue triangle appeared
in the center of the screen, pointing either to the right or to the left side. Color
and pointing direction of the triangle thus unequivocally indicated a specific
item, the movement target (MT), within the string. The primary task was to
"point to this target item as fast and precisely as possible". Simultaneously
with cue onset, the LED was switched off to prevent any further visual feedback
of hand or pointing position. At 150 msec after cue onset, well before the
onset of the pointing movement, the premask characters changed into nine
distractors and one discrimination target. The distractors were randomly
selected among the characters " " and " ". The central
character on one of the two sides was replaced by the discrimination target (DT),
which was either the letter "E" or a mirror-symmetrical
version of this letter (mirror-E). The position of the DT was constant
during each block and known to the subject (e.g., central position in the
string on the right side). The movement target position, however, was varied
independently within the central three items of the two strings, resulting in 12
combinations of movement target / discrimination target positions. All
experimental conditions occurred with equal probability. Target and distractors
remained visible for 150 msec. Then, the items and the central cue were removed
and only the colored ellipses remained.
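The trial structure of one Experiment 1 block can be sketched as a schedule generator. Several details are assumptions not stated in the text: the 1000 to 1600 msec foreperiod is drawn uniformly and rounded to whole 10 msec frames of the 100 Hz display, the DT identity is drawn at random on each trial, and each of the six movement-target positions (three cue colors on each side) occurs equally often within a 120-trial block with the DT side fixed. Across the two DT sides this gives the 12 MT/DT combinations. All names are hypothetical.

```python
import random
from dataclasses import dataclass

FRAME_MS = 10             # one frame at the 100 Hz monitor rate
CUE_TO_DT_MS = 150        # premasks change into DT/distractors 150 msec after cue onset
DT_DURATION_MS = 150      # DT and distractors remain visible for 150 msec
COLORS = ("red", "green", "blue")   # cue colour selects one of the three central items
SIDES = ("left", "right")           # cue direction selects the string


@dataclass
class Trial:
    mt_side: str
    mt_color: str
    dt_side: str
    dt_identity: str      # "E" or "mirror-E"
    foreperiod_ms: int    # delay preceding cue onset


def make_block(dt_side, n_trials=120, seed=None):
    """Hypothetical block generator: DT side fixed, six MT positions equally often."""
    rng = random.Random(seed)
    conditions = [(side, color) for side in SIDES for color in COLORS]
    if n_trials % len(conditions):
        raise ValueError("n_trials must be a multiple of the number of MT positions")
    trials = []
    for side, color in conditions * (n_trials // len(conditions)):
        # assumption: uniform foreperiod in whole frames between 1000 and 1600 msec
        foreperiod = rng.randrange(1000, 1600 + FRAME_MS, FRAME_MS)
        identity = rng.choice(("E", "mirror-E"))   # assumption: random per trial
        trials.append(Trial(side, color, dt_side, identity, foreperiod))
    rng.shuffle(trials)
    return trials


def event_frames():
    """Display events as frame counts relative to cue onset (frame 0)."""
    return {
        "cue_on_led_off": 0,
        "dt_on": CUE_TO_DT_MS // FRAME_MS,
        "dt_off": (CUE_TO_DT_MS + DT_DURATION_MS) // FRAME_MS,
    }
```

Expressing the display timing in frames makes explicit that, at 100 Hz, the 150 msec intervals correspond to exactly 15 frames each.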
Due to the timing of the stimulus
presentation, the discrimination target was no longer present 300 msec after
the appearance of the colored triangle. As a result of this stimulus timing
most reaching movements were initiated well after the disappearance of
target and distractors (see Figure 5). In order to eliminate occasional
responses that occurred too early, the off-line data analysis discarded
movements with latencies shorter than 200 msec. Also, trials with movement
velocities smaller than 11 mm/s2 and durations shorter than 50 msec
or longer than 600 msec were not considered in the analysis. Such trials
occurred in less than 2% of all trials.
One second after the onset of the
reaching movement, the LED was switched on again to provide visual
feedback about the final finger position. Finally, the subject
indicated, without time pressure, the identity of the discrimination target
("E" or mirror-E) by pressing one of two buttons (2AFC task). The
central fixation cross reappeared after the subject's decision and the next
trial was initiated by the computer.
In separate sessions, two types of
"single-task" controls were run. A first control task ("No
discrimination - reaching only" single task condition) served to determine
pointing reaction times in a single task situation. For this purpose, the
subject was asked to point to the indicated position, but was not required to
discriminate. A second control task ("No reaching - discrimination
only" single task condition) served to test the discrimination performance
without a pointing response. Here, the subject was only asked to indicate the
identity of the discrimination target; no reaching response was required. Each
subject performed two blocks of each control task.
Experiment 2 was very similar
to Experiment 1 except that here, the presentation of the discrimination
stimulus occurred only at the onset of the reaching movement. For this
purpose, the computer performed an on-line calculation of movement velocity. Stimulus
presentation was triggered when the velocity exceeded a threshold of 1 deg/sec.
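The on-line trigger of Experiment 2 can be sketched as a streaming detector. The text gives the threshold in deg/sec; this sketch assumes it is converted to fingertip speed via the 57.7 cm viewing distance (about 10 mm/s, in line with the off-line threshold) and that speed is estimated from two consecutive 400 Hz samples. The class name is hypothetical, and a practical implementation would probably need some smoothing, because a single-sample difference at such a low threshold is sensitive to sensor noise.

```python
import math

FS_HZ = 400.0                 # Fastrak sampling rate
VIEW_DIST_MM = 577.0          # 57.7 cm viewing distance
THRESHOLD_DEG_S = 1.0         # on-line trigger threshold from the text
# Assumed conversion: 1 deg of visual angle at 57.7 cm spans about 10 mm
THRESHOLD_MM_S = VIEW_DIST_MM * math.tan(math.radians(THRESHOLD_DEG_S))


class OnsetTrigger:
    """Hypothetical streaming detector that fires at most once per trial."""

    def __init__(self, fs=FS_HZ, threshold_mm_s=THRESHOLD_MM_S):
        self.dt = 1.0 / fs
        self.threshold = threshold_mm_s
        self.prev = None
        self.fired = False

    def update(self, xyz_mm):
        """Feed one (x, y, z) fingertip sample in mm.

        Returns True only on the sample at which the DT display should start."""
        xyz = tuple(xyz_mm)
        trigger = False
        if not self.fired and self.prev is not None:
            # speed from the backward difference of two consecutive samples
            speed = math.dist(xyz, self.prev) / self.dt
            if speed > self.threshold:
                self.fired = trigger = True
        self.prev = xyz
        return trigger
```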
Results
Experiment 1: Movement performance
After the initial training block, all five subjects were able to produce reaching movements with
surprisingly consistent accuracy and latency. Figure 3 gives examples of
several manual responses from one of the subjects. The graph displays the
registered finger position as a function of time, for the different movement
target eccentricities. It can be seen from the raw data that the end positions
of the movements correlate well with the MT positions. Some of the responses
showed a small overshoot with respect to the movement end position. The
amplitude data shown in the following refer only to the final movement
position. Moreover, the movements were in general very consistent with respect
to their velocity profiles; only a few movements with multiple velocity peaks
were observed.
The impression of the homogeneity of
the movement responses is confirmed by the further analysis of the movement
data. Figure 4 shows mean movement amplitudes (left graph) and mean movement
durations (right graph) as a function of the movement target location. The
vertical bars denote standard errors; they are only visible for the cases where
the error exceeds symbol size. The data are plotted separately for the cases
where the discrimination stimulus was present at the central position on the
right (open circles) and on the left (filled circles). It can be easily seen
that the amplitudes are independent of the position of the discrimination
target. One central rationale of the experimental approach was that the
discrimination task should not interfere with the reaching task; this analysis
of amplitudes suggests that this is indeed the case. Moreover, the mean
movement amplitudes demonstrate that the reaching movements were very precise;
mean amplitudes are highly correlated with the given MT positions (r=0.99). A
further data analysis in the form of a 2-way ANOVA confirms a highly significant
main effect of MT position (F(?,?)=1078), a nonsignificant effect of DT
position (F( ,)=0.9, p>0.1), and a nonsignificant interaction (F(,)=0.89).
A similar conclusion holds for the
movement durations (Figure 4, right graph). Average movement durations were
202, 260, and 315 msec for the small, medium, and large target eccentricities. Again,
the data are independent of DT location, suggesting that the execution of the
movement itself is not affected by the presentation of the test item. Accordingly,
ANOVA shows a highly significant main effect of MT position (F(?,?)=263.7), a
nonsignificant effect of DT position (F( ,)=0.44), and a nonsignificant interaction
(F(,)=0.80).
Figure 5 displays, on the left, mean
movement onset latencies and standard errors as a function of MT location. Again,
the data are given separately for the blocks where the discrimination target
was on the right (open circles) and where DT was on the left (filled circles). Mean
movement onset latency averaged over all conditions was 437.8 msec. A 2-way
ANOVA reveals that the latencies depend neither on MT location (F(4,5)=0.74)
nor on DT location (F(1,5)=0). Also, the interaction is not significant
(F(?,3)=0.74). The open triangles in the graph display the latency data from
the "No discrimination - reaching only" single task control
condition. For this type of experiment, mean latency was 436.9 msec. Again, the
response latency was independent of MT location (F(4,5)=1.34; p>0.05).
The right part of Figure 5 shows
histograms of the distribution of the movement onset latencies, individually
for the five subjects who participated in the experiment. It can be seen that,
while mean latency varies, the distributions for all subjects are unimodal and
skewed, with the long tail towards longer latencies.
Experiment 1: Perceptual performance
The subjects reported that they had
no difficulty pointing quickly to the indicated target item in the string. However,
they were initially very uncertain about their ability to discriminate between
the DT items. Their performance nevertheless improved considerably after some initial
practice. Therefore, the first session served as training and was not included
in the data analysis. After the experiment, the subjects were asked about their
subjective impressions and about how they had solved the task. They reported that the
peripheral items that were indicated as movement targets seemed to "light
up" in a row of an almost unstructured visual field. They also had the
impression that they could exactly identify the distractor (" " or
" "), whichever appeared at the movement target position.
Our indicator for the momentary
allocation of attention (in the ventral stream) is the accuracy with which the
discrimination target can be identified. Discrimination performance can be
expressed as the percentage of correct decisions about target identity; chance
level is 50% correct. Figure 6 presents the discrimination performance as a
function of the movement target location. Since performance was not
significantly different for DT on the left and on the right, data from the two
conditions were pooled in Figure 6 such that the position of the discrimination
target always refers to the position indicated in the graph (at +7.65 deg). In
other words, negative MT locations refer to the cases where MT and DT were in
opposite hemifields.
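The pooling of DT-left and DT-right blocks amounts to mirroring all positions whenever DT was on the left, so that DT always sits at +7.65 deg. A minimal sketch, assuming signed eccentricities in deg (negative = left) and a hypothetical trial format of (MT eccentricity, DT eccentricity, correct):

```python
from collections import defaultdict


def relative_mt_position(mt_ecc_deg, dt_ecc_deg):
    """Re-express the MT position so that DT always lies on the right (+7.65 deg).
    If DT was on the left, both positions are mirrored about fixation."""
    return mt_ecc_deg if dt_ecc_deg > 0 else -mt_ecc_deg


def pooled_percent_correct(trials):
    """trials: iterable of (mt_ecc_deg, dt_ecc_deg, correct).
    Returns {relative MT position: percent correct}, pooled over DT sides.
    Negative keys are MT positions in the hemifield opposite to DT."""
    hits, n = defaultdict(int), defaultdict(int)
    for mt, dt, correct in trials:
        rel = relative_mt_position(mt, dt)
        n[rel] += 1
        hits[rel] += bool(correct)
    return {rel: 100.0 * hits[rel] / n[rel] for rel in sorted(n)}
```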
The diagram on the left of Figure 6
shows discrimination performance as a function of relative MT position for all
response latencies (filled squares). The horizontal line represents the
discrimination performance from the "No reaching - only
discrimination" control task. The data suggest that
performance depends on the relation between the position of the discrimination
stimulus and the location of the indicated movement target;
performance is best when MT and DT positions coincide (DT=MT). Even when the
movement is directed to a neighboring item, performance decreases
steeply. Performance is worst when the subject points in the direction
opposite to the DT position. The performance advantage for the coincidence of MT
and DT positions was confirmed by further statistical analysis: ANOVA shows a
highly significant effect of relative MT position (F(4,5)=15.12, p<0.001). In
a post-hoc Student-Newman-Keuls test, the performance at DT=MT proved to be
superior to all other cases, which did not differ significantly (p<0.01).
Upon questioning after the
experiments, subjects occasionally reported that they had the feeling of
performing better in the discrimination task when they delayed the manual
response. One interpretation of this observation is that in these cases DT is
discriminated first, and movement programming is initiated only afterwards.
This should result in longer movement latencies. In other words, one should
expect an interaction between movement latency and perceptual performance.
Therefore, we analyzed performance for each subject separately for the fast
half of responses (i.e., faster than the subject's median latency) and for the
slow half of responses. The averaged data are shown in the right graph of
Figure 6. For the fast responses, the performance superiority at DT=MT is even
more pronounced. For these fast responses directed to the discrimination
stimulus, performance even exceeds discrimination performance in the "no
movement" control condition (89.1% correct vs. 78.3% correct). For the slow
half of responses, this selectivity largely disappears. Compared with the fast
responses, discrimination also tends to improve in the cases where MT and DT
are in opposite hemifields. A two-factor ANOVA shows a significant main
effect of relative MT position (F(4?,5?)=14.73, p<0.001) and a
nonsignificant main effect of latency (F(1?,5?)=0.05). As expected, the
interaction between response latency and MT position is significant
(F(4,10??)=4.14, p<0.01). Post-hoc Newman-Keuls tests show that for the fast
half of responses, performance at MT=DT is significantly better than for the
other relative MT positions (p<0.01). For the slow responses, the
superiority of MT=DT over the other relative movement positions
disappears (p>0.05). In summary, the
data show that the ability to discriminate between objects in a multi-object
scene during the preparation of a reaching movement is spatially selective and
superior at the movement goal. This is most pronounced for fast manual
reactions.
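The per-subject median split can be sketched as follows. This is a minimal illustration, assuming each subject's trials are stored as dictionaries with the hypothetical keys latency_ms, rel_mt (relative MT position in deg) and correct; trials exactly at the median are assigned to the slow half here, which is our assumption, since the text does not specify how ties were handled.

```python
from statistics import median

def median_split(trials):
    """Split one subject's trials at that subject's own median latency.

    Fast half: latency below the median; slow half: latency at or above it
    (tie handling is an assumption, not taken from the original analysis).
    """
    m = median(t["latency_ms"] for t in trials)
    fast = [t for t in trials if t["latency_ms"] < m]
    slow = [t for t in trials if t["latency_ms"] >= m]
    return fast, slow

def percent_correct(trials, rel_mt):
    """Percent correct at one relative MT position; NaN if there are no such trials."""
    hits = [t["correct"] for t in trials if t["rel_mt"] == rel_mt]
    return 100.0 * sum(hits) / len(hits) if hits else float("nan")

def split_performance(subjects, rel_mt):
    """Average fast-half and slow-half percent correct across subjects.

    subjects: dict mapping a subject id to that subject's list of trials.
    The split is done within each subject before averaging, so that
    between-subject differences in overall speed do not drive the split.
    A NaN for any subject propagates to the corresponding average.
    """
    fast_scores, slow_scores = [], []
    for trials in subjects.values():
        fast, slow = median_split(trials)
        fast_scores.append(percent_correct(fast, rel_mt))
        slow_scores.append(percent_correct(slow, rel_mt))
    return sum(fast_scores) / len(fast_scores), sum(slow_scores) / len(slow_scores)
```

Splitting within each subject, rather than at a pooled median, follows the procedure described above and keeps every subject equally represented in both halves.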
Experiment 2: Perceptual performance
In Experiment 2, the discrimination target was presented only at the onset of
the manual response. Mean movement onset latency was 441.2 ± 45 msec (SE).
Since the latency data otherwise had the same characteristics as in
Experiment 1, they are not presented in more detail here.
In this experiment, the
discrimination stimulus appeared at movement onset and was present during most
of the movement. Therefore, the question arises whether the presence of DT
affected the precision of the reaching movement and/or its dynamic
properties. For this reason, we again analyzed the dependence of movement
amplitude and duration on DT location. The results are shown in Figure 7. The
left graph displays movement amplitude as a function of MT position, with DT
position as the parameter. As in Experiment 1, movements are rather precise
overall and show no effect of DT position. Accordingly, a two-way
ANOVA yields a highly significant main effect of MT position (F(?,4)=410.8), a
nonsignificant effect of DT position (F(1,4??)=3.41; p>0.1), and no
interaction (F(?,?)=1.41; p>0.1).
The right graph displays mean
movement durations. Although there seems to be a general tendency for movements
to be shorter for DT appearing in the right hemifield, this effect does not
reach statistical significance. ANOVA yields a highly significant main effect
of MT position (F(5??,4)=20.48), but a nonsignificant effect of DT position
(F(,)=0.09) and a nonsignificant interaction (F( ,)=0.73). In summary, as in
the previous experiment, there is no indication that the movement itself is
affected by the presentation of the DT.
Figure 8 gives discrimination
performance in Experiment 2 as a function of the relative position of the
movement target, pooled over five subjects. In this case, too, discrimination
is superior when DT and MT refer to the same object. Accordingly,
ANOVA yielded a significant effect of relative MT position (F(4,5)=4.42,
p<0.01). A post-hoc Newman-Keuls test confirmed that performance in the DT=MT
condition differed significantly from all other conditions (p<0.05). The
remaining conditions did not differ significantly from one another.
Discussion
The central question of this study
was whether and how visual attention in the ventral stream
(selection-for-perception) and selective reaching in the dorsal stream are
coupled. The first experiment demonstrates that perceptual discrimination
of a target (DT) during the preparation of a reaching movement is best when the
movement target (MT) and DT refer to the same object. When MT and DT do not
coincide, even when, as in our experiments, they are separated by just one
degree, discrimination performance decreases to a
considerably lower level. So, prior to the initiation of a reaching movement,
in the movement programming phase, perceptual analysis is restricted to the
movement target. During this processing phase, other objects are temporarily
excluded from high-level visual (perceptual) analysis.
In line with a recently developed
model of visual attention (VAM, Schneider, 1995) we assume that this strict
coupling between (dorsal-based) motor preparation and (ventral) perceptual
analysis is due to a common attentional mechanism that selects (for both
processing streams) one object at a time for further analysis. Which object the
attentional mechanism selects depends on the instruction. In
our experiments, the instruction requires subjects to give priority to the
reaching task, which should be performed as fast as possible, while
discrimination plays the role of a secondary task. Consequently, during motor preparation
the attentional mechanism will be locked to MT. In other words, selective
dorsal processing for spatial-motor programming binds selective ventral
processing for perception and discrimination.
The second experiment showed that
coupling between dorsal and ventral processing is still effective during
movement execution. We do not want to claim that movement execution is always
accompanied by a binding of the attentional mechanism at the movement target
position. Attention should only be allocated to the future movement target when
it is necessary to evaluate the success of the movement. In order to make this
evaluation, it is necessary to process information about the actual movement
end position and to compare it with the intended movement position. This
comparison presumably cannot be done preattentively. However, when a movement
is highly practiced (this touches on the issue of "automaticity"; see Neumann,
1984; Shiffrin, 1988; Logan, xx, for overviews) and needs
no "feedback" control, attention to the results of the action
is assumed to be unnecessary. An example of such an action might be
shifting gears while driving a car.
One implication of VAM is that if DT
appears during the programming phase at a position that does not correspond to
MT, discrimination of DT should hardly be possible: the attentional
mechanism is engaged at MT, which should temporarily prevent the processing of
other objects such as DT. However, in contrast to the eye movement data of
Deubel & Schneider (1996), our data show that discrimination
performance in the case of non-correspondence between MT and DT is well above
chance level. We conjecture that this above-chance performance is caused by
fast attentional shifts to DT prior to movement initiation that occur in some
trials. The sequence of processing events in such trials might be as follows.
The color cue is processed and initiates an attentional shift to the
future movement position, and "programming" begins. Next, DT appears
and the attentional mechanism is shifted towards it. DT is stored in visual
short-term memory, and the attentional mechanism returns to MT in order to
complete "programming". Consequently, in these trials with prior
attention shifts to DT, movement initiation is delayed and the
latency should increase. This account predicts that trials with longer
movement latencies, which should include the trials with prior attention
shifts, are accompanied by better discrimination performance. The data shown
in Figure 6 support this prediction. The median split of discrimination
performance based on movement latency showed that the "long latency" cases of
the reaching movement are accompanied by better performance for
non-corresponding positions of MT and DT than the "short latency" cases,
for which a prior attentional shift is implausible. Why, however, is
discrimination performance worse for the "long latency" case in the
correspondence condition? We suggest that temporary disengagement (Posner et
al., 1984) from MT might cause this performance drop. If disengagement is
triggered by the onset of DT rather than by its specific location, which takes
longer to compute, then disengagement occurs every time a shift is prepared
and DT appears, no matter where DT appears. Re-engagement at MT takes some
time, even when DT appears at that same location. During disengagement,
attention is withdrawn, and this causes a drop in DT processing performance.
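The logic of the prior-shift account can be made concrete with a toy simulation of non-corresponding MT/DT trials. In this sketch, all parameter values (the proportion of shift trials, the latency distribution, the time cost of a shift, and the accuracy with and without a shift) are hypothetical and chosen only for illustration; they are not estimates from our data. Because shift trials are both slower and more accurate, a median split on latency concentrates them in the slow half, which is the pattern the account predicts.

```python
import random
from statistics import median

def simulate(n_trials=20000, p_shift=0.3, base_ms=400.0, sd_ms=50.0,
             shift_cost_ms=150.0, p_correct_shift=0.9, p_correct_noshift=0.5,
             seed=1):
    """Toy mixture of 'shift' and 'no shift' trials (all parameters hypothetical).

    Shift trials: attention visits DT before returning to MT, so latency is
    increased by shift_cost_ms and DT is discriminated well (p_correct_shift).
    No-shift trials: attention stays on MT, so DT accuracy is near chance.
    Returns percent correct in the fast and slow halves of a median split.
    """
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        shifted = rng.random() < p_shift
        latency = rng.gauss(base_ms, sd_ms) + (shift_cost_ms if shifted else 0.0)
        p = p_correct_shift if shifted else p_correct_noshift
        trials.append((latency, rng.random() < p))
    m = median(lat for lat, _ in trials)
    fast = [correct for lat, correct in trials if lat < m]
    slow = [correct for lat, correct in trials if lat >= m]
    return 100.0 * sum(fast) / len(fast), 100.0 * sum(slow) / len(slow)

fast_pc, slow_pc = simulate()
print(f"fast half: {fast_pc:.1f}% correct, slow half: {slow_pc:.1f}% correct")
```

With these illustrative settings the slow half shows clearly higher accuracy than the fast half, whose accuracy stays close to the 50% chance level.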
Based on VAM, these processing events can be briefly specified at the
neuro-cognitive level. A typical experimental trial should consist of the
following processing events. The color cue appears, and the visual attention
mechanism must be allocated to this cue. This means that the activation flow
from the low-level V1/V2 representations of the visual unit corresponding to
the color cue to higher-level dorsal and ventral areas is gated (i.e.,
increased); see also LaBerge & Brown (1989) and van der Heijden (1992). As a
consequence of this gated activation flow, color and arrow direction are
recognized in high-level ventral visual areas (e.g., IT). Based on this
high-level visual information, the attentional gating mechanism in V1/V2 is
shifted to the location in the string indicated by the color cue (MT). When the
gating mechanism is locked on the corresponding V1/V2 representations of MT,
movement programming, that is, reaching programming in the high-level dorsal
areas (e.g., area 7b), begins. Within the framework of VAM, motor programming
means that the activation flow to high-level areas must persist for a certain
duration before the neural pattern in these areas reaches a sufficient
activation level. During this motor preparation phase, the discrimination
target appears. As remarked above, two processing options are available:
either the attentional mechanism shifts towards DT, or it stays on MT. In the
case of non-correspondence of DT and MT, the first option, the shift, leads to
processing of DT and its storage in short-term memory. When the mechanism
returns to MT, programming continues so that the motor pattern can reach its
desired activation level. Because motor programming, that is, activation flow
to the corresponding high-level dorsal areas (e.g., 7b for reaching), can only
occur as long as the attentional mechanism gates the activation flow, the shift
to DT delays movement initiation.
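The role of gating in this account, namely that the motor pattern accumulates activation only while the attentional mechanism gates the flow from MT, can be illustrated with a simple discrete-time accumulator. This is a hypothetical sketch, not a formal part of VAM: the threshold, the accumulation rate, the 1-ms time step and the gate schedule are assumptions made for illustration.

```python
from typing import Callable, Optional

def initiation_time(threshold: float = 100.0,
                    rate_per_ms: float = 0.5,
                    gate_on_mt: Callable[[int], bool] = lambda t: True,
                    max_ms: int = 5000) -> Optional[int]:
    """Return the time (ms after gating locks onto MT) at which activation reaches threshold.

    Activation grows by rate_per_ms in each 1-ms step during which gate_on_mt(t)
    is True, i.e. while the attentional gate is on MT; otherwise it is held
    (not decayed). Returns None if threshold is not reached within max_ms.
    """
    activation = 0.0
    for t in range(max_ms):
        if gate_on_mt(t):
            activation += rate_per_ms
        if activation >= threshold:
            return t + 1
    return None

# Gate stays on MT throughout: threshold reached after 100 / 0.5 = 200 ms.
no_shift = initiation_time()
# Gate moves to DT for 100 ms (from 50 to 150 ms), then returns to MT.
with_shift = initiation_time(gate_on_mt=lambda t: not (50 <= t < 150))
print(no_shift, with_shift)  # prints: 200 300
```

Activation is held rather than decayed while attention is elsewhere, which is the simplest assumption; with decay, the delay caused by a shift would exceed the duration of the shift itself.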
There is one further attentional
theory besides VAM that deals with dorsal spatial-motor programming, namely the
premotor theory of Rizzolatti et al. (1987) and Rizzolatti et al. (1994). The
central claim of this theory is that the control of "spatial
attention" originates in the dorsal spatial-motor areas. Originally, only
eye movement areas were suggested to control "spatial attention"
(Rizzolatti et al., 1987). In the more recent version, Rizzolatti et al. (1994,
p. 240) specified the effect of spatial attention on ventral processing by
stating that "movement preparation facilitates the input side of pragmatic maps
involved in the task, thus improving the stimulus detection." "Pragmatic
maps" refers to high-level spatial-motor areas (e.g., area 7b). Two brief
comments seem necessary to us. First, Rizzolatti et al. (1994) do not specify
what the input side of these pragmatic maps is, so no specific effect on
ventral processing can be derived from the premotor theory. Consequently,
and owing to its emphasis on "spatial attention", the premotor theory does not
predict a one-object-specific coupling. Second, Rizzolatti et al. (1985) and
Rizzolatti et al. (1994) have claimed, on the basis of data from different
forms of neglect, that no single attentional mechanism exists (see also
Allport, 1993). Instead, multiple visual-spatial attention centers or mechanisms
are assumed. In contrast, VAM proposes, in the spirit of Posner
& Petersen (1990), that there is a single visual-spatial mechanism. This
mechanism operates in early visual areas, and only one object at a time can be
selected. According to VAM, and in contrast to the premotor theory, it should
not be possible to program a goal-directed saccade to one object and an
arm movement to another object while simultaneously recognizing a third
object. Such an experiment is one of the projects we are currently
working on, and its data should help decide between VAM and the premotor
theory.
Besides the proposed coupling
between saccades, reaching and perceptual analysis, a further interesting
prediction can be derived from VAM. Not only the location of a grasping target,
but also further parameters of grasp programming should control the
attentional gating process in V1/V2 and should therefore bind selective ventral
processing. These further parameters are the size, orientation, and possibly the
rough shape (relevant for the grip) of the to-be-grasped object. The gating
mechanism in V1/V2 is assumed to mirror these parameters (size, orientation,
and rough shape), rather than being just a "circular spotlight". We are
currently running an experiment to test this claim, using the same dual-task
paradigm as in this study. The size, orientation and shape of the grasping
object and/or the perceptual object are varied, and the prediction is that
correspondence in these parameters between the two objects (given the same
location) should lead to better perceptual performance.
What attentional effects should be
expected at the single cell level if the VAM claim of selective processing of
one object at a time is correct? For the ventral stream, a study by
Chelazzi et al. (1993) has shown that a goal-directed saccade to a target
object surrounded by distractors leads to a decrease in the firing rate of IT
neurons representing a distractor. IT neurons are assumed to compute the
identity of objects based on visual shape (see, e.g., Oram & Perrett, 1994).
Interestingly, the results show that 90-120 msec before saccade initiation,
the target neuron firing rate rises, while the distractor neuron firing rate
begins to decline. Data showing the same firing rate differences in the
dorsal stream with a target-distractor configuration are currently missing (for
dorsal attentional effects at the single cell level, see Bushnell et al., 1981;
Desimone & Duncan, 1995). A model such as VAM predicts that for dorsal brain
areas such as LIP (eye-movement related neurons) or area 7b (arm-movement
related neurons), the firing rates of the target and distractor neurons should
diverge prior to the movement (in the programming phase): the target firing rate
should rise while the distractor firing rate should decline. This divergence
should be locked to a fixed period prior to movement initiation. Such data would
nicely complement the behavioral data and the conclusions drawn in this study.
- Amazingly high spatial precision of responses: due to feedback by LED
Overall,
the data strongly argue for an obligatory and object-specific coupling of
selective dorsal processing during the preparation of a reaching movement and
selective ventral perceptual processing for object recognition. The data
therefore support the claim of VAM that, as long as one of the two pathways
carries out selective computations (e.g., programming a reaching movement or
saccade), selective computation in the other pathway (e.g., object recognition)
is bound to the same object. For the single cell level, this consideration
predicts that during the preparation of a spatial motor action, neurons in the
dorsal pathway, e.g., "reaching cells" in area 7b, should fire at an enhanced
rate, while those recognition-related neurons in the inferior temporal lobe that
represent the reaching target object (or an object at the reaching target
location) should also show a higher firing rate than cells representing
non-target objects. Furthermore, these firing rate differences in ventral and
dorsal processing should be temporally coupled and occur in the same time window
after stimulus presentation.
- Deubel, Shimojo & Paprotta (in preparation): same for line motion
- Close coupling is even more necessary for grasping, where, beyond location,
physical characteristics of objects such as size, shape and orientation have to
be accounted for in movement preparation.
- Possible solution: direct link of perceptual and motor codes, or even common
coding (Prinz)
- Evidence that the representations of specific actions that relate to an
object can be activated by its visual presentation
- Umilta (same volume)
- Klatzky: associations between objects and actions carried out
- AIP & F5: goal-directed hand movements
References
Figure captions
Figure
1: Experimental apparatus.
Figure
2: Stimulus sequence in Experiment 1. The trial starts with the
presentation of a small fixation cross and two strings of characters left and
right of the central fixation. The three central items of each letter string
appear on ellipses of red (r), green (g) and blue (b) color. Initially, the
subject positions his/her fingertip on the location of the central cross; the
fingertip position is indicated by the arrowhead. After a delay of 1-1.6 sec, a
symbolic cue in the form of a red, green or blue triangle appears in the center
of the screen, pointing either to the right or to the left; this cue specifies
the movement target within the string. After 150 msec, the premask characters
change into nine distractors and one discrimination target ("E" or
"mirror-E"). Target and distractors remain visible for 150 msec. Then the
characters and the central cue are removed, and only the colored ellipses
remain.
Figure
3: Time courses of manual reaching responses as measured with the Polhemus
Fastrak system. The graph shows example reaching movements of one subject
for the various movement target eccentricities.
Figure
4: Left: Mean movement amplitudes as a function of the movement target
location. Vertical bars denote standard errors. Data are plotted separately for
the cases where the discrimination stimulus was present at the central position
on the right (open circles) and on the left (filled circles). Right: Movement
durations.
Figure
5: Left: Mean movement onset latencies and standard errors as a function of
MT location. Data are given separately for the blocks where the discrimination
target was on the right (open circles) and on the left (filled circles). Open
triangles display the latency data from the "No discrimination - reaching
only" single task control condition. Right: Histograms of the latency
distribution, presented individually for the five subjects.
Figure
6: Left: Discrimination performance as a function of movement target
location. Data for DT on the left and on the right are pooled such that the
position of the discrimination target always refers to the position indicated
in the graph at +7.65 deg. Vertical bars indicate standard errors. Horizontal
line represents discrimination performance from the "No reaching - only
discrimination" control trials. Right: Discrimination performance data
after median split.
Figure
7: Same as figure 4, but for Experiment 2.
Figure
8: Discrimination performance as a function of movement target location in
Experiment 2. Data for DT on the left and on the right are pooled such that the
position of the discrimination target always refers to the position indicated
in the graph at +7.65 deg. Vertical bars indicate standard errors.