Selective dorsal and ventral processing: 

Evidence for a common attentional mechanism in reaching and perception

 

 

 

Heiner Deubel, Werner X. Schneider and Ingo Paprotta

 

 

 

Institut für Psychologie

Allgemeine und Experimentelle Psychologie

Ludwig-Maximilians-Universität München

Germany

 

 

 

 

 

 

 

 

Running head: Reaching and attention

 

Correspondence to: Heiner Deubel, Institut für Psychologie, Ludwig-Maximilians-Universität, Leopoldstrasse 13, D-80802 München, Germany. Fax: +49-89-2180-5211. E-mail: Deubel@mip.paed.uni-muenchen.de.

 

Acknowledgement: This research was supported by the Deutsche Forschungsgemeinschaft, SFB 462 ("Sensomotorik").

 

Abstract

 

We recently demonstrated that visual attention before saccadic eye movements is focussed on the saccade target, allowing for spatially selective object recognition (Deubel & Schneider, 1996). Here we investigate the role of visual selective attention in the preparation of manual reaching movements. A dual-task paradigm required the preparation of a reaching movement to a cued item in a letter string. Simultaneously, the ability to discriminate between the symbols "E" and "mirror-E" presented tachistoscopically within the surrounding distractors was taken as a measure of perceptual performance. The data demonstrate that discrimination performance is superior when the discrimination stimulus is also the target for manual aiming; when discrimination stimulus and pointing target refer to different objects, performance deteriorates. So, it is not possible to maintain attention on a stimulus for the purpose of discrimination while directing a movement to a spatially separate object. The results argue for an obligatory coupling of selection-for-perception and selection-for-action. The findings are discussed in relation to dorsal and ventral visual processing streams.

 

 

 

Introduction

 

            Our knowledge about the architecture of the visual system of primates has increased enormously during the last two decades. There is growing consensus that visual processing occurs in parallel and interacting streams at different, quasi-hierarchical levels (see, e.g., DeYoe & Van Essen, 1988; Hubel & Livingstone, 1988; Zeki, 1993; Milner & Goodale, 1995). Several suggestions have been made as to how this parallel and distributed processing of visual information might be functionally organized. One suggestion by Mishkin, Ungerleider, & Macko (1983), based on lesion work in monkeys, claims that the visual system consists of two main pathways, namely the dorsal "where"-pathway and the ventral "what"-pathway. The suggested function of the "what"-pathway is to recognize objects based on their visual appearance. The "where"-pathway, on the other hand, computes spatial information about objects. At the cortical level, the segregation of both pathways can be traced back to the primary visual cortex, area V1. From there, the "where"-pathway runs dorsally into the posterior parietal lobe while the "what"-pathway leads ventrally to the inferior temporal lobe. Since this proposal, a large body of research has supported this distinction of two main pathways (see, however, Zeki, 1993). For instance, patients with brain lesions restricted to the inferior temporal cortex have difficulty recognizing objects by sight, a symptom called visual agnosia (see, e.g., Farah, 1990; Kolb & Whishaw, 1990). At the same time, spatial abilities, such as pointing to an object, are left intact. When agnosia is purely visual, recognition by other senses, such as touch, is still intact. Lesions restricted to the superior parietal areas of the dorsal "where"-pathway, on the other hand, can cause a symptom called optic ataxia (see, e.g., Milner & Goodale, 1995). These patients are able to identify objects by their visual appearance, but they exhibit misreaching (mislocalization) towards the same objects.

            The labeling of the ventral and dorsal pathways as a "what"- and a "where"-pathway was recently criticized by Goodale and Milner (1992; Milner & Goodale, 1995). These authors still agree with ascribing the computation of "what"-aspects, that is, the identification of objects, to the ventral pathway. They disagree, however, about the function of the dorsal pathway. In their view, its main task is not perception of the spatial layout of the external world but the computation of spatial information for motor actions such as a saccade or a reach towards an object. In other words, Goodale & Milner (1992) suggest a shift in emphasis from spatial perception to spatial information for action. Their view of dorsal processing is supported by human neuropsychological studies and neurophysiological work in macaques, especially by single cell recordings (see Milner & Goodale, 1995). The reviewed data indicate that the idea of a single representation of external space is probably wrong, and that instead several spatial-motor representations - sometimes also called processing streams - exist in parallel for different kinds of motor actions (see, e.g., Stein, 1992; Graziano & Gross, 1994; Rizzolatti, Riggio, & Sheliga, 1994; Milner & Goodale, 1995). For instance, information about saccade landing points is probably computed and coded in the lateral intraparietal area (LIP), while endpoints for grasping movements are computed in area 7b - both are part of the parietal lobe. So, the brain seems to code spatial information for different effectors, that is, for different action classes, in different parts of the brain. In summary, Milner & Goodale (1995) suggest that the ventral stream is involved in visual perception and identification, while the dorsal stream computes information for spatial-motor actions. A related distinction was recently suggested by Jeannerod (1994), who differentiated a "semantic mode" of processing, located in the temporal lobe (ventral stream), and a "pragmatic mode", located in the parietal cortex (dorsal stream).

            Visual processing in both streams does not occur in a purely automatic, "bottom-up" driven manner. Rather, control of processing is task-dependent - this type of selectivity of visual processing has often been called endogenous visual attention (e.g. Posner, 1980). A tremendous amount of research in experimental psychology and the neurosciences has investigated the properties of these selection processes in vision (for overviews, see Treisman, 1988; Bundesen, 1990; Posner & Petersen, 1990; van der Heijden, 1992; Schneider, 1993; Duncan & Desimone, 1995). Traditional experimental psychology focussed on the function of visual attention in the ventral stream, that is, on "selection-for-visual-perception". For instance, experiments on visual search (for overviews, see Treisman & Gormican, 1988; Wolfe, 1994) attempted to determine how quickly and how accurately certain visual attributes and their conjunctions can be "perceived" and signalled. In most of these investigations, "ventral" attributes such as color, orientation, etc., served as the properties that defined the search target. Therefore, selection-for-visual-perception (in contrast to selection-for-spatial-motor-control - the dorsal processing domain) has been the main topic of search tasks. Another research line in which the effects of visual attention were investigated mainly for ventral processing is the spatial precueing paradigm (e.g. Eriksen & Hoffman, 1973; Posner, 1980; van der Heijden, 1992). The experiments show that prior knowledge about the possible location of a target leads to faster and more accurate responses to visual aspects such as alphanumeric identity or simple shape features such as curved vs. straight (for overviews, see van der Heijden, 1992; Posner & Raichle, 1994).

            This bias towards measuring the effect of visual attention mainly for ventral visual processing can be traced back to the suggested functions of attention. Attention is assumed to "facilitate detection" (e.g. Posner, 1980), to allow "feature integration" (Treisman & Gelade, 1980), "object recognition" (e.g. LaBerge & Brown, 1989; Schneider, 1995), and "entry to visual short-term memory" (Duncan & Humphreys, 1989; Bundesen, 1990). These assumptions do not imply that the selection mechanism itself is located only in the ventral stream. Instead, several theories have suggested a central role of the dorsal stream in controlling the attentional mechanism, sometimes called the spatial attention mechanism (e.g. LaBerge & Brown, 1989; Posner & Petersen, 1990; van der Heijden, 1992; Schneider, 1995).

            Compared to the large body of theoretical work on the relation of attention and (ventral) perceptual processing, there are only a few suggestions about the role of visual attention in dorsal processing, more precisely, about the role of attention in spatial-motor control. Allport (1987) and Neumann (1987) suggested that spatial-motor actions such as grasping an object among other objects also imply a selection process, which is what Allport (1987) called "selection-for-action". Natural environments usually contain several objects, and only one of them should be used as the target for an individual action. For instance, grasping a pen among other pens requires the motor system to receive spatial information (probably in arm-centered coordinates, see, e.g., Graziano & Gross, 1994) of the intended pen only. Information from other pens has to be excluded from controlling the grasping action. In other words, an attentional mechanism is needed that selects the spatial information of the movement target. Because spatial information is provided by the visual system (the dorsal pathway), Allport (1987, 1989) and Neumann (1987, 1990) have suggested that visual attention is involved in this selection process. Another example of selection-for-spatial-motor-action is the control of saccadic eye movements. Before each saccade, the next fixation point has to be selected among many potential candidates in the environment.

            Unfortunately, there is not much experimental work on selection-for-spatial-motor-action. Tipper, Lortie & Baylis (1992) investigated the role of visual attention for manual reaching in an interference paradigm. The question was whether interference effects found for ventral visual processing (e.g. Eriksen & Eriksen, 1974) can also be obtained for spatial-motor actions. The degree of interference is usually considered a measure of the efficiency of attentional processes. In these experiments, subjects had to reach, as fast and as precisely as possible, from a starting position to one of nine locations indicated by a red light (the target). In some trials, a yellow light (the distractor) appeared - simultaneously with the red target light - at a different location. Substantial interference effects were obtained; response latencies were prolonged compared to trials where no distractor appeared. This interference effect was only observed when the distractor was located between the starting position and the target. The results show that interference effects of nearby objects can also be obtained for spatial-motor actions such as reaching, suggesting that visual attention processes are also involved in selection-for-spatial-motor-action. A similar conclusion was reached by Castiello (1996). In one of his experiments, subjects had to perform a grasp to a target as a primary task. A secondary, non-spatial task was required for a different object located close to the target. The author observed interference effects of the secondary task on the kinematics of the primary grasping movement.

            Another research line that deals with dorsal selection concerns the relation of eye movement control and visual attention. The leading question has been whether visual attention for perceptual processing on the one hand, and the selection of a target for a saccade on the other, are independent or not. The results of early experiments on this issue were controversial (e.g. Klein, 1980; Posner, 1980; Remington, 1980), partly due to methodological problems (see Shepard et al., 1986). More recent work (Hoffman & Subramaniam, 1995; Kowler et al., 1995; Deubel & Schneider, 1995; Schneider & Deubel, 1995) clearly demonstrated a strict link between ventral selection-for-perception and dorsal selection-for-a-saccade.

            In the experiments of Deubel & Schneider (1996), subjects had to saccade to locations within horizontal letter strings left or right of a central fixation cross. The performance in discriminating between the symbols "E" and "mirror-E" presented tachistoscopically before the saccade within the surrounding distractors was taken as a measure of visual attention. The data showed that discrimination performance is best when discrimination target and saccade target refer to the same object. This result held no matter whether the saccade was directed by a peripheral cue or by a central cue. The findings strongly argue for an obligatory and selective coupling of saccade programming and visual attention; this coupling between dorsal and ventral processing is restricted to one common target object at a time.

            Based on these data and other computational considerations, Schneider (1995) postulated a Visual Attention Model (VAM) that suggests a common selection mechanism for both processing streams. In line with two-stage models of perception and attention (Neisser, 1967), a first stage of low-level visual processing computes, in early visual areas of the brain (e.g. V1, V2), elementary visual information in the form of "primitive" object structures (visual units). Higher-level visual processing in the dorsal and ventral stream occurs only for one visual unit (one "object"). In the model, visual attention is the mechanism that determines the unit, carries out the selection, and gates the information flow from low- to high-level vision such that only information from one object is processed further. VAM claims that visual attention selects one low-level visual object at a time, leading to prioritized perceptual processing in the ventral stream (e.g., the object is recognized). Simultaneously, possible spatial-motor actions (saccade, pointing, reaching, grasping etc.) towards this object are programmed in the dorsal stream. Only the (effector-specific) "Go"-signal is necessary to convert the programs into overt action.

            As described before, such attention-mediated and object-specific coupling of dorsal and ventral processing has already been demonstrated for eye movement control and perceptual selection (Deubel & Schneider, 1996). VAM predicts, however, that the same coupling should hold not just for saccades but also for aiming, reaching, and grasping (Schneider, 1995; p. 363). In the present study we analyzed the coupling of reaching movements and visual discrimination. For this purpose, a dual-task paradigm similar to that used in our previous studies was developed. The primary task was to make a goal-directed reaching movement to a cued object, measuring selection-for-spatial-motor-action in the dorsal stream. Prior to the movement, a secondary task required the subject to discriminate between the characters "E" and mirror-"E", measuring selection-for-perception ("traditional" visual attention) in the ventral stream. It is hypothesized that the programming of the reaching movement yokes the visual attention mechanism, so that during this selection process no other object can be processed in high-level ventral vision. Consequently, discrimination performance should be best when discrimination target and reaching target refer to the same object. For non-corresponding reaching and discrimination targets, better than chance performance is only possible when visual attention shifts first to the discrimination target and then to the reaching target. In this case, longer initiation latencies for the movement should be expected.

 

Methods

Subjects

            Initially, 7 subjects participated in the experiments. Two of them were excluded from further analysis since they were not able, even after some training, to produce mean hand movement latencies shorter than 700 msec. The age of the five remaining subjects ranged from 22 to 28 years. They had normal vision and normal motor behavior. All subjects had experience in a variety of experiments in oculomotor research. One subject was one of the authors of the study, the others were naive with respect to the aim of the experiments.

 

Experimental set-up

            Figure 1 shows a sketch of the experimental situation. The subject was seated in a dimly lit room. The visual stimuli were presented on a fast 21-inch color monitor (CONRAC 7550 C21), visible through a one-way mirror. The monitor provided a frame frequency of 100 Hz at a spatial resolution of 64 pixels per inch. Active screen size was 40 x 30 cm; the viewing distance was 57.7 cm. The video signals were generated by a freely programmable graphics board (Kontron KONTRAST 8000), controlled by a PC via the TIGA (Texas Instruments Graphics Adapter) interface. The stimuli appeared on a grey background which was adjusted to a mean luminance of 2.2 cd/m². The luminance of the stimuli was 23 cd/m². The relatively high background brightness is essential in order to avoid the effects of phosphor persistence (Wolf and Deubel, 1996).

            The use of a one-way mirror allowed free hand movements to the stimuli without visual feedback about the hand position. Reaching movements were recorded with a Fastrak electro-magnetic position and orientation measuring system (Polhemus Inc. 1993) and sampled at 400 Hz. The sender device was fixed at 60 cm in front of the subject. The sender emits time-multiplexed, orthogonal electro-magnetic fields of 10 kHz frequency. From induction in the receiver, which was mounted on the fingertip of the subject's right hand, the orientation relative to the sender device is calculated by a central processing unit. From the intensity of the electro-magnetic field, the distance between sender and receiver is determined. The position in space is calculated from distance and orientation by use of a specific digital signal processor (TI320C30). The device allows for a maximum translation range of 10 feet, with an accuracy of 0.03 inches RMS. The frequency response is 120 Hz; without further filtering the phase lag is only 4 msec. Attached to the receiver was a red LED (5 mm diameter), controlled by the PC. The LED provided controlled visual feedback about the spatial position of the fingertip.

            Eye fixation was controlled by an infrared eyetracker (IRIS, Skalar Medical) with a temporal bandwidth of 240 Hz. This device measures the reflection difference between sclera and iris by infrared LEDs and phototransistors that are positioned next to the subject's eyes. Head movements were restricted by an adjustable chin rest. The experiments were completely under the control of a 486 Personal Computer. The PC also served for the automatic off-line analysis of the pointing movement data in which movement latencies and start and end positions of the manual responses were determined.

 

Calibration and data analysis

            Each session started with a calibration procedure of the eyetracker in which the subject had to sequentially fixate 3 positions arranged on a horizontal line at distances of 8.5 deg. Further, the origin and coordinate alignment frame of the position sensor were set relative to the projected position of the monitor center. The position sensor behaved linearly within 30 cm around the center position. The overall accuracy was better than 2 mm. In order to determine latency, amplitude, and duration of the reaching movements, an off-line program for evaluation of movement trajectory parameters searched the movement record for the points at which the vectorial velocity rose above and fell below a threshold of 10 mm/s (which is equivalent to about 1 deg/sec). The beginning and the end of the reaching movement were calculated as linear regressions in a 200 msec time window around these points. To allow for possible drift after the reach, the end position had to stay within a 2 mm interval for 62.5 msec after the initial movement.
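
            The threshold search described above can be illustrated with a minimal sketch. It assumes position samples in mm taken at the 400 Hz rate given for the Fastrak system; the function names are hypothetical, and the regression-based refinement in the 200 msec window as well as the 2 mm drift criterion are not reproduced here.

```python
import math

SAMPLE_RATE_HZ = 400.0      # sampling rate stated for the position sensor
VELOCITY_THRESHOLD = 10.0   # mm/s, threshold stated in the text


def speed_profile(positions, sample_rate_hz=SAMPLE_RATE_HZ):
    """Vectorial speed (mm/s) between successive (x, y, z) samples given in mm."""
    return [math.dist(p0, p1) * sample_rate_hz
            for p0, p1 in zip(positions, positions[1:])]


def find_movement(positions, threshold=VELOCITY_THRESHOLD,
                  sample_rate_hz=SAMPLE_RATE_HZ):
    """Return (onset_ms, offset_ms) of the first supra-threshold episode, or None.

    Hypothetical helper: onset is the first sample interval whose speed exceeds
    the threshold, offset the first later interval whose speed falls below it.
    """
    speeds = speed_profile(positions, sample_rate_hz)
    onset = next((i for i, v in enumerate(speeds) if v > threshold), None)
    if onset is None:
        return None
    offset = next((i for i in range(onset, len(speeds)) if speeds[i] < threshold),
                  len(speeds))
    ms_per_sample = 1000.0 / sample_rate_hz
    return onset * ms_per_sample, offset * ms_per_sample


# Synthetic record: rest, then 40 samples moving 1 mm per sample, then rest.
record = ([(0.0, 0.0, 0.0)] * 100
          + [(float(i), 0.0, 0.0) for i in range(1, 41)]
          + [(40.0, 0.0, 0.0)] * 100)
onset_ms, offset_ms = find_movement(record)
```

            In this sketch, movement duration is simply offset minus onset, and the latency would be the onset time measured relative to cue appearance.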

 

Experimental paradigm

            After an initial training block that was not included in the data analysis, each subject underwent six blocks (3 blocks per day) of each of the experiments; each block consisted of 120 single trials. The subject performed a dual task. In each experimental trial, the reaching movement was guided by a central, symbolic cue that indicated the movement target (MT) within a string of letters. Moreover, the subject had to report the identity of a discrimination target (DT) presented tachistoscopically in the string. Two different experiments were performed. In Experiment 1, DT appeared before the hand movement. For each experimental block, the position of DT was constant, either on the right or on the left, and always at the central position of the string. Experiment 2 was similar to Experiment 1 except that here, DT was presented only at the onset of the reaching movement.

            Figure 2 shows an example for the sequence of stimuli in a single trial of Experiment 1. Each trial started with the presentation of a small fixation cross in the center of the screen, with a size of 0.25 deg. Simultaneously, two strings of premask characters appeared left and right of the central fixation, each consisting of five pre-mask items resembling the number "8". The width of each item was 0.9 deg of visual angle, its height was 1.4 deg. The distance between the items was 2.4 deg, with the central item of the five letters being presented at an eccentricity of 7.65 deg. The three central items of each letter string appeared on ellipses of red (r), green (g) and blue (b) color, as indicated in the figure. Color intensities of the ellipses were adjusted by flicker-photometry to appear about equally salient.
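
            The stimulus geometry can be made concrete with a short sketch. It assumes that the 2.4 deg distance is measured between item centers and that the screen is flat and viewed head-on from 57.7 cm; the function names are hypothetical.

```python
import math

VIEWING_DISTANCE_CM = 57.7      # from the set-up description
CENTER_ECCENTRICITY_DEG = 7.65  # eccentricity of the central item of each string
ITEM_SPACING_DEG = 2.4          # assumed to be the center-to-center distance
ITEMS_PER_STRING = 5


def item_eccentricities(side):
    """Eccentricities (deg) of the five items; side is +1 (right) or -1 (left)."""
    half = ITEMS_PER_STRING // 2
    return [round(side * (CENTER_ECCENTRICITY_DEG + k * ITEM_SPACING_DEG), 2)
            for k in range(-half, half + 1)]


def deg_to_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Distance from fixation on a flat screen for a given visual angle."""
    return distance_cm * math.tan(math.radians(angle_deg))


right_string = item_eccentricities(+1)   # innermost to outermost item
central_three = right_string[1:4]        # the items shown on colored ellipses
```

            At this viewing distance one degree of visual angle corresponds to roughly 1 cm on the screen, which is why velocity thresholds expressed in mm/s and in deg/sec can be related so directly.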

            The subject was asked to keep strict fixation during the whole trial at the center of the screen, initially indicated by a central fixation cross. Maintenance of fixation was controlled by the IRIS oculometer. At the beginning of the trial, the subject had to position his/her fingertip on the location of the central cross. The position of the fingertip is indicated by the arrowhead in Figure 2. In this phase, the LED was switched on, aiding the precise positioning. After a delay of 1000 to 1600 msec, a symbolic cue in the form of a red, green or blue triangle appeared in the center of the screen, pointing either to the right or to the left side. Color and pointing direction of the triangle thus unequivocally indicated a specific item, the movement target (MT), within the string. The primary task was to "point to this target item as fast and precisely as possible". Simultaneously with cue onset, the LED was switched off to disable any further visual feedback of hand or pointing position. At 150 msec after cue appearance, well before the onset of the pointing movement, the premask characters changed into nine distractors and one discrimination target. The distractors were randomly selected among the characters " " and " ". The central character on one of the two sides was replaced by the discrimination target (DT), which consisted of either the letter "E" or a mirror-symmetrical version of this letter (mirror-"E"). The position of the DT was constant during each block and known to the subject (e.g., central position in the string on the right side). The movement target positions, however, were varied independently within the central three items of the strings, resulting in 12 combinations of movement target / discrimination target positions. All experimental conditions occurred with equal probability. Target and distractors remained visible for 150 msec. Then, the items and the central cue were removed and only the colored ellipses remained.
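
            The count of 12 combinations follows from six possible movement targets (three items on each side) crossed with two possible DT sides. The sketch below enumerates them and builds one block; the names are hypothetical, and the assumption that each of the six MT positions occurs exactly 20 times within a 120-trial block is ours, since the text states only that all conditions were equally probable.

```python
import random
from itertools import product

SIDES = ("left", "right")
MT_OFFSETS = (-1, 0, 1)   # item offset relative to the central item of a string


def conditions():
    """All movement-target / discrimination-target combinations."""
    return [{"dt_side": dt, "mt_side": mt, "mt_offset": off}
            for dt, mt, off in product(SIDES, SIDES, MT_OFFSETS)]


def block_schedule(dt_side, n_trials=120, seed=None):
    """One block with a fixed DT side and a balanced, shuffled set of MT positions."""
    cells = list(product(SIDES, MT_OFFSETS))
    reps, rest = divmod(n_trials, len(cells))
    if rest:
        raise ValueError("n_trials must be a multiple of the number of MT positions")
    trials = [{"dt_side": dt_side, "mt_side": mt, "mt_offset": off}
              for mt, off in cells for _ in range(reps)]
    random.Random(seed).shuffle(trials)
    return trials


all_conditions = conditions()
block = block_schedule("right", seed=1)
```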

            Due to the timing of the stimulus presentation, the discrimination target was no longer present 300 msec after the appearance of the colored triangle. As a result of this stimulus timing, most reaching movements were initiated well after the disappearance of target and distractors (see Figure 5). In order to eliminate occasional responses that occurred too early, the off-line data analysis discarded movements with latencies shorter than 200 msec. Also, trials with movement velocities smaller than 11 mm/s2 and durations shorter than 50 msec or longer than 600 msec were not considered in the analysis. Such cases occurred in less than 2% of all trials.
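
            The latency and duration criteria can be written down as a simple trial filter. This is only a sketch with hypothetical names; the velocity criterion is left out, and treating the boundary values (exactly 200, 50, or 600 msec) as valid is an assumption.

```python
from dataclasses import dataclass

MIN_LATENCY_MS = 200.0    # shorter latencies count as anticipatory responses
MIN_DURATION_MS = 50.0
MAX_DURATION_MS = 600.0


@dataclass
class Trial:
    latency_ms: float     # movement onset relative to cue appearance
    duration_ms: float    # movement offset minus movement onset


def is_valid(trial):
    """True if the trial passes the latency and duration criteria."""
    return (trial.latency_ms >= MIN_LATENCY_MS
            and MIN_DURATION_MS <= trial.duration_ms <= MAX_DURATION_MS)


def exclusion_rate(trials):
    """Fraction of trials rejected by the criteria."""
    rejected = sum(not is_valid(t) for t in trials)
    return rejected / len(trials)


sample = [Trial(430.0, 260.0), Trial(150.0, 240.0),
          Trial(450.0, 30.0), Trial(410.0, 700.0)]
kept = [t for t in sample if is_valid(t)]
```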

            One second after the onset of the reaching movement, the LED was switched on again in order to provide visual feedback about the reached finger position. Finally, the subject indicated, without time pressure, the identity of the discrimination target ("E" or mirror-"E") by pressing one of two buttons (2AFC task). The central fixation cross reappeared after the subject's decision and the next trial was initiated by the computer.

            In separate sessions, two types of "single-task" controls were run. A first control task ("No discrimination - reaching only" single task condition) served to determine pointing reaction times in a single task situation. For this purpose, the subject was asked to point to the indicated position, but was not required to discriminate. A second control task ("No reaching - discrimination only" single task condition) served to test the discrimination performance without a pointing response. Here, the subject was only asked to indicate the identity of the discrimination target, but no reaching response was required. Each subject performed two blocks of each control task.

            Experiment 2 was very similar to Experiment 1 except that here, the presentation of the discrimination stimulus occurred only at the onset of the reaching movement. For this purpose, the computer performed an on-line calculation of movement velocity. Stimulus presentation was triggered when the velocity exceeded a threshold of 1 deg/sec.

 

Results

 

Experiment 1: Movement performance

            After the initial training block, all five subjects were able to produce reaching movements with surprisingly consistent accuracy and latency. Figure 3 gives examples of several manual responses from one of the subjects. The graph displays the registered finger position as a function of time, for the different movement target eccentricities. It can be seen from the raw data that the end positions of the movements correlate well with the MT positions. Some of the responses showed a small overshoot with respect to the movement end position. The amplitude data shown in the following refer only to the final movement position. Moreover, the movements were in general very consistent with respect to their velocity profiles; only a few movements with multiple velocity peaks were observed.

            The impression of homogeneity of the movement responses is confirmed by the further analysis of the movement data. Figure 4 shows mean movement amplitudes (left graph) and mean movement durations (right graph) as a function of the movement target location. The vertical bars denote standard error; they are visible only where the error exceeds symbol size. The data are plotted separately for the cases where the discrimination stimulus was present at the central position on the right (open circles) and on the left (filled circles). It can easily be seen that the amplitudes are independent of the position of the discrimination target. One central rationale of the experimental approach was that the discrimination task should not interfere with the reaching task; this analysis of amplitudes suggests that this is indeed the case. Moreover, the mean movement amplitudes demonstrate that the reaching movements were very precise; mean amplitudes are highly correlated with the given MT positions (r=0.99). A further data analysis in the form of a 2-way ANOVA confirms a highly significant main effect of MT position (F(?,?)=1078), a nonsignificant effect of DT position (F( ,)=0.9, p>0.1), and a nonsignificant interaction (F(,)=0.89).

            A similar conclusion holds for the movement durations (Figure 4, right graph). Average movement durations were 202, 260, and 315 msec for the small, medium, and large target eccentricities, respectively. Again, the data are independent of DT location, suggesting that the execution of the movement itself is not affected by the presentation of the test item. Accordingly, ANOVA shows a highly significant main effect of MT position (F(?,?)=263.7), a nonsignificant effect of DT position (F( ,)=0.44), and a nonsignificant interaction (F(,)=0.80).

            Figure 5 displays, on the left, mean movement onset latencies and standard errors as a function of MT location. Again, the data are given separately for the blocks where the discrimination target was on the right (open circles) and where DT was on the left (filled circles). Mean movement onset latency averaged over all conditions was 437.8 msec. A 2-way ANOVA reveals that the latencies depend neither on MT location (F(4,5)=0.74) nor on DT location (F(1,5)=0). Also, the interaction is not significant (F(?,3)=0.74). The open triangles in the graph display the latency data from the "No discrimination - reaching only" single task control condition. For this type of experiment, mean latency was 436.9 msec. Again, the response latency was independent of MT location (F(4,5)=1.34; p>0.05).

            The right part of Figure 5 shows histograms of the distribution of the movement onset latencies, individually for the five subjects who participated in the experiment. It can be seen that, while mean latency varies, the distributions for all subjects are unimodal and skewed, with a long tail towards longer latencies.

 

Experiment 1: Perceptual performance

            The subjects reported that they had no difficulty pointing quickly to the indicated target item in the string. However, they were initially very uncertain about their ability to discriminate between the DT items. Performance improved considerably after some initial practice. Therefore, the first session served for training and was not included in the data analysis. After the experiment, the subjects were asked about their subjective impression and about how they solved the task. They reported that the peripheral items that were indicated as movement targets seemed to "light up" in a row of an almost unstructured visual field. They also had the impression that they could exactly identify the distractor (" " or " "), whichever appeared at the movement target position.

            Our indicator for the momentary allocation of attention (in the ventral stream) is the accuracy with which the discrimination target can be identified. Discrimination performance can be expressed as the percentage of correct decisions about target identity; chance level is 50% correct. Figure 6 presents the discrimination performance as a function of the movement target location. Since performance was not significantly different for DT on the left and on the right, data from the two conditions were pooled in Figure 6 such that the position of the discrimination target always refers to the position indicated in the graph (at +7.65 deg). In other words, negative MT locations refer to the cases where MT and DT were in opposite hemifields.
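
            The pooling of left and right DT blocks can be sketched as follows. The trial format (MT eccentricity in deg, DT side, correctness of the 2AFC response) and the function names are assumptions made for illustration, and the example trials are made up rather than taken from the experiment.

```python
from collections import defaultdict


def relative_mt(mt_ecc_deg, dt_side):
    """Mirror the MT eccentricity so that the DT always lies in the positive hemifield."""
    return mt_ecc_deg if dt_side == "right" else -mt_ecc_deg


def percent_correct(trials):
    """Percentage of correct responses for each relative MT position.

    trials: iterable of (mt_ecc_deg, dt_side, correct) tuples.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for mt_ecc_deg, dt_side, correct in trials:
        key = relative_mt(mt_ecc_deg, dt_side)
        counts[key] += 1
        hits[key] += bool(correct)
    return {key: 100.0 * hits[key] / counts[key] for key in sorted(counts)}


# Made-up trials: negative eccentricities are positions left of fixation.
example_trials = [
    (7.65, "right", True), (-7.65, "left", True),
    (7.65, "right", True), (-7.65, "left", False),   # MT = DT after pooling
    (7.65, "left", True), (-7.65, "right", False),   # MT opposite to DT
]
performance = percent_correct(example_trials)
```

            With a two-alternative task, values near 50% in this table indicate chance performance at the corresponding relative MT position.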

            The diagram on the left of Figure 6 shows discrimination performance as a function of relative MT position for all response latencies (filled squares). The horizontal line represents the discrimination performance from the "No reaching - discrimination only" control task. The data in the graph suggest that performance depends on the relation between the position of the discrimination stimulus and the location of the indicated movement target; performance is best when MT and DT positions coincide (DT=MT). When the movement is not directed to the neighboring item, performance decreases steeply. Performance is worst when the subject points in the direction opposite to the DT position. The performance advantage for the coincidence of MT and DT positions was confirmed by further statistical analysis: ANOVA shows a highly significant effect of relative MT position (F(4,5)=15.12, p<0.001). In a post-hoc Student-Newman-Keuls test, the performance at DT=MT proved to be superior to all other cases (p<0.01), which did not differ significantly from each other.

            Upon questioning after the experiments, subjects occasionally reported that they felt they performed better in the discrimination task when they delayed the manual response. One interpretation of this observation is that in these cases DT is discriminated first, and movement programming is initiated only later. This should result in longer movement latencies. In other words, one should expect an interaction between movement latency and perceptual performance. Therefore, we analyzed performance for each subject separately for the fast half of responses (i.e. faster than the median latency of the subject) and for the slow half of responses. The averaged data are shown in the right graph of Figure 6. It can be seen that for the fast responses, the performance superiority at DT=MT is even more pronounced. For these fast responses directed to the discrimination stimulus, performance is even superior to discrimination performance in the "no movement" control condition (89.1% correct vs. 78.3% correct). For the slow portion of responses, this kind of selectivity largely disappears. As compared to the fast reactions, there is also a general tendency for discrimination to improve in the cases where MT and DT are presented in opposite directions. A two-factor ANOVA shows a significant main effect of relative MT position (F(4?,5?)=14.73, p<0.001), and a nonsignificant main effect of latency (F(1?,5?)=0.05). As expected, the interaction between response latency and MT position is significant (F(4,10??)=4.14, p<0.01). Post-hoc Newman-Keuls tests show that for the fast half of responses, performance at MT=DT is significantly better than for the other relative MT positions (p<0.01). For the slow responses, the superiority of MT=DT with respect to the other relative movement positions disappears (p>0.05).
In summary, the data show that the ability to discriminate between objects in a multi-object scene during the preparation of a reaching movement is spatially selective, and superior at the movement goal. This is most pronounced for fast manual reactions.
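The median-split analysis of latencies described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the original analysis: the trial format and function name are hypothetical, and responses exactly at the median are assigned to the slow half, a choice the text does not specify.

```python
from statistics import median

def median_split_performance(trials):
    """Percent correct per relative MT position for fast and slow responses.

    trials: list of (latency_ms, rel_mt_pos, correct) tuples for ONE
    subject (hypothetical format). The split uses that subject's own
    median latency over all trials.
    """
    med = median(lat for lat, _, _ in trials)
    tallies = {"fast": {}, "slow": {}}
    for lat, pos, correct in trials:
        # Assumption: latencies equal to the median count as "slow".
        half = "fast" if lat < med else "slow"
        n_correct, n_total = tallies[half].get(pos, (0, 0))
        tallies[half][pos] = (n_correct + int(correct), n_total + 1)
    return {
        half: {pos: 100.0 * c / n for pos, (c, n) in cells.items()}
        for half, cells in tallies.items()
    }
```

The per-subject results would then be averaged across subjects for each half and each relative MT position before plotting.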

 

Experiment 2: Perceptual performance

            In Experiment 2, the discrimination target was presented only at the onset of the manual response. Mean movement onset latency was 441.2 ± 45 msec (SE). Since the characteristics of the latency data in this experiment were otherwise identical to those of Experiment 1, the corresponding data are not presented in more detail here.

            In this experiment the discrimination stimulus appeared at movement onset and was present during most of the movement time. Therefore, the question arises whether DT presence affected the precision of the reaching movement and/or its dynamic properties. For this reason, we again analyzed the dependence of movement amplitude and duration on DT location. The results are shown in Figure 7. The left graph displays movement amplitude as a function of MT position, with DT position as parameter. It can be seen that, as in Experiment 1, the overall movement is rather precise and shows no effect of DT position. Accordingly, a 2-way ANOVA yields a highly significant main effect of MT position (F(?,4)=410.8), a nonsignificant effect of DT position (F(1,4??)=3.41; p>0.1), and no interaction (F(?,?)=1.41; p>0.1).

            The right graph displays mean movement durations. Although there seems to be a general tendency for movements to be shorter for DT appearing in the right hemifield, this effect does not reach statistical significance. ANOVA yields a highly significant main effect of MT position (F(5??,4)=20.48), but a nonsignificant effect of DT position (F(,)=0.09) and a nonsignificant interaction (F( ,)=0.73). In summary, as in the previous experiment, there is no indication that the movement itself is affected by the presentation of the DT.

            Figure 8 gives discrimination performance in Experiment 2 as a function of the relative position of the movement target, pooled over five subjects. It can be seen that also in this case, discrimination is superior when DT and MT refer to the same object. Accordingly, ANOVA yielded a significant effect of relative MT position (F(4,5)=4.42, p<0.01). A post-hoc Newman-Keuls test confirmed a significant difference of the condition DT=MT with respect to the other conditions (p<0.05). All other data points did not differ significantly.

 

 

Discussion

 

            The central question of this study was whether and how visual attention in the ventral stream (selection-for-perception) and selective reaching in the dorsal stream are coupled. The first experiment demonstrates that perceptual discrimination of a target (DT) during the preparation of a reaching movement is best when movement target (MT) and DT refer to the same object. When MT and DT do not coincide - even when, as in our experiments, the two are spatially separated by just one degree - discrimination performance decreases to a considerably lower level. So, prior to the initiation of a reaching movement, in the movement programming phase, perceptual analysis is restricted to the movement target. During this processing phase, other objects are temporarily excluded from high-level visual (perceptual) analysis.

            In line with a recently developed model of visual attention (VAM, Schneider, 1995) we assume that this strict coupling between (dorsal-based) motor preparation and (ventral) perceptual analysis is due to a common attentional mechanism that selects (for both processing streams) one object at a time for further analysis. Which object is selected by the attentional mechanism depends on the instruction. In our experiments, the instruction requires subjects to give priority to the reaching task, which should be performed as fast as possible, while the discrimination has the role of a secondary task. Consequently, during motor preparation the attentional mechanism will be locked to MT. In other words, selective dorsal processing for spatial-motor programming binds selective ventral processing for perception and discrimination.

            The second experiment showed that the coupling between dorsal and ventral processing is still effective during movement execution. We do not want to claim that movement execution is always accompanied by a binding of the attentional mechanism to the movement target position. Attention should only be allocated to the future movement target when it is necessary to evaluate the success of the movement. In order to make this evaluation, it is necessary to process information about the actual movement endposition and to compare it with the intended movement position. This comparison should not be done preattentively. However, when a movement is highly practiced - this touches the issue of "automaticity" (see Neumann, 1984; Shiffrin, 1988; Logan, xx; for overviews and issues) - and needs no "feedback" control, then attention to the results of the action execution is assumed to be unnecessary. An example of such an action might be shifting gears while driving a car.

            An implication of VAM is that if DT appears during the programming phase at a position that does not correspond to MT, then discrimination of DT should hardly be possible. The attentional mechanism is engaged at MT, which should temporarily prevent the processing of other objects such as DT. However, our data show - in contrast to the eye movement data of Deubel & Schneider (1996) - that discrimination performance in the case of non-correspondence between MT and DT is well above chance level. We conjecture that fast attentional shifts to DT prior to movement initiation, which occur in some trials, are the cause of this above-chance performance level. The processing events in such trials might look like this: The color cue is processed and initiates an attentional shift to the future movement position, and "programming" begins. Next, DT appears and the attentional mechanism is shifted towards it. DT is stored in visual short-term memory, and the attentional mechanism returns to MT in order to complete "programming". Consequently, in these trials with prior attention shifts to DT, movement initiation is delayed and the latency should be increased. This strategy predicts that trials with longer movement latencies would be accompanied by better discrimination performance - these should be the cases with prior attention shifts. The data shown in Figure 6 support this conclusion. The median split of discrimination performance based on the movement latencies showed that the "long latency" cases of the reaching movement are accompanied by better performance for non-correspondent positions of MT and DT compared to the "short latency" cases, in which a prior attentional shift is implausible. However, why is discrimination performance worse for the "long latency" cases in the correspondence condition? We suggest that temporary disengagement (Posner et al., 1984) from MT might cause this performance drop. If the disengagement is triggered by the onset of DT but not by its specific location - which needs more time to be computed - then every time a shift is prepared and DT appears, disengagement happens no matter where DT appears. Re-engagement at MT takes some time even when DT is at the same location. During disengagement, attention is withdrawn, and this causes a performance drop for DT processing.

            Based on VAM, a specification of these processing events at the neuro-cognitive level can be briefly given. A typical experimental trial should consist of the following processing events. The color cue appears, and the visual attention mechanism has to be allocated to this cue. This means that the activation flow from low-level V1/V2 representations of the visual unit corresponding to the color cue to higher-level dorsal and ventral areas is gated (e.g. increased) - see also LaBerge & Brown (1989) and van der Heijden (1992). As a consequence of this gated activation flow, color and arrow direction recognition are performed in high-level visual ventral areas (e.g. IT). Based on this high-level visual information, the attention gating mechanism in V1/V2 is shifted to the location in the string indicated by the color cue information (MT). When the gating mechanism is locked on the corresponding V1/V2 representations of MT, movement programming, that is, reaching programming in the high-level dorsal areas (e.g. area 7b), begins. Motor programming within the framework of VAM means that the activation flow to high-level areas needs to have a certain duration before the neural pattern in these areas reaches a sufficient activation level. During this motor preparation phase, the discrimination target appears. As remarked above, two processing options are available. Either the attentional mechanism shifts towards DT, or it stays on MT. The first option, the shift, leads in the case of non-correspondence of DT and MT to processing of DT and its storage in short-term memory. When the mechanism returns to MT, programming is continued so that the motor pattern can reach its desired activation level. Because motor programming, that is, activation flow to the corresponding high-level dorsal areas (e.g. 7b for reaching), can only occur as long as the attentional mechanism gates the activation flow, the shift to DT causes a delay in movement initiation.

            There is one further attentional theory besides VAM that deals with dorsal spatial-motor programming, namely the premotor theory by Rizzolatti et al. (1987) and Rizzolatti et al. (1994). The central claim of this theory is that the control of "spatial attention" originates in the dorsal spatial-motor areas. Originally, only eye movement areas were suggested to control "spatial attention" (Rizzolatti et al., 1987). In the recent version, Rizzolatti et al. (1994, p. 240) specified the effect of spatial attention on ventral processing by stating that "movement preparation facilitates the input side of pragmatic maps involved in the task, thus improving the stimulus detection." "Pragmatic maps" means high-level spatial-motor areas (e.g. area 7b). Two brief comments seem necessary. First, what the input side of these pragmatic maps is remains unspecified by Rizzolatti et al. (1994), so that no specific effect on ventral processing can be derived from the premotor theory. Consequently, and due to the emphasis on "spatial attention", the premotor theory does not predict a one-object-specific coupling. Second, Rizzolatti et al. (1985) and Rizzolatti et al. (1994) have claimed - based on data on different versions of neglect - that no single attentional mechanism exists (see also Allport, 1993). Instead, multiple visual-spatial attention centers / mechanisms are assumed. In contrast, VAM proposes - in line with the spirit of Posner & Petersen (1990) - that there is a single visual-spatial mechanism. This mechanism operates in early visual areas, and only one object at a time can be selected. According to VAM, and in contrast to the premotor theory, it should not be possible to program a goal-directed saccade to one object and an arm movement to another object while simultaneously recognizing a third object. Such an experiment is one of the projects we are currently working on, and the data will decide whether VAM or the premotor theory is correct.

            Besides the proposed coupling between saccades, reaching and perceptual analysis, a further interesting prediction can be derived from VAM. Not only the location of a grasping target, but also further parameters of the grasp programming process should control the attentional gating process in V1/V2 and therefore bind selective ventral stream processing. These further parameters are the size, orientation, and maybe the rough shape (relevant for the grip) of the to-be-grasped object. The gating mechanism in V1/V2 is assumed to mirror these parameters (size, orientation, and rough shape) rather than act as just a "circular spotlight". We are currently running an experiment to test this claim, using the same dual-task paradigm as in this study. Size, orientation and shape of the grasping and/or perceptual object are varied, and the prediction is that correspondence in these parameters between the two objects (given the same location) should lead to better perceptual performance.

            What attentional effects should be expected at the single cell level, given that the claim of selective processing of one object at a time is correct (VAM)? For the ventral stream, a study by Chelazzi et al. (1993) has shown that a goal-directed saccade to a target object surrounded by distractors leads to a decrease in the firing rate of IT neurons representing a distractor. IT neurons are assumed to compute the identity of objects based on visual shape (see, e.g., Oram & Perrett, 1994). Interestingly, the results show that prior to saccade initiation (90-120 msec before), the target neuron firing rate rises, while the distractor neuron firing rate begins to decline. Data that show the same firing rate differences in the dorsal stream with a target-distractor configuration are currently missing (for dorsal attention single cell effects, see Bushnell et al., 1981; Desimone & Duncan, 1995). A model such as VAM predicts that for dorsal brain areas such as LIP (eye-movement related neurons) or area 7b (arm-movement related neurons), the firing rates of the target and distractor neurons should diverge prior to the movement (in the programming phase). The target firing rate should rise while the distractor firing rate should decline. This divergence should be locked to a fixed period prior to movement initiation. Such data would be a fine complement to the behavioral data and the conclusions drawn in this study.

 

- amazingly high spatial precision of responses: due to feedback by LED

 

Overall, the data strongly argue for an obligatory and object-specific coupling of selective dorsal processing during the preparation of a reaching movement and selective ventral perceptual processing for object recognition. They therefore support the claim of VAM that as long as one of the two pathways carries out selective computations, e.g. programming a reaching movement or saccade, selective computation in the other pathway, e.g. object recognition, is bound to the same object. At the single cell level, this consideration predicts that during the preparation of a spatial motor action, neurons in the dorsal pathway, e.g. "reaching cells" in area 7b, should fire at an enhanced rate, while those recognition-related neurons in the inferior temporal lobe that represent the reaching target object (or an object at the reaching target location) should also show a higher firing rate than cells representing objects that are not the reaching target. Furthermore, these firing rate differences in ventral and dorsal processing should be temporally coupled and occur in the same time slice after stimulus presentation.

 

 

Deubel, Shimojo & Paprotta (in preparation): same for line motion

 

Close coupling even more required for grasping where, more than just location, physical characteristics of objects such as size, shape and orientation have to be accounted for in movement preparation.

Possible solution: direct link of perceptual and motor codes, or even: common coding (Prinz)

-Evidence that the representations of specific actions that relate to an object can be activated by its visual presentation

 

Umilta (same volume)

Klatzky: associations between objects and actions carried out

AIP & F5: goal-directed hand movements

 

References

 

Figure captions

 

Figure 1: Experimental apparatus.

 

Figure 2: Stimulus sequence in Experiment 1. The trial starts with the presentation of a small fixation cross and two strings of characters left and right of the central fixation. The three central items of each letter string appear on ellipses of red (r), green (g) and blue (b) color. Initially, the subject positions his/her fingertip on the location of the central cross; the fingertip position is indicated by the arrowhead. After a delay of 1-1.6 sec, a symbolic cue in the form of a red, green or blue triangle appears in the center of the screen, pointing either to the right or to the left side; this cue specifies the movement target within the string. 150 msec later the premask characters change into nine distractors and one discrimination target ("E" or mirror-"E"). Target and distractors remain visible for 150 msec. Then, the characters and the central cue are removed and only the colored ellipses remain.

 

Figure 3: Time courses of manual reaching responses as measured with the Polhemus Fastrak system. The graph shows examples of reaching movements from one subject for the various movement target eccentricities.

 

Figure 4: Left: Mean movement amplitudes as a function of the movement target location. Vertical bars denote standard errors. Data are plotted separately for the cases where the discrimination stimulus was present at the central position on the right (open circles) and on the left (filled circles). Right: Movement durations.

 

Figure 5: Left: Mean movement onset latencies and standard errors as a function of MT location. Data are given separately for the blocks where the discrimination target was on the right (open circles) and on the left (filled circles). Open triangles display the latency data from the "No discrimination - reaching only" single task control condition. Right: Histograms of the latency distribution, presented individually for the five subjects.

 

Figure 6: Left: Discrimination performance as a function of movement target location. Data for DT on the left and on the right are pooled such that the position of the discrimination target always refers to the position indicated in the graph at +7.65 deg. Vertical bars indicate standard errors. Horizontal line represents discrimination performance from the "No reaching - only discrimination" control trials. Right: Discrimination performance data after median split.

 

Figure 7: Same as Figure 4, but for Experiment 2.

Figure 8: Discrimination performance as a function of movement target location in Experiment 2. Data for DT on the left and on the right are pooled such that the position of the discrimination target always refers to the position indicated in the graph at +7.65 deg. Vertical bars indicate standard errors.