The architecture that Braitenberg described comprises simple sensors, processing elements (for what he called logic), unidirectional wires to connect elements, and motors to move the animats. Figure 1 shows two of the simplest vehicles, one labelled 'Hate', the other 'Love'. These two vehicles are composed of the same elements but differ in that the wires cross in 'Love'. It is fairly easy to see how an excitatory signal (set up by the activity in the photodetectors, the yellow 'eyes') could activate the opposite motor (red), causing it to run faster than the proximal motor and so turning the vehicle toward the light. When the vehicle is directly facing the light source, both sensors are equally active and both motors run at the same speed. The vehicle will therefore first turn toward the light and then go faster until it crashes into it. [Actually, if I recall correctly, Braitenberg called this one hate because it would ram the light aggressively! I chose the nomenclature to reflect the notions of avoidance and attraction that will play a prominent role in MAVRIC.]
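The effect of crossing the wires can be sketched in a few lines of Python. This is only an illustration, not Braitenberg's own formulation: the function names `motor_speeds` and `turn_direction`, the linear excitatory mapping, and the `gain` parameter are all assumptions made for the example.

```python
def motor_speeds(left_sensor, right_sensor, crossed, gain=1.0):
    """Return (left_motor, right_motor) speeds for a two-sensor vehicle.

    Assumes purely excitatory wires with a linear gain (an illustrative
    simplification). With crossed wires each sensor drives the motor on
    the opposite side; otherwise it drives the motor on its own side.
    """
    if crossed:
        return gain * right_sensor, gain * left_sensor
    return gain * left_sensor, gain * right_sensor


def turn_direction(left_motor, right_motor):
    """A faster left wheel swings a differential-drive vehicle to the right."""
    if left_motor > right_motor:
        return "right"
    if right_motor > left_motor:
        return "left"
    return "straight"


# Light source off to the left: the left sensor is the more active one.
crossed = motor_speeds(0.8, 0.2, crossed=True)
uncrossed = motor_speeds(0.8, 0.2, crossed=False)
print(turn_direction(*crossed))    # crossed wires: turns toward the light
print(turn_direction(*uncrossed))  # uncrossed wires: turns away from it
```

With equal sensor readings both variants drive straight ahead, which matches the head-on case described above.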
A number of more elaborate vehicles were described by Braitenberg, including some in which the wires have special memory and adaptive properties. Wire connections may be inhibitory as well as excitatory. Neurons can interact with each other, forming associative networks between sensory perception and motor output. What is important to note about this approach is that the behavior of the vehicle is not programmed into it explicitly. In fact, for more complex wiring schemes the behavior is not even necessarily implicitly present. Rather, behavior emerges from interactions between internal elements, both through internal connections (if present) and through interaction with the external environment. If memory is involved (Braitenberg's Ergotrix and Mnemotrix wires), these behaviors can become quite complex in both space and time. Indeed, it can be argued that the actual behavior expressed by a sufficiently complex vehicle, having associative and causal memory, is not even predictable simply from knowledge of the wiring diagram.
The adaptrode fills the role of both a Mnemotrix and an Ergotrix wire in Braitenberg's world. The memory-trace encoding ability of the adaptrode provides the same functionality that Braitenberg claimed for Mnemotrix wire, which has a high initial resistance to the flow of current but lowers its resistance according to the Hebb rule for associative encoding. An Ergotrix wire is a little more complicated. It will change its resistance only if the source neuron is excited before the sink neuron. The Ergotrix wire encodes a temporally ordered correlation (causal correlation), giving rise to the possibility that the sink neuron could become a predictor of the activation of the source neuron. This is exactly the effect given by the associative adaptrode, which enforces a strict temporal ordering in encoding memory traces. Employing adaptrode-based neurons in an EBA significantly extends the Braitenberg model.
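The difference between the two wire types can be made concrete with a toy update rule. The sketch below is neither the adaptrode's actual mechanism nor Braitenberg's specification: the names `mnemotrix_update`, `ergotrix_update` and `train`, the learning `rate`, and the one-step discrete timing are assumptions chosen only to contrast coincidence-based encoding with order-based encoding.

```python
def mnemotrix_update(conductance, source_active, sink_active, rate=0.1):
    """Hebb-style toy rule: conductance rises (resistance falls) when the
    source and the sink are active in the same time step."""
    if source_active and sink_active:
        conductance += rate * (1.0 - conductance)
    return conductance


def ergotrix_update(conductance, source_prev, sink_now, rate=0.1):
    """Order-sensitive toy rule: conductance rises only when the source was
    active on the previous step and the sink is active on this one."""
    if source_prev and sink_now:
        conductance += rate * (1.0 - conductance)
    return conductance


def train(source, sink):
    """Run both toy rules over two equal-length 0/1 activity sequences."""
    m = e = 0.0
    for t in range(len(source)):
        m = mnemotrix_update(m, source[t], sink[t])
        if t > 0:
            e = ergotrix_update(e, source[t - 1], sink[t])
    return m, e


# Source fires one step before the sink: only the order-sensitive wire changes.
m_lead, e_lead = train([1, 0, 0, 0], [0, 1, 0, 0])
# Sink fires one step before the source: neither wire changes.
m_lag, e_lag = train([0, 1, 0, 0], [1, 0, 0, 0])
# Both fire together: only the coincidence-based wire changes.
m_same, e_same = train([1, 0, 0, 0], [1, 0, 0, 0])
```

In this toy version a simultaneous firing does not count as "source before sink", which is one way to read the strict temporal ordering mentioned above.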
Figure 3, below, shows an overview of the MAVRIC EBA. This figure shows a number of different kinds of elements and their relationships. Detailed breakdowns of the subsystems are given below. The fundamental design of MAVRIC is that it is motivated to obtain (by search) energy resources. In our experimental setup, we simulate such a resource with a specific tone sounded when MAVRIC is in contact with the source object. As long as the tone is sounding, MAVRIC is feeding and accumulating food in its "stomach" (not shown). The ingested food (the integral of the tone amplitude over time) accumulates in the Food input slot, where it provides intermediate-term evaluative feedback (reward) to the Seek neuron. The Digestion process converts food into energy over a longer time scale and provides longer-term evaluative (confirmation) feedback to the Seek neuron.
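The two time scales of feedback can be illustrated with a toy stomach-and-energy model. This is a sketch under stated assumptions, not MAVRIC's actual code: the function name `step_metabolism`, the time step `dt`, and the `digest_rate` value are invented for the example, and digestion is assumed to be a simple first-order conversion of food into energy.

```python
def step_metabolism(food, energy, tone_amplitude, dt=0.1, digest_rate=0.01):
    """Advance the toy stomach/energy model by one time slice.

    food   : accumulated, undigested food (integral of tone amplitude)
    energy : stored energy produced by digestion
    digest_rate is an arbitrary assumption, kept small so that energy
    changes on a slower time scale than food.
    """
    food += tone_amplitude * dt          # ingestion: integrate the tone
    digested = digest_rate * food * dt   # slow conversion of food to energy
    return food - digested, energy + digested


food, energy = 0.0, 0.0
for _ in range(50):                      # feeding at full volume
    food, energy = step_metabolism(food, energy, tone_amplitude=1.0)
fed_food, fed_energy = food, energy

for _ in range(50):                      # tone off: digestion continues
    food, energy = step_metabolism(food, energy, tone_amplitude=0.0)
```

Food responds immediately while the tone sounds, whereas energy keeps rising after feeding has stopped, which is the sense in which the second signal acts as later confirmation.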
In general, sensory and pre-perceptual data are made available through input slots (inslots), most of which are on the left-hand side of the figure. Eight-bit greyscale data are recorded in each 100 msec time slice and converted to a value between 0 and 1 for processing in the neural network. Processing proceeds from left to right. Some perceptual information, such as where the object is located relative to the centerline of the robot, is used directly by the action selection network. Two main associative neurons modulate which action is selected, based on whether the robot has learned to seek or avoid an object.
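The conversion of raw eight-bit samples to network inputs can be sketched as follows. The text does not give the exact scaling, so dividing by 255 is an assumption, as is the function name `normalize_greyscale`.

```python
def normalize_greyscale(raw):
    """Map an 8-bit greyscale reading (0..255) onto the interval [0, 1].

    Linear scaling by 255 is an assumption; the original system may have
    used a different mapping.
    """
    if not 0 <= raw <= 255:
        raise ValueError("expected an 8-bit value in 0..255")
    return raw / 255.0


# One time slice of hypothetical sensor readings.
inslots = [normalize_greyscale(v) for v in (0, 64, 255)]
```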
Below we examine each of the four subsystems of this architecture and provide more detailed explanation of the functions.
The four separate tones, simulating odorants for a sense of "smell", are presented directly to the network. The direction of a tone cannot be determined from the single "nose" microphone. In the current version, odor is either present or absent. In a future version we intend to derive direction from changing gradient information. In this model, odor 0 represents a "food" odor a priori, hard-wired into the system. Similarly, odor 3 represents "poison". Both of these tones/odors drive specific neurons which interact with the external body functions. For example, the sounding of tone 0 at the same time that MAVRIC is touching some object directly ahead constitutes an episode of feeding. The touching neuron (2), which normally signals an undesirable situation from which MAVRIC would try to escape or at least turn away, is inhibited during feeding. The feeding neuron (16) requires both tone 0 and touch ahead (neuron 1) to become excited, but then fires at a rate proportional to the volume of the tone.
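The gating just described for the feeding and touching neurons can be written as a small sketch. It is illustrative only: the function names, the use of values in the range 0 to 1, and the all-or-nothing form of the inhibition are assumptions, not details taken from the MAVRIC implementation.

```python
def feeding_rate(tone0_volume, touch_ahead):
    """Feeding neuron: needs both tone 0 and touch ahead, then fires in
    proportion to the tone volume (assumed to lie in [0, 1])."""
    if touch_ahead and tone0_volume > 0.0:
        return tone0_volume
    return 0.0


def touching_rate(touch_signal, feeding):
    """Touching neuron: passes the touch signal unless feeding is active.

    Complete shut-off while feeding is an assumption; graded inhibition
    would be an equally plausible reading of the text.
    """
    return 0.0 if feeding > 0.0 else touch_signal


# Touching the food source while tone 0 sounds: feeding, no touch alarm.
feeding = feeding_rate(0.6, touch_ahead=True)
alarm_while_feeding = touching_rate(1.0, feeding)

# Touching an ordinary obstacle: no feeding, touch alarm passes through.
alarm_at_obstacle = touching_rate(1.0, feeding_rate(0.0, touch_ahead=True))
```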
Pain is activated by either touching or poison, when the activity of either rises above a certain level. Pain is used to reinforce the Avoid neuron (see below) as the "punishment" signal. The Pain neuron (17) integrates pain-causing conditions, such as the presence of poison, and is available through an outslot (see figure 7 below). This value is transferred by the Disruption body function to a Pain inslot (9) so that it is available as the reinforcement signal.
Avoidance behavior is driven either by Odor 3 (poison) or by light touching (that isn't feeding). Intermediate-term evaluative feedback comes from the perception of pain, while longer-term feedback comes from the accumulation of tissue damage (see Disruption task below).
Note in figure 5 that this circuit is not symmetrical: the activation of avoidance will inhibit seeking behavior, so avoidance has priority over seeking. This is why the Touching neuron has to be inhibited by the presence of Odor 0 (food).
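The one-way inhibition can be sketched as below. The function name `select_drive`, the subtractive form of the inhibition, and the `inhibition` weight are assumptions for illustration; the actual circuit is the one shown in figure 5.

```python
def select_drive(seek_input, avoid_input, inhibition=1.0):
    """Return (seek, avoid) outputs with one-way inhibition of Seek by Avoid.

    Avoid passes through unchanged; Seek is reduced by the Avoid activity
    and clipped at zero. There is no path from Seek back to Avoid, which
    is what makes the circuit asymmetrical.
    """
    avoid = max(0.0, avoid_input)
    seek = max(0.0, seek_input - inhibition * avoid)
    return seek, avoid


unopposed = select_drive(0.8, 0.0)   # seeking proceeds unhindered
contested = select_drive(0.8, 0.9)   # strong avoidance silences seeking
partial = select_drive(0.8, 0.3)     # weak avoidance only dampens seeking
```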
The figure is largely self-explanatory. The arrows on the right lead first to outslots, which are mapped onto the appropriate control variables in the Wander task. These signals modulate the oscillatory output of Wander so that the robot tends more strongly in the indicated direction. Slow (14) and Fast (15) modulate the base speed of Wander. A sufficiently strong, extended activation of Slow will stop MAVRIC (as for feeding).
Wander is the main motor reflex in MAVRIC. Wander causes MAVRIC to move in a drunken sailor walk unless it is being modulated by signals from one of the outslots shown (see Mobus & Fisher, 1994).
Escape is a programmed behavior activated by the Avoid Ahead neuron (10). It involves an inhibition of Wander along with a reversing of the wheels, a characteristic turn of about 180 degrees and a short run forward. After the execution of this behavior MAVRIC goes back to wandering.
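As a rough sketch, the Escape manoeuvre can be thought of as a fixed sequence of wheel commands. The step counts, the `speed` value, and the generator name `escape_sequence` below are hypothetical; the text specifies only the order of the phases (reverse, turn of about 180 degrees, short run forward), not their durations.

```python
def escape_sequence(reverse_steps=5, turn_steps=8, run_steps=4, speed=0.5):
    """Yield (left_wheel, right_wheel) commands for a toy Escape manoeuvre.

    All step counts and the speed are placeholder values; on a real robot
    the turn phase would be tuned to give roughly a half turn.
    """
    for _ in range(reverse_steps):
        yield (-speed, -speed)     # back away from the obstacle
    for _ in range(turn_steps):
        yield (speed, -speed)      # spin in place
    for _ in range(run_steps):
        yield (speed, speed)       # short run forward


# While this sequence is playing, Wander's output would be suppressed;
# once it is exhausted, control returns to Wander.
commands = list(escape_sequence())
```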
MR and ML stand for Motor Right and Motor Left respectively. These tasks simply translate the real values from the outputs of Wander (and Escape) into appropriate form for the robot's motor commands.
Digestion's main job is to remove "food" from the "stomach" while increasing the stored energy available to MAVRIC. The use of these variables as evaluative feedback to the Seek neuron has already been covered above. Digestion operates over a much longer time scale than the feeding or food accumulation actions. An increase in energy means that MAVRIC was successful in finding resources. So the energy value is used to reinforce the learning of feature associations which led to successful feeding.
Disruption's task is analogous to Digestion's, except that it operates over a much shorter time scale. Punishment and Damage signals provide very rapid feedback to the Avoid neuron. The learning of a painful lesson is much quicker than that of a positive lesson!
Mobus, G.E. and Fisher, P.S., "Foraging Search at the Edge of Chaos", presented at the Metroplex Institute of Neurodynamics Conference on Oscillations in Neural Networks, May 1994. This paper appears as an invited chapter (16) in D.S. Levine, V.R. Brown and V.T. Shirey (Eds.), Oscillations in Neural Systems, Lawrence Erlbaum Associates, Publishers, Mahwah, New Jersey.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.