Presented at IEA/AIE'94

MAVRIC's Brain

George E. Mobus

Computer Sciene Department, Western Washington University Bellingham, WA 98226

and

Paul S. Fisher

Department of Computer Science, University of North Texas Denton, TX 76205

email contact: mobus@cs.wwu.edu

Abstract

MAVRIC (Mobile Autonomous Vehicle for Research in Intelligent Control) is an embodied [4] Braitenberg vehicle [3] that is situated [4] in a nonstationary, dynamic environment. It is controlled fully by an artificial brain comprised solely of simulated Adaptrode-based neurons [10]. It learns to associate various environmental cues with mission-supportive or mission-threatening factors. It can then use those cues to seek or avoid objects. MAVRIC has roughly the intelligence of a moronic snail, but it has already yielded some insights into how greater intelligence might be built on top of lesser intelligent systems, thus recapitulating the evolution of intelligence in nature.

BACKGROUND

The authors have been engaged in a rather different approach to the investigation of intelligence in machines. We take our cue from the way in which biological intelligence seems to have arisen in nature, through evolution, from simple adaptive creatures to reasoning beings. We are convinced that the basis for emergent intelligent behavior must be explored, unashamedly in classical reductionist form, at the micro level. The core of our research effort is centered around a computational adaptive element which has already yielded results that give us confidence in this program. Based on a generalization of a basic biological model of adaptive response, the element which we call the Adaptrode, computes a multi-resolution temporal response signal to a sporadic and episodic input [10],[11].

In association with other adaptrodes and in the context of a neuron-like processing element, we have previously shown that multi-time scale associative learning is achieved [10],[11]. Furthermore, the adaptrode mechanism can easily enforce certain technical constraints on the temporal order of processed signals such that natural causal relations are encoded in the learned associations [10].

The research environment that we have chosen is autonomous robotics. MAVRIC (Mobile Autonomous Vehicle for Research in Intelligent Control) is an embodied [4] Braitenberg vehicle [3] that is situated [4] in a nonstationary, dynamic environment such as an ordinary room with movable and moving objects. MAVRIC is outfitted with a variety of sensors that allow it to sense the state of the environment (see Figure 1.) These sensors are simple light, touch, sound and (simulated) smell [10]. In the Figure, the cephalic (head) end (left side of figure) is arrayed with a variety of sensors, all of which produce proportional signals. An on-board computer polls the A/D board, controls the stepper motors and provides communications with the main computing platform, a 486-PC. Though a battery can be used to power MAVRIC (as shown), for long runs, a power cable is tethered along with the RS-422 cable. A program called ``BRAI N'' runs on the PC to simulate the neurons of MAVRIC's brain.

Fig. 1 MAVRIC (Mobile, Autonomous Vehicle for Research in Intelligent Control) is a classical Braitenberg vehicle.

MAVRIC can be characterized as a retarded (moronic, in fact) snail - roughly a Type 11 vehicle in Braitenberg's menagerie [3]. The objective of our research is to give MAVRIC sufficient intelligence that it can maneuver and find what we call mission-supportive events (and avoid mission-threatening events) in its environment even with such simple sensors. These events are analogous to survival events in the environment of a living snail - the availability of food and mates or the threats of predators or obnoxious chemicals. We call these events mission-relevant to emphasize the generalization of motivation to hypothetical robot missions such as finding ore samples on Mars or toxic wastes in a swamp.

MAVRIC's brain is based entirely on adaptrode neurons. No other computation is employed than the simulation of the neurons [10]. MAVRIC can now roam about and learn to associate various environmental cues such as the presence of a light coupled with sounding of a specific tone, with the occurrence of mission-relevant events, such as the occurrence of a sought object, where those cues have causal relations to the events. That is, the cue event(s) are linked to the mission-relevant event by some causal mechanism. The cue is a cue precisely because of this relationship. MAVRIC's job is to learn those associations so as to maximize its chances of finding and exploiting the events [10]. The challenge presented to the adaptrode learning mechanism is to encode associations which are, themselves, time varying or nonstationary without destroying previously learned associations --- "destructive interference" [14].

In this paper we report on the basic architecture of this artificial brain and discuss some of the early findings/insights we have made along with some comments on future directions we hope to take. A major result of the work with MAVRIC has been the discovery of a new search method that employs quasi-chaotic oscillation to generate novel search paths while constraining the search envelop. The resulting search improves the robot's chances of finding a mission-critical object when the general direction is known but the exact location is not. The search is reminiscent of foraging search by animals.

ARCHITECTURE OF A BRAIN

MAVRIC's brain consists of a number of modules comprised of adaptrode neurons. Each module performs some specific function. Modules are interconnected so that their combined functions result in robot behaviors. Which behavior is active at any given time depends on the levels of activation of all of the modules concurrently. Conflicts are resolved through cross inhibition. This is somewhat like the behavior-based, subsumption architecture advocated by Brooks [4]. When initially designed and constructed these modules were tested in an interactive simulation environment which used the same neuron simulation engine that is used to drive the robot brain. After testing the modules they were added to the basic brain network and tested ``in situ.'' Figure 2 shows a block diagram of the various modules in the brain.

Fig. 2 The MAVRIC brain is comprised of functional module networks. Sensory information enters modules on the left side of the figure. The core of the brain is involved with learning associations between non-meaningful and meaningful stimuli . Motor output control is accomplished by modules on the right side of the figure. The rectangles are interfaces with the computing environment.

MOTOR CONTROL OUTPUT

The robot's basic actions include driving forward (at variable speeds), turning right or left, backing up and stopping. Through sequences of these actions it has a repertoire of behaviors such as searching, following a gradient (homing in on an object or seeking), avoiding an object and fleeing. MAVRIC uses two independently driven stepper motors for propulsion. As in the classical Braitenberg vehicle, with a front-end castor, this gives direction and speed control.

We have developed an interesting oscillator network (Figure 3) which generates a pair of signals one of which is fed to the right motor and the other to the left motor. The combined effect of these signals produces a stochastic sinusoidal weave in the forward motion of the robot -- a sort of drunken walk. Attractor reconstruction analysis of the net value of the signals (left output - right output) shows a strange attractor basin indicating that the signal is at least quasi-chaotic.

When not being stimulated by any meaningful sensory inputs, MAVRIC will weave in a constrained, yet novel, sinusoidal path as it searches its environment. The way in which MAVRIC appears to be following an ordered but stochastic search, reminiscent of a bloodhound weaving back and forth trying to pick up a scent, has led us to investigate the theory of foraging search as an alternative to stochastic methods. Foraging search in animals is a means by which the animal can find dynamically and sparsely distributed resources in a huge space [15],[13],[6]. One component of such a search is that lacking any clues as to the whereabouts of a patch of resource, the animal conducts a stochastic, but not random walk, path selection procedure[note 1] which ensures that it will explore a wider corridor than it would have by moving in a strictly straight line. This component is important in a dynamic environment where resources are either moving or coming into and out of existence (e.g., ripening of fruits in season) in a sporadic and episodic fashion [9]. Another component of foraging search --- learning cues --- will be discussed below.

The oscillator circuit was employed because it behaves similarly to central pattern generator (CPG) circuits used by many animals for motor control. The major advantage of these circuits is that input signals to different points in the circuit can modulate or shape the output to achieve changes in direction and/or speed. A ``Go Straight'' signal damps the oscillation amplitude so that MAVRIC moves straight ahead. A ``Go Right'' signal damps the right output signal so that the left motor runs faster than the right motor, thus turning the robot to the right. This feature is exploited to make the robot home in on cues that it detects in the course of searching. Once a recognized signal is detected, MAVRIC ceases its quasi-chaotic weave and moves in a relatively straight line toward the signal source using one of two methods to be discussed below.

Fig. 3 The Central Pattern Generator (CPG) motor control circuit is comprised of four "core" neurons that generate a quasi-chaotic sinusoidal signal (left motor minus right motor). Additional neurons provide signal distribution for externally applied control signals that are used to modulate and shape the output signals. Pointed arrows indicate signals from/to the external environment. Flat terminals indicate excitatory connections while circular terminals indicate inhibitory ones.

SENSORY INPUT

There are four basic sensory modalities employed in MAVRIC. The most primitive sensory modality in nature is olfaction or chemical sensing [8]. Animals have the ability to sense and follow chemical gradients (either toward an attractant or away from a repellant). This involves computing a temporal difference in the concentration of the chemical and generating a "GO STRAIGHT" signal if the difference is large and relaxing to a stochastic weave if the difference is small [8]. In MAVRIC, this sense was originally simulated using heat sensors. A "sniffer" neural circuit provides a temporal difference in the temperature using a two-cell oscillator (unlike the four-cell motor circuit, this oscillator follows a limit cycle). When MAVRIC enters a detectable gradient, it can orient itself in the direction of the source (either toward or away depending on the source being an attractant or repellant).

Heat, it turned out, was not a good choice due to problems with sensitivity and convection currents. Subsequently, the heat sensors were replaced by a microphone and narrow bandpass filter (see description of auditory system below). A gradient is simulated by controlling the volume of a tone emitted by a speaker located in the environment. If MAVRIC gets closer to the speaker (the object being sought) the volume is increased by an amount that can be detected by the on-board amplifier and a difference computed by the sniffer circuit. Similarly, if MAVRIC moves away from the speaker, the volume is decreased. Unfortunately, this alternative, while working well, has proved somewhat problematic in its own right. Due to the poor sensitivity of the amplifying circuits on-board (a reflection of budgetary constraints), the volume needed to represent MAVRIC reaching the object was distracting to the human observers. We had selected a pitch well within human hearing. Perhaps a pitch above 20k Hz. would have been more appropriate.

Four light sensors, arrayed at divergent angles across the "cephalic" end constitute the vision system in classic Braitenberg fashion. Light is used as one of the conditionable stimulus sources [10]. MAVRIC can sense the direction toward a brighter-than-background light source through a spatial difference computation. As a conditionable stimulus, light neither attracts nor repels MAVRIC unless it has learned to associate light with seeking or avoidance reaction (see next section ).

Two semi-directional microphones and a set of narrow bandpass filters provide MAVRIC with some primitive auditory sense. At present, only two tones are used. As with light, these tones are conditionable with respect to meaningful stimuli. Tone 'A' might be used to signal the presence of a mission-critical object while tone 'B' could signal a threat. MAVRIC's ability to sense direction with these tones is limited, so they are used as secondary association cues. When detected, MAVRIC still needs to search for the object using light and smell.

The last sense that MAVRIC uses is touch. It is outfitted with two compliant "feelers" that originate near the center line of the cephalus and wrap outward and around to the sides. These feelers generate a proportional signal when MAVRIC touches something. The signal is quickly strong if MAVRIC runs into an object head--on (at an admittedly snail's pace!), and is slow and weak if it brushes past an object. This sense allows MAVRIC to decide to back up or turn tangentially depending on the circumstances.

All of the sensory inputs are proportional. Signals are not conditioned or linearized. They are digitized by an 8-bit A/D converter and sampled at the rate of 10 times per second.

ASSOCIATION AND LEARNING

The objective of MAVRIC is to show how a robot can learn by on-going experience to survive and accomplish a given mission in a nonstationary, dynamic environment. To do this, MAVRIC must learn, across multiple time scales, the causal associations between events that will impact its survival/mission and those sensory cues which precede the meaningful event, and hence, predict its occurrence [10]. The associator network in MAVRIC is essentially the same as the BAN (Basic Associator Network) reported in [10] and shown here in Figure 4}. Adaptrode neurons using a local interaction rule of associative learning [2] encode the short-term, intermediate-term and long-term associations between conditioned stimuli (CSs, e.g., light in the figure) and either pain or pleasure as shown in the figure. Additionally, MAVRIC can associate a CS with either an attractant or a repellant odor. Secondary conditioning phenomena have also been achieved where more than one modality (in this case light and sound) are active in sequence.

The associator network decides whether a stimulus is benign, desirable or harmful based on past experience. It then issues the appropriate signals to either the SEEK or the AVOID controller if the decision is desirable or harmful, respectively. If the stimulus is benign (not associated) then neither SEEK or AVOID is stimulated and the robot simply wanders in its "drunken walk" weave searching for something of interest.

The pain and pleasure inputs (as well as the satiation input of Figure 2) are currently implemented in software under the control of the experimenter who must decide whether some particular experimental event is to be harmful or pleasurable. MAVRIC does not now have pain or satiation sensors as such.

Fig. 4 A simplified sensor -- behavior associator network. Rectangles represent sensor inputs. The object detector in this figure is a simple light detector. When any of the sensors signal the presence of a light, that fact is associated either with pain or pleasure. The robot learns to avoid or seek the light, respectively.

SOME EARLY RESULTS

As of this writing, we have not been able to run all of the modules shown in Figure 4 simultaneously and get all of the hardware to work at the same time! This reflects the shoestring budget that MAVRIC was built on! Additionally, experimental runs are excruciatingly slow. We were lucky to get one complete run in a week's worth of work[note 2]. In spite of many technical difficulties the early results from numerous runs of workable components of the system are quite encouraging. All of the modules have been tested in some combination. For example, the light sensing-association-motor output system has been run and shows reliably that MAVRIC can learn light/pain and/or light pleasure associations (as reported in [10]). We have shown that the robot can learn contrary associations so long as they seperate in time. For example, MAVRIC can be "trained" that light means reward over a long span of time, sufficient that the association is encoded in a long-term memory trace. Any time MAVRIC detects a light source, it will approach the light. Subsequently, MAVRIC can be shown a light, that, when approached results in pain. This is contrary to its training (experience) and results in an avoidance (flee) reaction. If the light--pain combination is repeated a few more times, MAVRIC will form a short-term association that causes it to avoid the light spontaneously. The "new" situation, light--pain, is not repeated thereafter, it may have represented a transient situation in the scheme of things. MAVRIC will, for a time continue to avoid light. This is followed by a period of seeming ambivilence, first it will move away from a light, but then turn toward it. Eventually, MAVRIC will once again approach the light, albeit slowly. The older association reemerges from the background of long-term memory. Subsequently, if MAVRIC receives a reward after approaching the light, it will strengthen its longer-term association such that it approaches the light with the same "verve" that it had before the transient association had occurred. This non-destruction of contrary memories is an important factor in machine learning.

The gradient following "sniffer" circuit has been used to get MAVRIC to detect and follow a gradient. We have used a specified tone to represent a "genetically" hard-wired scent of a reward (e.g., food). The range of this scent is kept limited to a relatively small area around the mission-critical object. MAVRIC is set loose at one end of the research lab and the object is placed in an arbitrary location in the lab. MAVRIC starts its quasi-chaotic forage (paper) as described above. If MAVRIC is "lucky" its path will wind through the area of the scent gradient, in which case MAVRIC changes behavior and follows the gradient toward the source. This form of foraging is successful only if the mission-critical resource is sufficiently dense in distribution or MAVRIC is given a sufficiently long period in which to search.

A better solution is to allow MAVRIC to learn cue events that are causally associated with the occurrence of the mission-critical resource, but that have an inherently larger radius of detection, like light or sound. As per the description of the associator circuit above, MAVRIC can, in fact, learn to associate the occurrence of a light source with that of a resource. Since the light can be detected from a much larger distance, MAVRIC can, upon detection of the light, home in on the light source which brings it within the range of the scent gradient. It then follows the latter to the object it was seeking. The experimental layout for this behavior is depicted in Figure 5. MAVRIC starts its search from a "HOME" position in the lab. The point labelled "Mission event" marks the presence of food. The rings radiating outward from that point represent the "odor" gradient. Several typical search paths are shown along with the estimated envelop ("Search Limits") obtained from a number of search iterations. Note the "drunken weave" walk taken by the robot in its search. This is the pattern that is generated by the chaotic oscillator network. The weaving pattern is not a random walk, yet it does produce a novel path for each successive search. MAVRIC's success at finding food is dependent on the radius of the detectible gradient, relative to the surface area swept out by the search limits.

In the figure the irregularly shaped figure-8 trajectory with arrowheads actually represents a projection of a classical deterministic chaotic attractor onto the plane (we used two dimensions of the Lorenz attractor). Using uniformly distributed time intervals, the occurrence and duration of mission events is governed by this trajectory. Thus we create an environment which is nonstationary stochastic but not random. Also shown is a second event/gradient object that is chosen to be causally linked with the occurrence of the mission event. This event (e.g., the light source) and its gradient is called the "Cue event" and it will be generated according to another uniformly distributed probability relative to the occurrence of the mission event. The "trick" of intelligence here is that the gradient of the cue event can be sensed at a much greater distance than can the mission gradient. For example the point source of bright light will be sensed over a larger percentage of MAVRIC's range. It then turns out that if the cue event has the correct causal relation with the mission event, MAVRIC will learn this association through experiential encounters and begin to follow the light gradient as if it were the odor gradient. It will, however, preferentially follow the odor gradient once it becomes detected. In this way MAVRIC learns to use environmental cues to improve its foraging performance.

Fig. 5 The experimental layout for a MAVRIC run. A mission resource is centered in the darker circular area, representing the scent gradient. A light (center of dashed circles) gradient is placed nearby. The creation and position of the resource are determined by the chaotic dynamics projected onto the lab floor (solid lines with arrow points). MAVRIC learns to find the resource by associating the existence of the light with the scent gradient and following the former as if it were the latter.

DISCUSSION, CONCLUSIONS AND FUTURE DIRECTIONS

Considering the shoestring budget under which the MAVRIC platform has been developed we are pleased with the progress of this exploratory work to date. Several animatic functions have emerged from the approach taken here. We have good reason to believe that pursuing this approach will lead to a better understanding of machine intelligence methods and to practical mobile, autonomous robots capable of survival in unknown environments.

A moronic snail brain is seemingly not in the mainstream of AI research. As far as anyone can ascertain snails do not reason or recognize faces or remember stories. Is it possible that studying such a "primitive" system can lead to insights that may help us to one day to build machines that do have cognitive skills? From the biological side we are encouraged. Alkon [1] reports that some of the same cellular mechanisms involved in memory traces in the brains of real snails can be found in the neurons of mammalian brains. In the AI/Robotics camp we are not alone in our philosophical motivations [4].

But what have we garnered from MAVRIC that leads us to believe this track is worth pursuing? There are several important indications that the "bottom--up" development of intelligent, autonomous agents is realizable.

Many AI and connectionist approaches to pattern recognition are based on learning patterns or pattern classifications with little or no context. That is, the learned patterns have no meaning for the system, they are just facts. In MAVRIC, only the most crude sense of a pattern is learned, the association between tones and lights, for example. But those associations are always learned in the context of what is good or bad for the robot. Thus, every pattern has meaning and hence provides motivation for behavior. Basic reactive behaviors [4],[7] respond to stimuli that are prewired to represent harmful events or supportive events. This is analogous to an animal's genetically-governed wiring for pain or pleasure. What the robot learns is the causal relationship between events which, themselves are innocuous (nonsemantic) and those that carry meaning in the sense of the prewired responses. The temporal ordering of causal relationships thus allow the robot to respond to learned stimuli as if they were the hardwired semantic stimuli. We believe this is more to the point of symbol grounding [5]. Symbols (patterns) come to be recognized for their ultimate relation to meaningful stimuli rather than just to sensory grounding. Coupled with secondary conditioning [10] we have the potential basis for abstraction of conditionable sensory stimuli. It is this potential that gives us confidence that this approach will lead to higher levels of intelligence.

The second insight could be phrased as: More behaviors = more sensors + more intelligence. Many of the reactive behaviors (e.g., SEEK and AVOID) are mutually exclusive, while others are priority based, somewhat along the lines of the subsumptive architectures of [4]. Just as there are nonsemantic sensory modalities at the input end, we have found it beneficial to add nonreactive behaviors to the output. As we have added these behaviors (e.g., quasi-chaotic search) along with nonsemantic sensors, the associator network and support networks such as the "Object Location" network of Figure 2 have likewise grown. In fact it appears that as the number of sensor modalities/actions increases linearly we may expect to see a quadratic increase in the number of neurons needed to associatively process the data and produce a corresponding new behavior. There really isn't anything surprising here from the perspective of biology. As we go up the phylogenetic scale we see a substantial increase in the size of the cerebral lobes compared to the brain stem as animals get "smarter". What has been somewhat surprising is the way in which we can add behaviors so easily without destroying the basic reactive functions of the robot. This suggests that evolutionary processes such as module-doubling, accretion and specialization may be applicable in the development of useful (read selected) robots as it seems to be in nature.

There are severe limitations to the amount of processing that can be done with our present platform. However there are still a few areas we wish to explore that are within the realm of the current system. One of these is the role of nonassociative learning such as habituation and sensitization in the shaping of behavior. We suspect, based on the biological counterparts, that these adaptive responses will play an important part in increasing the intelligence of MAVRIC.

In order to incorporate more and more complex sensors, such as CCD arrays for image-like vision, we will clearly need to increase the processing horsepower of our brain hardware. Our goal is to move to a parallel processing environment such as the Transputer which seems ideally suited to running our simulated brain. Recently a very large commercial research lab of one of the world's leading computer companies donated a quadputer board and software to our lab to help us get to the next stage.

More recently one of us (Mobus) has been exploring a software--based agent called a cyberbot that is meant to forage for resources in an extended, distributed computing environment such as the Internet. The cyberbot is the software equivalent of a robot. It is fully embodied in terms of detecting objects in its environment (e.g., file names) and executing behaviors (e.g., jumping to the next node in a network). It is fully situated in a real, vast, dynamic and nonstationary environment --- the phenomenal growth rate of the Internet is now legend. The cyberbot vehicle, therefore, is not a mere simulation just because it is in software. Rather we suspect it will prove every bit as interesting as a physical robot for exploring concepts in foraging behavior and the animatic route to higher intelligence.

Footnotes

Try watching a scout ant as it searches for food. The ant will generally weave back and forth while moving in a general forward direction. [back to text]
While some may argue that this is a good case for doing pure simulations, we maintain that the results reported here would not have been possible in a purely simulated world. We continue to agree with Brooks [4] on the issue of situatedness as a necessary condition for studying intelligence in agents. [back to text]

References

Daniel L. Alkon. Memory Traces in the Brain. Cambridge University Press, Cambridge, 1987.
D.L. Alkon, K.T. Blackwell, G.S. Barbour, A.K. Rigler, and T.P. Vogl. Pattern recognition by an artificial network derived from biological neuronal systems. Biological Cybernetics, 62:363 -- 376, 1990.
Valentino Braitenberg. Vehicles: Experiments in Synthetic Psychology. The MIT Press, 1984.
Rodney A. Brooks. Intelligence without reasoning. Technical Report A.I. Memo No. 1293, MIT Artificial Intelligence Laboratory, 1991.
S. Harnad. The symbol grounding problem. Physica D, 42:335 -- 346, 1990.
Alejandro Kacelnic, John R. Krebs, and Bruno Ens. Foraging in a changing environment: an experiment with starlings (sturnus vulgaris). In Michael L. Commons, Alejandro Kacelnik, and Sara J. Shettleworth, editors, Quantitative Analysis of Behavior: Foraging, chapter 4, pages 63 -- 88. Lawerence Erlbaum Associates, Hillsdale, NJ, 1987.
Leslie Pack Kaelbling. Foundations of learning in autonomous agents. In Walter Van de Velde, editor, Toward Learning Robots, pages 131--144. The MIT Press, 1993.
Daniel E. Koshland. Bacterial Chemotaxis as a Model Behavioral System. Raven Press, New York, 1980.
Katharine Milton. Diet and primate evolution. Scientific American, 269(2):86 -- 93, 1993.
George E. Mobus. A Multi-time scale learning mechanism for neuromimic processing. PhD thesis, University of North Texas, 1994. Unpublished.
George E. Mobus and Paul S. Fisher. An adaptive controller using an adaptrode-based artificial neural network. Technical Report CRPDC-90-6, Center for Research in Parallel and Distributed Computing, University of North Texas, Denton, TX, 1990.
George E. Mobus and Paul S. Fisher. A mobile autonomous robot for research in intelligent control. Technical Report CRPDC-93-12, Center for Research in Parallel and Distributed Computing, University of North Texas, Denton, TX, 1993.
David S. Olton, Gail E. Handlemann, and John A. Walker. Spatial memory and food searching strategies. In Alan C. Kamil and Theodore D. Sargent, editors, Foraging Behavior, chapter 15, pages 333 -- 354. Garland STPM Press, New York, 1981.
Richard S. Sutton and Steven D. Whitehead. Online learning with random representations. In Proceedings of the Tenth International Conference on Machine Learning, pages 314 -- 321. Morgan Kaufmann, 1993.
Kieth D. Waddington and Bernd Heinrich. Patterns of movement and floral choice by foraging bees. In Alan C. Kamil and Theodore D. Sargent, editors, Foraging Behavior, chapter 10, pages 215 -- 230. Garland STPM Press, New York, 1981.