This paper appears as chapter 16 in D. Levine, V. R. Brown and V. T.
Shirey (eds), Oscillations
in Neural Networks, Lawerence Erlbaum and Associates, Publishers
©, 1999. Lawerence Erlbaum & Associates, Mahwah, NJ. 1999.
Foraging Search at the Edge of Chaos
George E. Mobus
Western Washington University
Paul S. Fisher
University of North Texas
Many animal species are faced with the problem of finding sparsely distributed
resources that occur (and disappear) dynamically in a huge space. Furthermore
the dynamics of these resources are stochastic and probably chaotic in
terms of both spatial and temporal distribution. But they are not completely
random. The search strategy is called foraging. It involves two interrelated
phases which dominate the behavior of the animal based on the amount of
knowledge it has regarding the location and timing of the resource. In
the absence of knowledge or cues foraging animals adopt a stochastic search
pattern that will have a reasonably high likelihood of bringing them into
encounters with the resource. With knowledge or cues, the animal switches
to a more directed search behavior.
Autonomous agents such as mobile robots may need to have these capabilities
in order to find mission-critical objects yet no current algorithmic or
heuristic search method adequately addresses this problem. The serendipitous
discovery of a quasi-chaotic oscillating neural circuit used to generate
motor signals in a mobile robot has led to the development of an autonomous
agent search method that resembles foraging search in a number of details.
An oscillator circuit, based on the concept of a central pattern
generator (CPG) in biology, is described qualitatively. Its role in controlling
the motion of a mobile robot and the effects it has on search efficiency
are presented. Constraining search to potentially fruitful paths before
any useful heuristics are available to the searcher is a problem of general
interest in artificial intelligence. Foraging search based on chaotic oscillators
may prove useful in a more general way.
1. INTRODUCTION
Quite typically, mobile robots that search for objects in their environment
are programmed to scan the vicinity and, if they spot what they are seeking,
plan a movement path to the object. If they do not sense their target object,
they plan a movement path that will position them at a new observation
point. Most often, these programs seek an optimal path between the two
points. These approaches are computationally expensive. In stark contrast,
many animals, when searching for food or other resources, exhibit seemingly
non-optimal movements. They can be seen to weave and wander, sometimes
erratically, through their search space. Animals are, however, generally
successful in finding the resources they seek in spite of this seeming
non-optimal pattern of search behavior[1]. Many
animal species forage for critical survival resources such as food, water
and shelter, as well as what might be called mission resources such as
mates or nesting material.
Foraging is a search strategy that is employed when the subject knows
what object is being sought but doesn't necessarily know where, in a usually
vast space, to look. Further characteristics that make this search problem
of general interest is that the resources often are sparsely distributed
and may have chaotic dynamics with respect to their location, duration
and timing. Foraging for resources in a chaotic [2] world
may be the quintessential task for any autonomous
agent, natural or artificial. We have been developing an artificial agent
that is strongly inspired by biological models, an embodied (Brooks, 1991)
Braitenberg (1984) vehicle, which is designed to learn how to perform a
basic search mission in a non-stationary environment. The 'robot' is called
MAVRIC (Mobile, Autonomous Vehicle for Research
in Intelligent Control) (Mobus & Fisher, 1993; 1994).
The motivation for this objective comes directly from the observation that
the 'real' real world is open from the point of view of any realizable
agent. Extending Brooks' (1991) notion of situatedness as a precondition
for the successful demonstration of intelligence in artificial systems,
we believe that ongoing real-time adaptation is an open-ended process that
must extend over the life of an agent. In other words, we do not subscribe
to the notion that artificial agents will be trained on some fixed set
of patterns (even if only trained on some subset of the universe of patterns)
and then be ready to survive in the real world. Things change too much
even in environments that we know fairly well. Or in worlds we have not
even adequately explored, things may be too different from our expectations
to allow us to choose adequate training sets.
In foraging, animals may start with little or no knowledge of the organization
of the environment. That is, they are not familiar with cue-resource relationships
(situational heuristics) initially[3]. As
they conduct their unguided search and then encounter the sought resource,
they learn which cues (or landmarks) can be reliably counted on to lead
to the sought resource and so they tend to improve their performance on
subsequent searches. As they gain knowledge of these situational heuristics,
they can use them to guide their searches. Should conditions change and
the learned cue-resource relations no longer hold, animals revert to their
uninformed search strategy and begin the process of learning the new cues.
Our work centered on a neuromimic adaptive processor that allows an agent
to learn the cue-resource relationships in a non-stationary environment
such as this. The adaptive element of the neuromime, taking the place of
a traditional synaptic weighting function, is called an
Adaptrode (Mobus, 1994a; 1994b).
An adaptrode models the multiple time-scale memory trace mechanisms
operative in the post-synaptic compartment of neurons (Alkon, 1987). Mathematically
it is a cascaded, modified exponential smoothing function for pulse-coded
primary input signals. Each level in the adaptrode corresponds to a longer
time scale so that the adaptrode captures moving average activity over
short-, intermediate- and long-term periods. The smoothing functions at
each level are gated into the next longer time scale by the receipt of
a second, temporally-ordered[4], correlated
signal to achieve associative learning. The adaptrode response to primary
input (specifically the cue signal) is, thus, dependent on the history
of inputs with higher weight given to more recent activity and longer-term
weighting dependent on correlation with other signals (specifically resource
detection signals that come some short time after excitation of the cue
signal). Adaptrode response to input is effectively an estimate of the
salience of the current input. It takes the place of the traditional weighted
input signal in conventional neural network models. Thus weights in an
adaptrode-based neuron are dynamic. They forget to the extent that recent
inputs are not repeated and reinforced over time. As such they enable life-long
learning networks. In previous work, Mobus (1994b) has shown that a simple
two-cell network of competitive, adaptrode-based neurons can learn orthogonal
associations so long as they separate in time domains. This is one solution
to the destructive interference problem known to plague many conventional
neural networks.
In MAVRIC, we wanted the robot to learn situational heuristics in which
the detection of a cue, which is causally related to a resource event,
could be used as a predictor of the location and timing of the event. Resource-cue
associations in natural environments abound. Color, shape of plants, existence
of a watering hole; all of these are exploited by numerous animals as cues
to the potential presence of food. In our lab we arranged that light sources
would be associated with (physically near) various resource objects. MAVRIC's
task was to learn to find these resources by first learning the association
and thereafter using the occurrence of a light as a predictor of the occurrence
of the resource. Adaptrode-based neurons learn these relations in real-time
and on-line. All that would be needed would be a way to get MAVRIC to experience
the association a sufficient number of times so that the learning could
take place.
We faced an early problem in terms of how to get the process started.
It turns out that the strategy of motion of the agent before it has learned
any useful relationships can have a significant impact on how quickly it
is exposed to instances of successfully finding a resource and, consequently,
learning the associated cues. MAVRIC, in classic Braitenberg style, uses
fixed, semi-directional sensors. As a result, simply having MAVRIC move
along in a straight line would severely restrict its ability to sample
the space and, unless a resource (and its associated cues) happened to
occur directly in its path, it would never find a resource, let alone learn
cues that would lead to resources.
As an obvious starting point then, we provided MAVRIC with a random
walk motor program in order to introduce some "novelty" into the search
path. In this approach MAVRIC would move for a randomly chosen interval
and then turn in a randomly chosen direction a randomly chosen number of
degrees. It is possible to show that a random walk search is guaranteed
to cover the entire space, but it may take an infinite amount of time to
do so. From the standpoint of increasing the number of successful finds
and subsequent learning, this method would work if we didn't care how long
it took MAVRIC to find resources. The adaptrode learning mechanism is very
fast at encoding associations (Mobus, 1994a; 1994b), so that after five
or six encounters MAVRIC became quite good at finding the resource based
on following cues. However, it might take MAVRIC several days to even locate
the first such encounter by chance, let alone the five needed to get it
to start following cues. Under those circumstances we followed a program
of "training" MAVRIC on cue-resource relationships in which we placed the
objects sufficiently close to MAVRIC so that it was guaranteed at least
five encounters in a short time (Mobus & Fisher, 1991a; 1991b). Thereafter,
the robot would reliably follow the learned cues if they were detected.
However, this was not a "good" solution for two reasons. First, the
required training was not the same as allowing MAVRIC to learn from naturally
occurring experience. We were trying to achieve the same level of autonomy
as animals possess in the wild. Every time conditions changed we would
have to retrain the robot. This initial approach resembled conditioned
learning experiments conducted in animal learning (e.g., classical or Pavlovian
conditioning) (Mackintosh, 1983), but this was not what we wanted.
Second, unless the density of resources spread throughout the environment
were sufficiently high, the random walk search might still cause MAVRIC
to fail to find that for which it was looking for long periods. Animals
seeking food in the wild are under time and energy constraints that necessitate
finding some instances of food every so often. Evolution works to match
the species' search procedure-sensory apparatus and the resource density
(both space and time) provided by the environment (niche). Grazing animals
that are virtually surrounded by their food can accommodate nearly any
strategy. Bacteria that swim in a medium relatively rich in nutrients employ
a tumbling mechanism not unlike the random walk until they sense the chemical
gradient leading to their food (Koshland, 1980). Sheep flocks tend to graze
in "straight" lines until something (like a sheepdog) redirects their movements.
Grazers do not rely on cue signals to find their food. Animals that hunt
(forage) for a living rely on very different mechanisms.
Foraging is a complex behavior in most species. However the search phase
can be characterized in two sub-phases depending on the animal's possession
of knowledge or cues that would lead it to the resource. In the absence
of knowledge or cue an animal tends to "wander" along novel paths as it
attempts to perform a sensory sweep of the area. This can be observed directly
by watching scout ants walking over a sidewalk or driveway. The ant does
not wander aimlessly as in a random walk, rather it takes a weaving pattern,
generally leading in some given direction. The path is novel (stochastic)
yet follows some general pattern. Or observe a hound dog trying to pick
up the scent of a prey. It will tend to weave back and forth over the area
with a minimum of crossing back over its own path.
Once an animal has either acquired knowledge (e.g. honeybee workers
observe the dance of a worker that has found a source of nectar) or senses
the nearness of the resource (e.g., odor of food), the search becomes more
directed, bringing the animal sufficiently close to the resource that the
final stages involve direct sensory contact. Even with cues and/or knowledge
there is still some uncertainty associated with the actual location of
a resource due to imprecision in sensory measurements in navigation and
communication (as in the case of the bees). Animals can be seen to continue
a weaving motion, though with much lower amplitude, as they "home in" on
the resource. This motion may be needed to obtain triangulation or gradient
information necessary for computing the location of the object. But the
existence of the motion as a gradation of that used in stochastic search
is an important clue to understanding the underlying motor program generating
the search patterns.
Another problem with using programmatic motor control models was interfacing
the program with the neural decision process. As our robot acquired the
sensory detection of a cue, how should it switch from non deterministic
search mode to directed mode? The neural network itself provided information
for following gradients, but it wasn't clear how to best integrate the
output of the neural net with the motor control program.
As we had done in looking for solutions to the learning mechanism by
turning to biological models, we started looking for neurological mechanisms
that might help solve this problem. A class of neural circuits, called
central pattern generators (CPGs) are oscillators that are responsible
for a number of rhythmic motor functions in a wide variety of animals.
We were inspired by the way a relatively simple network of neurons could
produce oscillatory outputs of the (intuitively) right form. The fact that
these circuits are known to be modulated by other neural inputs, hence
allowing the shaping of output waveform, was another attractive feature.
We decided to investigate artificial CPG circuits constructed from adaptrode
neurons. It turned out to be a useful approach.
2. CENTRAL PATTERN GENERATORS
Motion in animals involve opponent processes such as opposing muscleclature
or muscle-hydrostatic pressure opponents as in arachnid limb movement or
cardiac pumping. Such opponent forces are mediated by neural control signals
that convey the appropriate phase and amplitude information such that the
direction and force of the motion achieve the desired result without overstraining
the motor elements themselves. These motions often involve some degree
of tension between the motors but are always coordinated so as to protect
the "machinery" from undue stress. Many movements in animal behavior repertoires
as well as internal functions such as cardiac pumping and digestive tract
movement are oscillatory in nature.
The undulatory swimming motion of a leech, the escape swimming of the
mollusk Tritonia diomedea, and the heartbeat of the lobster are
known to be controlled by a type of neural circuit called a central pattern
generator or CPG. In fact, such circuits are known to control a wide variety
of rhythmic motions such as breathing, walking and swimming (Kleinfeld
& Sompolinsky, 1989). A number of heterogeneous architectures have
been described which have similar kinds of behavior. In swimming control
circuits where opponent muscle groups are alternately stimulated, the architectures
involve two output groups, each of which generates bursts of pulses in
alternating fashion (Selverston & Mazzoni, 1989). Thus the opposing
muscle groups alternately contract and relax in approximately 180 degree
phase shift from each other.
The general architecture of these circuits involves multiple groups
or clusters of neurons, some of which excite other groups or clusters and
some of which inhibit. Feedback inhibition, in which one group excites
another and the later inhibits the former, is often seen. This pattern
is seen when one or both clusters involve cells with tonic activity (self-excitation).
Figure 1 shows a two cell CPG of this form. Far more complicated circuits
with many connections are seen in many animals.
An important feature of these circuits is the temporal behavior of connections,
or synaptic dynamics. Slow and fast synapse responses are known. Thus,
as in Figure 1, the excitatory synapse from A to B may operate quickly
to cause B to fire, whereas the feedback inhibition on A may be slow, thus
allowing B to continue being excited (and hence firing) for some time before
shutting A down. The output from such a circuit appears as a half-wave.
Another extremely important aspect of many CPGs is the way in which
external inputs to the circuit can cause the output form to be modulated
in frequency or amplitude. Some circuits are turned on and off by outside
inputs (Roberts, 1989). This feature gives these circuits considerable
flexibility in responding to outside influences. Animals can change their
rhythm (or gait), and speed with the same basic circuit. Such features
are attractive when designing a control system.
Fig. 1. A simple CPG circuit showing feedback inhibition. Cell A
may be a tonically firing (self-excitatory) one which excites (flat terminal)
cell B to fire. The latter then inhibits (circular terminal) cell A. Output
from the circuit (pointed arrow) shows an episodic burst pattern.
3. AN ADAPTRODE-BASED CPG
Oscillatory motion would be a desirable approach to
causing a fixed-sensor robot such as MAVRIC to perform a sensory "sweep"
of its environment as it moves in a generally forward direction. This could
be achieved by modulating the amplitudes of two output lines, one controlling
the right stepper motor and the other controlling the left motor in an
alternating fashion. The net effect from the two motors would simultaneously
determine the degree of turning and the forward speed of the vehicle.
Getting (1989) describes the simulation of a small CPG network based
on the escape swimming control circuit of the mollusk Tritonia diomedea.
The methods employed, namely continuous systems modeling, were appropriate
for the purpose of explicating some aspects of the mechanisms of a biological
network. These were quite different from our own purposes - to emulate
behavior with a discrete system model. Nevertheless, certain features of
these models and those of adaptrode-based neurons that we had been using
to mediate non-adaptive dynamical functions in MAVRIC's brain, suggested
the possibility of obtaining behavioral results similar to those inferred
from continuous models. We were led to the notion of building a CPG circuit
for MAVRIC's motor control using adaptrode-based neurons because of the
similarities between the time-course behavior of some synapses (i.e., fast
and slow connections) in biological models and that of non-associative
adaptrodes having different valued control parameters (Mobus, 1994a). Adaptrodes
with slowly or rapidly rising or falling response curves can easily be
constructed. Additionally, as noted by Getting (1989), some types of interneurons
participating in CPG circuits demonstrate an habituating-like behavior
in that they cease firing after some period of continued stimulation even
though the stimulus remains. This is modeled in adaptrode-based neurons
by using a special adaptrode, the input to which is actually the output
of the neuron, to up-modulate the threshold of that neuron so that, after
a burst of activity, the neuron's firing rate diminishes and eventually
ceases. Using the basic description of the simulated tadpole swimming circuit
(Roberts, 1989) we constructed a simple four-cell model CPG (Figure 2).
Fig. 2. The Central Pattern Generator (CPG) motor control circuit
is comprised of four "core" neurons that generate a quasi-chaotic sinusoidal
signal (left motor minus right motor). Additional neurons provide signal
distribution for externally applied control signals that are used to modulate
and shape the output signals. Pointed arrows indicate signals from/to the
external environment. Flat terminals indicate excitatory connections while
circular terminals indicate inhibitory ones.
The circuit in Figure 2 reflects the bilateral symmetry of the vehicle
and the opponent process used to steer. It is composed of two "tonic" cells,
which spontaneously fire at a relatively constant low rate, and two "feedback"
cells. In operation the right tonic cell is given a boost in firing rate
by an input from the "left bias" neuron (to get things started). In turn,
the right tonic cell stimulates the right feedback cell which undergoes
a transient increase in it's output . This cell provides inhibitory feedback
to the tonic cell, thus dampening its output, but it also provides excitatory
input to the left tonic cell, increasing its output. Subsequently, the
left tonic cell excites the left feedback cell which inhibits the left
tonic cell and excites the right tonic cell, thus starting the cycle over
again. Once set in motion, the contralateral excitation, unilateral inhibition
cycle will continue indefinitely.
Output from the tonic cells is used to excite the right and left motor
cells (these generate the stepper motor pulses) on their respective sides.
When the right tonic cell is active (and the left tonic cell is relatively
quiet by comparison) the right motor cell generates many more pulses per
unit time and thus causes the right stepper motor to rotate faster. This
produces a net left turning motion in the robot. Similarly, when the left
tonic cell is more active, a right turning motion is made. The net effect
of this alternating activity between right and left motor cells is to cause
the robot to follow a weaving course, first turning leftward, then rightward,
as it moves in a generally forward direction.
Excitatory synapses are implemented using relatively fast-response,
slow decay non-associative adaptrodes, while inhibitory synapses are implemented
with slow-response, fast decay non-associative adaptrodes. In addition,
the threshold of the feedback cells are modulated, slowly, by an adaptrode
which gets its input from the output of those cells. The modulation pushes
the threshold of the cell lower than its resting level, thus making it
easier for the net input to the cell to cause the cell to fire. Since adaptrodes
continue to provide a decaying response after the input signal ceases,
the effect is to cause the feedback cells to act as bursters. That is,
they fire a volley of pulses which continues for a time after the offset
of the excitatory input from the corresponding tonic cell. The length of
time that this burst continues then determines the time that the corresponding
tonic cell will be inhibited and the contralateral tonic cell will be excited.
Clearly, the slowness of the inhibitory synapse from the feedback cell
to the tonic cell must match the duration of the burst.
Additional inputs to the circuit (shown in Figure 2 as pointed arrows)
provide a convenient method of modulating the oscillator in a manner reminiscent
of the biological CPGs described above. These inputs are distributed by
a layer of neurons which provide a means of overriding, through inhibition
(not shown), the effect of the primary signal. The input labeled "Speed"
does not directly effect the oscillator circuit. It merely increases the
firing rate of the two motor output cells as shown. Other inputs change
the form of the output as depicted in the histogram in Figure 3, below.
The "Straight" signal can be seen to provide inhibitory input to two cells
that supply the inhibitory feedback signal to the tonic cells. This action
induces the two tonic cells to become more synchronous leading to a nearly
zero net output and the robot moves in a somewhat straight line. When the
"Straight" signal is removed or just reduced, the signal returns to its
oscillatory behavior with a peak amplitude proportional to the level of
the "Straight" signal.
Similarly, the "Go Right" or "Go Left" signals dampen their respective
tonic cells which in turn dampens the contralateral excitation feedback.
This allows the undamped tonic cell to provide a more or less steady output
to its respective motor cell and results in the robot making a smooth and
sustained turn. A weaker signal at one of these inputs results in a weaker
turning with a small oscillatory wave superimposed.
The "Left Bias" and "Right Bias" signals perform a slightly different
function when used. If the "Right Bias" cell is active (it is actually
a tonic cell also), the net output signal to the motor cells is essentially
the same in form but tends to favor the left motor output. The robot will
move in the same weaving pattern but has a tendency to move in a diverging
spiral. The capacity to make turns or go straight is not seriously hampered.
The modulatory inputs to the CPG circuit come from the cue recognition
and direction determining networks of MAVRIC's brain (Mobus & Fisher,
1994).
As originally conceived, this circuit was expected to produce a sinusoidal
net output (between left and right motor cells). Indeed, the weaving pattern
that the robot followed when run with the new neural oscillator controller
was roughly sinusoidal. However we noted certain peculiarities in the pattern.
Far from producing a fixed sinusoid, the weave pattern of the robot showed
erratic deviations from a sine curve. Sometimes it would make short right
or left turns followed by long, sweeping turns. Other times it would make
short turns one way and long turns the other causing it to veer dramatically
from a given course. Our first conjecture on this behavior was that this
was simply the result of slippage of the wheels, a notorious problem in
conventional robotics. However, after looking at the pulse counts being
sent to the motors it was clear that the erraticness was due to the output
of the oscillator. Subsequently we collected and analyzed the raw count
data from the oscillator and were surprised to discover that it generated
a more complicated signal. As seen in the histogram in Figure 3, the oscillation
appears to have a complex waveform.
Fig. 3. A histogram of the CPG net (of right and left motor signal)
output shows that the direction of the robot varies in a roughly sinusoidal
pattern with non-deterministic characteristics of amplitude and wave length.
The amplitude of each bar represents the degree of turning that the robot
does per second. Negative values represent turns to the right, positive
values indicate turns to the left.
Following a suggestion by Bruce West, we decided to look at this data from
the perspective of chaos theory. Specifically we undertook to reconstruct
the attractor dynamics from a large sample of the time series as shown
in the histogram. Figure 4 shows a plot of a two dimensional attractor
based on plotting a reconstructed phase space from two points in the time
series separated by a specified number of time points, in this case 2.
As can be seen in the plot, some kind of non-periodic attractor is associated
with the oscillator process. To claim that the system is chaotic, in the
sense of deterministic chaos, would be premature from this one form of
evidence. However, several observations might be made. First, it is clear
that the system is bounded in phase space-hence some kind of attractor
is operative. Second, it appears to be the case that the system does not
repeat itself; that it is not merely a quasi-random number generator. A
final piece of evidence comes from mapping the course taken by the robot
(unstimulated by external cues) over a number of runs. This plot (Figure
5), showing divergence as the search time increases, strongly suggests
that the path taken is sensitive to initial conditions. We have not pursued
a more rigorous analysis of this process with respect to its apparent chaotic
nature for the simple reason that it isn't terribly important to our main
line of work to show that the oscillator is chaotic in some strict sense.
We were seeking a means of getting a robot to scan its environment while
it moved, and to that end the oscillator provided a good solution. That
it should introduce novelty into the search path was, from our perspective,
a bonus. However, the form that this novelty takes, that is in not being
a random walk, may have important consequences for the notion of foraging
search.
4. RESULTS OBTAINED WITH MAVRIC
Our original objective had been to get MAVRIC to weave as it moved forward
so that it would obtain a sensory sweep of its environment. We also required
that the motor program allow for easy modulation so that the robot could
shift to directed movement gracefully. To those ends the CPG oscillator
network produced excellent results. In numerous test runs from a fixed
starting point, (home), MAVRIC was able to sense and thus learn a sufficient
number of cue-mission events that were placed in its workspace. Subsequent
runs showed that MAVRIC was able to locate resources from cue information
alone as described in (Mobus & Fisher, 1993; 1994). We had achieved
the desired level of autonomy in that MAVRIC no longer needed to be "trained"
on specific contingencies in order to succeed in finding its mission resources.
Fig. 4. Plot of a reconstructed, two dimensional phase space showing
the attractor dynamics of the CPG oscillator. This plots the behavior in
a space defined by two points taken from the time series data as shown
in Figure 3. The two points are separated by two time ticks (TAU = 2).
This plot was generated from 20,000 data points.
That the oscillator produced a novel search path each time the robot was
started from its home did not seem to have any negative consequences[5] for
the success of the robot, as had been the case with the random walk novelty
scheme. It did not immediately dawn on us that these paths might have a
real beneficial impact on the capabilities of the robot.
When thinking about the nature of real world environments we realized
that many mission resources occur in stochastic episodes. Certainly the
food resources of foraging animals appear and disappear in different locations.
There may be a certain amount of regularity in these dynamics, for example,
fruit may ripen in a given season or prey animals may always show up at
a watering hole. Thus the nature of resource dynamics may, itself, be chaotic,
and the timing and quality of the resource are subject to variations that
make precise prediction impossible. Because the occurrence and location
of resources might be indeterminate, it would be counterproductive for
an agent to follow the same path on each iteration of searching. Rather,
following a novel path will further ensure that the agent searches a larger
volume of the space. Novel search paths would also provide an answer to
the problem of occlusion of a resource behind fixed obstacles.
Fig. 5. Examples of typical search paths traversed by MAVRIC show
the drunken weave pattern resulting from the OSC output as shown in Figure
2. Also note the apparent bounded nature of the search envelope and the
fact that each run takes a novel course.
At the same time, search paths would seem to best be constrained in terms
of a general direction. Random walk search, while producing completely
novel paths on each iteration, appears to cover too much of the volume.
If, as we have speculated, resource dynamics in natural environments follow
a chaotic pattern (at least in the general case), then it makes sense that
one should constrain the search volume to correspond with the spatial extent
of the sought phenomena. A chaotic search pattern would thus provide constrained
novelty to match that found in the sought resource.
To do a preliminary test of this notion, we projected part of the trajectory
of a Lorenz attractor phase space map onto the plane. A series of points
along the trajectory were chosen randomly to represent locations and times
for the occurrence of mission-critical events. Within the sensory envelope
of the event a second event, called a cue, was defined which would reliably
associate with the mission event[6].
The result was that resources (and cues to their presence) appeared stochastically
and episodically as the trajectory evolved (Figure 6). MAVRIC was set on
a series of iterative searches over the course of this evolution. The figure
shows a single mission event and its relationship to the search envelope
resulting from the chaotic weave pattern of MAVRIC's motion. Two paths,
as shown, lead to finding the resource. The path labeled A shows the result
of a coincidental discovery of the resource prior to learning the association
between the resource and the cue event (a light source). MAVRIC's course
is unaffected by the cue gradient, but as the robot encounters the resource
gradient it transitions from the chaotic weave pattern to a gradient following
pattern. This transition is effected via the modulation of the CPG oscillator
by the directional input, "Straight", as shown in Figure 2. This signal
is supplied by a gradient detection network that increases the "Straight"
signal in proportion to the difference between two successive samples of
the stimulus[7].
Fig. 6. Experimental layout for MAVRIC to search for a chaotic resource.
A common chaotic attractor phase space map has been projected onto the
plane of the lab floor. This trajectory is used to place and time the occurrence
of resource (mission-critical) events and their associated cue events.
Several representative MAVRIC search tracks are shown.
The path labeled B shows a path after learning the association in which
MAVRIC follows the cue gradient, by the same mechanism as it follows a
resource stimulus gradient, until it encounters the resource gradient,
afterwhich it follows the latter to the source. In this model, the learned
cue need not be contiguous with the resource object. The extent of the
cue field should be greater than that of the resource so that it has the
effect of enlarging the latter. Sensing the cue at a greater distance from
the resource than it could sense the resource itself, the robot transitions
from chaotic wandering and entering the cue field improves its chances
of encountering the resource field.
Other paths depicted in the figure would lead to failure. Under additional
assumptions regarding the average resource density, several mission (or
resource) events might be present in the map of the trajectory at any given
time. These would be spaced so that there would be no overlap in the resource
gradient fields as detected by MAVRIC.
Thus MAVRIC was set to searching for its mission resources in what we
believe represented a natural-like environment. As it had when the resource
was set in the same location each time, MAVRIC was able to find its objective
in a completely autonomous way, first finding the resource coincidentally
and then learning how to improve its results by following the cue gradients
that it encountered.
Because of its flexibility and ease of interfacing directional and speed
controls with the other parts of MAVRIC's brain, the CPG oscillator proved
to be an excellent, if not convenient, approach to solving the motor program
problem for this class of robot. The injection of novelty in the absence
of directed search has added additional, unanticipated, functionality to
the process and, at least from the perspective of our initial experiences,
has improved the actual performance of that process.
5. SOME IMPLICATIONS AND OTHER APPLICATIONS
Searching a large, dynamic space for a consumable resource, within time
constraints, presents some interesting challenges for robotics and autonomous
agents in general. The nature of the dynamism of natural environments should
in general be considered at best stochastic and possibly chaotic. This
translates into some degree of uncertainty as to the location of the resource
each time a search is conducted. In the case of chaotic dynamics, traditional
statistical-based prediction is probably inadequate.
Our experience with the chaotic oscillator motor drive network or artificial
CPG has led us to consider the role of novelty in generating search paths
in the absence of cue-directed searches. Novelty ensures that the search
path selected on iterative searches will not lead the agent to the same
places each time. In this regard it may be analogous to the use of simulated
annealing (SA) to prevent gradient-following algorithms from getting stuck
in local minima. Clearly, once a resource has been found at a given location,
and consumed, the agent need not return to that location in a subsequent
search. The degree of novelty (perhaps analogous to the temperature in
SA) is important in that too much results in something like a random walk
in which old territory may be explored repeatedly. Not enough results in
a relatively narrow envelope of exploration relative to the total space.
That a simple circuit such as the CPG can generate novelty of the "right"
form in motor control suggests other possibilities for the role of such
circuits in exploration in general. The existence of oscillations in the
cortex of vertebrate brains has been established. Is it possible that some
of these are sufficiently chaotic to cause novel firing patterns of other
neural circuits? Could such circuits give rise to creativity in human mental
processing?
One of the cornerstones of human level search in a high dimensional
"thought" space is the role of creativity in finding novel paths to a solution.
Do we not often "forage" (or hunt) for a solution to a problem? These are,
of course, highly speculative musings and clearly far beyond the realm
of investigating intelligent behavior in snail level artificial agents.
However, such speculations may prove to be useful guides in exploring brain
architectures for more complex robots in the future.
In the meantime, we have begun to think of the method of foraging, in
the sense of coupling a chaotic path generator with a learning mechanism,
as a solution to search problems in discrete domains such as computer memories.
Exhaustive search of a discrete space, as represented, say, by a graph,
is not overly burdensome, computationally, in that it is known to scale
linearly with the number of nodes and edges. However, the problem of searching
a large space for dynamically changing resources (graph labels) can be
problematic if the rate of change in the labeling or topology of the graph
is greater than the time needed to conduct the search. If one views the
combined memory space of all computers tied into the Internet, for example,
as a vast, dynamic search space, then it is possible that the method of
foraging search may be brought to bear. We are thus beginning to look at
the extrapolation of the methods that have proven useful in our physical
robotics application to the cyberspace environment afforded in the Internet.
Here a "knowbot" agent, or software robot, (Etzioni & Weld, 1994) might
forage through the network, looking for resources, such as documents containing
specific key phrases. It could use the names of computers and files (or
gopher menu components) as potential cues, learning cue words that increase
the likelihood of finding the sought documents over repeated searches.
It will be interesting to explore the notion of a chaotic oscillator in
such an architecture.
6. ACKNOWLEDGMENTS
We would like to thank Manny Aparicio for his contributions to this work
in discussions and collateral work with Mobus on foraging search in animals.
We would also like to thank Bruce West for his suggestions and guidence
in reconstructing the dynamic behavior of the oscillator.
REFERENCES
Alkon, D. L. (1987). Memory Traces in the Brain. New York, NY: Cambridge
University Press.
Braitenberg, V. (1984). Vehicles: Experiments in Synthetic
Psychology. Cambridge, MA: The MIT Press.
Brooks, R. A. (1991). Intelligence without reasoning. Technical
Report A.I. Memo No. 1293. Cambridge, MA: MIT Artificial Intelligence
Laboratory.
Durbin, R., Miall, C. & Mitchison, G. (Eds.), (1989). The
Computing Neuron. New York, NY: Addison-Wesley Publishing Co.
Etzioni, O. & Weld, D. (1994). A softbot-based interface to
the internet. Communications of the ACM, 37,7, 72-76.
Getting, P. A. (1989). Reconstruction of small neural networks.
In C. Koch & I. Segev (Eds.). Methods in Neuronal Modeling: From
Synapses to Networks. (pp. 171-194). Cambridge, MA: The MIT Press.
Kleinfeld, D. & Sompolinsky, H. (1989). Associative network
models for central pattern generators. In C. Koch & I. Segev (Eds.).
Methods in Neuronal Modeling: From Synapses to Networks. (pp. 195-246).
Cambridge, MA: The MIT Press.
Koshland, D. E. (1980) Bacterial Chemotaxis as a Model Behavioral
System. New York, NY: Raven Press.
Mackintosh, N. J. (1983). Conditioning and Associative Learning.
London, England: Oxford University Press.
Mobus, G. E. (1994a). A Multi-Time Scale Learning Mechanism
for Neuromimic Processing. Ph.D. Thesis, Unpublished. Denton, TX: University
of North Texas.
Mobus, G. E. (1994b). Toward a theory of learning and representing
causal inferences in neural networks. In D. S. Levine & M. Aparicio
(Eds.), Neural Networks for Knowledge Representation and Inference,
(pp. 339-374). Hillsdale, NJ: Lawrence Erlbaum Associates.
Mobus, G. E. & Fisher, P. S. (1991a). Conditioned response
training of robots using adaptrode-based neural networks I: Continuous
adaptive learning. In G. Mesnard & R. Swiniarsk (Eds.) Proceedings
of the International AMSE Conference on Neural Networks. (pp. 171-182).
San Diego, CA: Association for the Advancement of Modeling and Simulation
Techniques in Enterprises.
Mobus, G. E. & Fisher, P. S. (1991b). Conditioned response
training of robots using adaptrode-based neural networks II: Simulation
results. In G. Mesnard & R. Swiniarsk (Eds.) Proceedings of the
International AMSE Conference on Neural Networks. (pp. 183-194). San
Diego, CA: Association for the Advancement of Modeling and Simulation Techniques
in Enterprises.
Mobus, G. E. & Fisher, P. S. (1993). A mobile autonomous robot
for research in intelligent control. Technical Report CRPDC-93-12.
Denton, TX: Center for Research in Parallel and Distributed Computing,
University of North Texas.
Mobus, G. E. & Fisher, P. S. (1994). MAVRIC's Brain. In F.
D. Anger, R. V. Rodriguez, & M. Ali, (Eds.), Proceedings of the
Seventh International Conference, Industrial and Engineering Applications
of Artificial Intelligence and Expert Systems (pp. 315-322). Yverdon,
Switzerland: Gordon and Breach Science Publishers.
Roberts, A. (1989). A mechanism for switching in the nervous system:
turning on swimming in a frog tadpole. In R. Durbin and C. Miall and G.
Mitchison (Eds.), The Computing Neuron. (pp. 229-243). New York,
NY: Addison-Wesley Publishing Co.
Selverston, A. & Mazzoni, P. (1989). Flexibility of computational
units in invertebrate cpgs. In R. Durbin and C. Miall and G. Mitchison
(Eds.), The Computing Neuron. (pp. 205-228). New York, NY: Addison-Wesley
Publishing Co.
Footnotes:
1 This follows from the "Darwinian" argument that were
they not successful, they would not survive. [back to
text]
2 Here we are using the word in the sense of organized
uncertainty, as in deterministic chaos. The world is organized within broadly
determined "laws" but, owing to nonlinear interactions and lack of precision
in determining initial conditions, becomes less predictable as the time
horizon lengthens. [back to text]
3 In our system, a cue is a sensory gradient that
extends over a much wider area than the sensory gradient of the resource
object, for example, a visual cue (the sight of a landmark) may be seen
from a greater distance than the scent of food.[back to
text]
4 The temporal ordering observed is generally
strict and requires that the primary signal arrives some time prior to
the arrival of the secondary signal. The time interval required depends
on the time-domain level being gated. In the case of gating the transfer
of the short-term memory trace into the intermediate-term trace this may
only require a few time steps. The ordering, however, ensures that a proper
causal correlation is encoded. [back to text]
5 Ironically, we spent a fair amount of time trying
to smooth out the erraticness of the oscillations without increasing the
complexity of the circuit - to no avail. Eventually, after seeing that
the novel paths did not reduce the robot's ability to find resources, we
decided to let well enough alone. [back to text]
6 Typical mission events would be the sounding
of a specific tone the volume of which was varied to represent the gradient
field. A cue event was the placement of a light near the speaker. The light
would be dimmed so that MAVRIC had to be within a certain distance before
it could detect the light. [back to text]
7 This strategy emulates the tumbling search mechanism
in swimming bacteria reported by Koshland (1980). [back
to text]
Back to George
Mobus' Home page.
|