Presented at The Third International Conference on Computing Anticipatory Systems, Liege, Belgium, August, 1999. This paper was selected as "Best of the Symposium" for Anticipatory Control and Robotics. It will appear in the proceedings of that conference and in the Journal for Computing Anticipatory Systems, Daniel Dubois, Ed. [Note: This version has been slightly modified from the original with better graphics and some needed edits.]
Foraging Search: Prototypical Intelligence
George E. Mobus
[Formerly] Dept. of Computer Science, Western Washington University
We think because we eat. Or as Descartes might have said, on a little more reflection, “I need to eat, therefore I think.”
Animals that forage for a living repeatedly face the problem of searching for a sparsely distributed resource in a vast space. Furthermore, the resource may occur sporadically and episodically under conditions of true uncertainty (non-stationary, complex and non-linear dynamics). I assert that this problem is the canonical problem solved by intelligence. Its solution is the basis for the evolution of more advanced intelligence in which the space of search includes that of concepts (objects and relations) encoded in cortical structures. In humans the conscious experience of searching through concept space we call thinking.
The foraging search model is based upon a higher-order autopoietic system (the forager) employing anticipatory processing to enhance its success at finding food while avoiding becoming food or having accidents in a hostile world. A forager is an anticipatory system as defined by Rosen. I present a semi-formal description of the general foraging search problem and an approach to its solution. The latter is a brain-like structure employing dynamically adaptive neurons. A physical robot, MAVRIC, embodies some principles of foraging. It learns cues that lead to improvements in finding targets in a dynamic and non-stationary environment. This capability is based on a unique learning mechanism that encodes causal relations in the neural-like processing element.
An argument is advanced that searching for resources in the physical world, as per the foraging model, is a prototype for generalized search for conceptual resources as when we think. A problem represents a conceptual disturbance in a homeostatic sense. The finding of a solution restores the homeostatic balance. The establishment of links between conceptual cues and solutions (resources) and the later use of those cues to think through to solutions of quasi-isomorphic problems is, essentially, foraging for ideas. It is a quite natural extension of the fundamental foraging model.
The central thesis of this paper is that what we call intelligence in animals and man has its origins in the solution to finding and consuming food, and staying alive. The capacity to think, that is to reason with concepts, derives from the architecture of an adaptive control system for finding food and avoiding predators and accidents. In this view, the core mechanism, grounded in the more primitive centers of the brain, forages for pattern resources in the cortical regions. In humans the conscious experience of this we call thinking.
The brains of early invertebrate animals are clearly organized for the function of conducting foraging search while staying out of trouble. The brains of later evolved animals, capable of surviving in larger, more complex environments, add cortical layers for encoding and manipulating patterns. But they remain grounded in the ancient architecture, which can be put to a new purpose: finding patterns in the cortex. We think because we need to eat. And we eat only if we find food. Finding a sparsely distributed resource (while avoiding threats) in a vast, nonlinear, dynamic and non-stationary stochastic environment is the first problem that a living system must solve. This problem reaches back to the very origins of animal (motile, phagocytic) life. And it is the foundation for more advanced forms of solution finding/problem solving.
Thus the hypothesis — the control mechanism of foraging search is the basis for, and mechanism of intelligence.
The problem of finding adequate food sources, including water, is one of the most fundamental aspects of animal life. This problem is faced to one degree or another by every kind of animal, from the simplest single-celled protoctista to humans. It is compounded in difficulty by the need to avoid becoming food and for avoiding accidents in an often hostile world. While some animals have adapted to consuming algae or grasses and can simply graze, for most other animals the problem is one of finding food that is often sparsely distributed in a territory that is relatively vast compared to the scale of the animal's body. Further complications come from the dynamics of occurrence of food sources (not to mention the threats to the animal's existence). Most often these can be characterized as sporadic (recurring but not necessarily periodic), episodic (lasting for variable amounts of time) and erratic (variance in amplitude within an episode). In general the statistical properties of these occurrences are non-homogeneous, non-stationary.
A general procedure for finding food under the above conditions is observable throughout the animal kingdom. While the details of behavioral correlates may vary greatly among the diversity of animal types, there are broad features to foraging behavior, which are conserved. Foraging search is marked by two basic kinds of behavior, one of which will tend to dominate gracefully depending on the animal's state of perception (Mobus, 1999). The first appears as a kind of stochastic search of the space. It is the mode used when the animal has no environmental cue signaling which direction to take. The second is a much more directed, purposeful, movement toward a resource goal. It follows from an animal finding and recognizing cues that signify the nearness and direction of the goal. Such cues may be innately encoded, such as the odor of a prey left on the trail, or learned. In higher vertebrates, the latter seems to predominate and allows an animal to exploit a wider range of cue signals, many of which are contingent in a specific environment.
Whether innate or learned, the encounter with cue stimuli helps direct the animal's search efforts such that the search process itself becomes more economical from an energetic perspective. When an animal can take preemptive action in securing food or avoiding being food, it may substantially increase its survivability if not reduce its energetic costs. It is altogether reasonable to assume that there is a selective advantage to such preemptive action. Since the capacity to so act depends on the capacity to anticipate the future, and since anticipatory behavior is a hallmark of intelligence, it seems completely reasonable to propose that the origin and evolution of intelligence derives directly from the fundamental need to find food efficiently.
In this paper I will present a model of foraging built, as it were, from the ground up. The model starts with the establishment of semantics in autopoiesis. I then add an anticipatory capability with causally-related gating of adaptive response and show a general framework for how this leads to improved energetics. Anticipation, based on causal relation encoding, is the key to learning, specifically learning cues. I will then argue that foraging search is the quintessence and core of intelligent behavior. Nervous systems are designed to be foraging controllers. I describe the brain, with its memory and learning systems, as a higher-order autopoietic system in which anticipatory processing improves the acquisition performance of the unity. This allows the exploitation of ever more complex environments. I will describe a robotic forager that learns how to improve its acquisition performance in spite of being embedded in a non-stationary world. I will then describe how foraging is a prototypical framework for cognition: that thinking is, in fact, the foraging for concept resources in mental space.
The opening salvo of this work, however, is to assert that foraging search is the most general search procedure. When one adds constraints and qualifications to the specific search problem it is possible to reduce foraging search to a more specific form such as decision tree search. For an extreme example, given an environment that has no dynamism and is at most probabilistically stochastic (stationary, with closed boundaries), that is, a purely syntactic world, foraging after learning the cues (see below) becomes deductive logic!
Anticipation is not mere prediction. The concept of an anticipatory system (I will use the term agent) is credited to Robert Rosen (Rosen, 1985) and more recently developed mathematically by Dubois (1998). In this concept, an agent possesses a model of the world (or a sufficient subset of the world) which it can operate in fast forward to obtain a prediction of the future state of affairs based on the current state of affairs. This prediction then motivates some action selection mechanism such that the current behavior of the agent changes from what it would have been had the agent no access to such a prediction. In other words, the present has some dependence on the future! Lest this seem a logical contradiction — the prediction must have been wrong — we must look more closely at the nature of this prediction and the underlying model that gave rise to it. Among other factors, we must note that the model is not, itself a fixed thing, possessed by the agent from the very start. Rather, the model is adaptive and is always the product of adaptation. Furthermore there is no necessary contradiction in the notion of causality as might be inferred from a superficial examination of the above statement.
Prediction, as it is used above, is a subtle concept in itself. The agent is embedded in a stochastic world. There are, of course, statistical methods for obtaining predictions of future states but these methods depend on properties such as the Markov property and stationarity. The real world, as argued in more detail below, is decidedly not stationary. Consequently reliance on statistical modeling for prediction is not a good idea. What is a good idea is a model based on causal relations. While even such relations may be stochastic, hence the model must be adaptive, causality is the one property on which we can rely. As will be developed below, living agents have the capacity to learn causal relations which give them the capacity to predict what might happen if they take no action. Such predicted future states of affairs have deep semantic value to the agent and thus motivate the agent to take actions which will change the future. If an agent can predict its imminent demise in the claws of some predator, it makes sense to run. Thus, in the notion of anticipation we have true predictions that do not necessarily come true!
How does a living agent do this trick? An understanding of what it means to be alive and reactive to the world is a necessary first step in developing the notion of anticipation and the route to intelligence.
The study of intelligence is the study of life. Therefore the beginning of understanding of intelligence is the understanding of life itself. Autopoiesis has provided an illuminating framework for this effort.
Maturana and Varela (1980) describe autopoiesis as the network of processes surrounding a core of homeostatic mechanisms. These processes construct the components, which make up the machinery of other processes in the network. An autopoietic system is self-referential in that it continually manufactures itself. And the self that it manufactures consists of all of those subsystems it needs to protect its basic homeostatic core. It is useful to ask what are the energetics of an autopoietic system? What is the cost of maintaining stability in a fluctuating world? And how might that cost be minimized?
Consider a homeostatic loop subjected to an environmental deformation. It takes time for the system to respond and restore inner balance. The lag is due to internal inertia in redirecting energy flows to do the work necessary to respond (see below). In the time taken to do so, we assume that the system incurs a cost in energy and material to restore the system to balance, a cost of deformation, CD. The longer the time lag involved, the greater the cost exacted for restoration: CD = f(D(t), t), where D(t) is some measure of deformation and t is time.
FIGURE 1. A homeostatic system within an autopoietic framework. A feedback loop from the measurement of a critical factor, e.g., pH, which is an input to other metabolic processes in the autopoietic system, activates a response mechanism within the organism. This mechanism, working through an environmental interface, e.g., ion channels or motor responses, acts to counter the environmental influence that generated the change in the critical factor from its ideal state. An autopoietic system includes maintenance mechanisms outside of the homeostasis subsystem, but directed toward keeping the response mechanism ready to act. All of these subsystems use up energy and materials from reservoirs through an internal budgeting system (not shown). Thus the autopoietic whole realizes costs associated with responding, maintenance and possible damage done to the other metabolic processes which would require additional resources to repair.
A main feature of an autopoietic system is the flexible material and energy budget used to direct the flows of resources so that homeostatic balance is always maintained. When outside influences (environmental contingencies) deform the network, it is able to shift flows as necessary to restore balance. Numerous redundant pathways and reaction mechanisms for this purpose characterize the system. It is the case that much of the work of maintaining an autopoietic network is directed at maintenance of this extra machinery. Thus there is a cost associated with maintaining the capacity to restore balance quickly. Let us call this cost CM. This is just the autopoietic work done, which diminishes internal stores of energy and material. The cost is a function of the likelihood or expected (see below) deformation and time; CM = g(E[D](t), t).
When a deformation occurs, there is also a cost associated with rerouting energy/material resources, or a cost of responding, CR. It is some function of the complement of the expectation of deformation and time; CR = h(1 - E[D](t), t). This assumption is based on the notion that a system that expects to be deformed will be prepared to respond. Thus there is a tradeoff between CM and CR.
We assume that, from the process of evolution, a specific class of autopoietic systems is organized such that only those subsystems, which are most often subject to deformation and/or for which the cost CD is particularly high, are supplied with adequate energy and material to respond instantly. Other subsystems, where the likelihood of deformation is lower, may be kept at a minimal level of responding in order to minimize the cost of maintenance associated with that subsystem. These must be strengthened in response to actual deformations. Since the overall budget is flexible, it is possible for the unity to shift the flows of energy and material on an as-needed basis to those subsystems that experience some, presumably short-term deformation. This strategy reduces overall energy/material costs while maintaining adequate responsiveness to environmental contingencies. A unity is considered fit when the sum of the time integrals of costs, ∫CD + ∫CM + ∫CR is minimized over the life of the individual and given the actual deformation demands realized over that life and over all responding subsystems.
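The fitness criterion above can be stated compactly. What follows is one hedged reading, not a formula given in the original: the subsystem index i, the lifetime T, and the symbol J for total lifetime cost are notational assumptions introduced here, C_D, C_M and C_R stand for CD, CM and CR, and the complement 1 - E[D] presumes that the expectation is scaled to the interval [0, 1].

```latex
\begin{align}
C_D^{(i)}(t) &= f\bigl(D_i(t),\, t\bigr), \\
C_M^{(i)}(t) &= g\bigl(E[D_i](t),\, t\bigr), \\
C_R^{(i)}(t) &= h\bigl(1 - E[D_i](t),\, t\bigr), \\
J &= \sum_{i} \int_{0}^{T}
     \Bigl[ C_D^{(i)}(t) + C_M^{(i)}(t) + C_R^{(i)}(t) \Bigr] \, dt .
\end{align}
```

Under this reading, a unity is fit to the degree that J is small, given the deformations D_i(t) actually realized over its life.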
A key concept here is the notion of likelihood of deformation for a particular subsystem. As stated above, evolution, operating on a time scale quite long compared with the lifetime of an individual, selects for maximal maintenance those subsystems, which have a high likelihood of experiencing deformation in the normal course of events in an environmental niche. Other subsystems may get by with minimal or lesser levels of reaction readiness, given that resources can be rerouted to strengthen them on an as-needed basis. But this leaves open the question of what sorts of likelihood values are experienced by a given individual in the course of its lifetime. It is altogether possible that a given individual in a particular environment encounters some kinds of deformations, which have low likelihood in evolutionary time scales, more frequently. For example, a given species of animal may have adapted to a specific temperature range, which has held stable over evolutionary time. But over the course of the life of individuals in one generation, the climate may, for a time, have a higher variance in temperature, subjecting those individuals to unexpected temperature levels. How then should the system minimize costs?
Stimulus — Response
To motivate the notion of minimizing costs in an autopoietic system we start with a model of cost accumulation in a simple stimulus-response (SR) mechanism. In this model, an environmental factor generates the onset of deformation, leading to the buildup of costs. The system responds, driving down the stimulus, but accumulating additive costs of responding as given above. Graph 1 (below) shows the accumulation of costs in this model. Note the time lag between the onset of the stimulus and the onset of the response. It is during this lag that the cost of damage or deformation accrues. Simple feedback (first-order cybernetic) systems suffer this consequence. There are several ways in which costs may be reduced. In the next section I introduce the concept of adaptive response. The main idea here is that a system can expend a little extra maintenance and response energy by keeping a more elevated readiness posture. The higher amplitude response will more quickly drive down the cost of damage (the major cost component) thus lowering overall cost. This works, however, only if the system increases the maintenance and response costs based on recent past experience. In other words, the system will keep an elevated response capability only so long as it has cause to expect additional stimulus. The system must maintain a memory of prior stimulus histories that allows it to adjust its level of response readiness. This strategy works in situations where stimuli tend to come in episodic bursts, which is the case for many natural environments.
GRAPH 1. Standard stimulus-response model showing cost accumulations. The stimulus is initiated by an external input signal but is reduced by feedback from the system response. There is an inherent time lag of three steps between the onset of a stimulus and the onset of a response due to the fact that this is a simple feedback loop. Repair cost is the cost of repairing the deformation or result of homeostatic imbalance due to the stimulus. Response cost is the cost of restoring balance to the system. Maintenance cost is not shown separately in this graph but it is the cost of maintaining the response capability. This can be seen more explicitly below. Total cost is the sum of all three costs in each time step. Note that in this simple model total cost exceeds 0.5 (relative scale).
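The cost accumulation in a lagged feedback loop can be sketched in a few lines of Python. This is a toy model written for illustration, not the simulation that produced Graph 1: the function name `simulate_sr`, the gain, and all cost rates are assumptions, and only the three-step lag is taken from the caption.

```python
def simulate_sr(steps=40, onset=5, duration=10, lag=3, gain=0.5,
                damage_rate=0.05, response_rate=0.02, maintenance_rate=0.001):
    """Toy stimulus-response loop; every parameter value here is illustrative."""
    if lag < 1:
        raise ValueError("lag must be at least one step")
    stimulus = [0.0] * steps   # deformation actually experienced at each step
    response = [0.0] * steps   # corrective effort at each step
    total_cost = 0.0
    for t in range(steps):
        external = 1.0 if onset <= t < onset + duration else 0.0
        # Feedback only sees the stimulus as it was `lag` steps ago.
        sensed = stimulus[t - lag] if t >= lag else 0.0
        response[t] = gain * sensed
        stimulus[t] = max(0.0, external - response[t])
        total_cost += (damage_rate * stimulus[t]       # cost of deformation
                       + response_rate * response[t]   # cost of responding
                       + maintenance_rate)             # constant upkeep
    return {"stimulus": stimulus, "response": response, "total_cost": total_cost}


slow = simulate_sr(lag=3)
fast = simulate_sr(lag=1)
print(f"total cost with lag 3: {slow['total_cost']:.3f}")
print(f"total cost with lag 1: {fast['total_cost']:.3f}")
```

In this sketch the response stays at zero for the first three steps of the stimulus, so deformation accrues unopposed during the lag, and the run with the longer lag ends with the higher total cost.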
Adaptive response involves a long-lasting change in the capacity of a system to respond to repetitive deformations (Mobus, 1994a). A simple example of this is the change in muscle tone and bulk that follows from weight training. Many tissues display such ability. Neurons, as another example, alter their responsiveness to pulses of neurotransmitter at synaptic junctions in multiple time scales (Alkon, 1987; Mobus, 1994a). Short-duration excitatory post-synaptic potentials (EPSP), short-term potentiation (STP) and long-term potentiation (LTP) are examples of increased responsiveness lasting over varying scales of time and due to different regimens of excitation. Autopoietic systems reroute resource flows to sustain responsiveness in areas of the network that have sustained repeated (temporal reinforcement) deformation. This is a primitive (non-associative) form of learning with the sustaining flow forming a memory trace. With a cessation or reduction in frequency of deformation activity, the system reverts, through natural decay, back toward its previous levels of responding. The re-channeling of resource flows to improve responsiveness to a specific site of deformation in the network is proportional to the frequency and amplitude of deformation (Mobus, 1994a). Hence the cost of maintaining the machinery for responding is just that needed to match the demands in the life of an individual.
GRAPH 2. Non-associative adaptrode enhanced model of cost accumulation. In this model the adaptrode records a memory trace of prior stimulus episodes. The response due to this memory trace is quicker than with the simple SR model above. The faster response drives the stimulus down sooner. In turn, the system accumulates lower cost due to deformation, but at the expense of slightly greater costs of maintenance.
The adaptrode mechanism reported in (Mobus, 1994a) can be used in a non-associative mode as a memory trace mechanism. Graph 2 shows the cost accumulations for an adaptrode-enhanced stimulus-response model. Here the total cost has been lowered due to memory effects and stronger, faster response. If we assume that the cost of regaining balance, CD, is generally higher than the cost of maintenance of a capacity to respond, CM, then it is easy to see that the trade-off in costs results in an overall lower lifetime cost. The latter is paid only on an as-needed basis as the result of actual experience.
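A minimal sketch of the non-associative trade-off follows. It is a toy leaky memory trace, not the published adaptrode equations: the name `simulate_readiness`, the update rule, and all rate constants are assumptions chosen only to show that readiness left over from one episode lowers the deformation suffered in the next, at the price of some upkeep.

```python
def simulate_readiness(episodes=((5, 10), (25, 10)), steps=60,
                       rise=0.2, decay=0.02, maintenance_rate=0.01):
    """Toy non-associative memory trace; not the published adaptrode equations."""
    readiness = 0.0                  # memory trace of recent deformation, in [0, 1]
    damage = [0.0] * len(episodes)   # deformation accumulated per episode
    maintenance = 0.0                # cost of holding readiness elevated
    for t in range(steps):
        for k, (start, length) in enumerate(episodes):
            if start <= t < start + length:
                # Readiness built up on earlier steps blunts the stimulus now.
                experienced = 1.0 - readiness
                damage[k] += experienced
                readiness += rise * experienced * (1.0 - readiness)
        readiness -= decay * readiness   # natural decay toward baseline
        maintenance += maintenance_rate * readiness
    return damage, maintenance


adaptive_damage, upkeep = simulate_readiness()
fixed_damage, _ = simulate_readiness(rise=0.0)   # no memory trace at all
print("damage per episode with trace:", adaptive_damage)
print("damage per episode without trace:", fixed_damage)
print("maintenance paid for the trace:", upkeep)
```

With `rise=0.0` the system never adapts and both episodes cost the same. With the trace enabled, the second episode is cheaper than the first because some readiness survives the quiet interval between them.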
Associative Adaptation, Causal Encoding and Anticipatory Response
Adaptive response is a first step toward lowering overall cost. It reduces the accumulation of CD and CR at the expense of accumulating CM. However, if as argued above, the CD cost is substantially higher than the CM cost, then the system will still enjoy a cost advantage. Reactive response does not eliminate CD, however, since the system will respond only after some deformation has taken place. What would be ideal is to act preemptively before any deformation damage occurs. The machinery for reacting is already being maintained so no additional costs accrue. It can be argued that such action would considerably reduce the costs associated with restoring balance. If, in fact, there is a substantial difference between CD and CM, then some small additional increase in CM that ‘bought’ the ability to anticipate a deformation would be well worth it.
GRAPH 3. Associative adaptrode response. An associated cue signal precedes the actual (meaningful) stimulus and generates a response that nearly completely damps the experienced stimulus. The system anticipates a homeostatic deforming stimulus and preempts it.
Associatively gated adaptation provides precisely this tradeoff. A variety of environmental factors that do not deform the homeostatic core, but for which some relatively cheap receptive machinery can be maintained provide a solution. Such factors, e.g., patterns of light intensity, sounds, etc., may be causally linked with factors that do cause deformations. Where the event of change in one of these factors reliably precedes the change in a deforming factor (e.g., light precedes temperature rise) then the former can be used as a predictor of the latter (Mobus, 1994b). Associatively gated adaptation of post-synaptic efficacy has been investigated in both invertebrate and mammalian neurons (Alkon, 1991). Alkon describes what he calls a “flow-through” synapse (making direct connection with a neuron in the case of the invertebrate preparation or on a dendritic spine in the mammalian preparation). This synapse carries the signal that acts as a deforming stimulus to the receiving neuron. That is, it causes the neuron to unconditionally fire. Other synapses from non-deforming stimuli (sensors) lie in close proximity to the flow-through synapse. These synapses are potentiated (post-synaptically) by stimulation, but they do not, in turn, cause the neuron to fire. Their efficacy is too weak initially. However, on repeated exposure to stimulation, followed by stimulation at the flow-through synapse, the non-flow-through synapses become hyper-potentiated with a longer-term memory trace. Subsequent stimulation of these synapses actually generates a response in the neuron as if it had been stimulated by the flow-through signal. Thus, the non-deforming stimulus comes to represent the deforming stimulus as a causal model of the environment. Encounter with the non-deforming stimulus will cause the neuron to preemptively activate. The system responds as if it had been deformed, but no (or minimal) deforming damage had actually been experienced.
In Graph 3, above, we see the dramatic impact of cue-based anticipation. In response to a cue stimulus, the system preemptively acts, effectively damping the stimulus before it even occurs. This results in a similarly dramatic decrease in the cost of deformation, CD, and leads to substantially reduced total cost. Causal-associative encoding mechanisms such as this provide a considerable improvement in cost reduction and, by implication, increase in survivability for autopoietic systems. It is not, therefore, surprising that this kind of mechanism is at the heart of neural adaptation, particularly in the synaptic junctions between neurons. That, in turn, is at the heart of learning, memory and control in brain networks.
The complete adaptrode mechanism emulates the adaptive response of a synapse that is receiving signal from a non-deforming factor sensor, gated by the receipt, within a short time span afterward, of signals from a deforming factor sensor. The temporally ordered gating ensures that only preceding events are encoded as associated with deforming events. Subsequent receipt of the non-deforming signal acts as a predictor for the onset of the deforming factor. Thus, an adaptrode-based neuron can be used to activate a response before the actual deforming factor has an influence on the system (Mobus, 1994a, b).
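The temporally ordered gating described above can be illustrated with a small toy synapse. This is a sketch under stated assumptions rather than the adaptrode itself: the class `CueSynapse`, the helper `train`, the two-variable state, and every constant are hypothetical. The point it demonstrates is the ordering rule: a cue trace is consolidated only if the deforming signal arrives after the cue.

```python
class CueSynapse:
    """Toy cue synapse gated by a later deforming signal (illustrative only)."""

    def __init__(self, trace_decay=0.5, learn_rate=0.3,
                 forget_rate=0.01, threshold=0.5):
        self.trace_decay = trace_decay
        self.learn_rate = learn_rate
        self.forget_rate = forget_rate
        self.threshold = threshold
        self.short_trace = 0.0   # fast-fading record of recent cue activity
        self.weight = 0.0        # longer-term efficacy of the cue synapse

    def step(self, cue, deforming):
        """Advance one time step; return True if the neuron fires."""
        self.weight *= 1.0 - self.forget_rate   # slow forgetting
        if deforming:
            # Gate: consolidate only the trace left by cues on EARLIER steps.
            self.weight += self.learn_rate * self.short_trace * (1.0 - self.weight)
        fires = deforming or (cue and self.weight >= self.threshold)
        # The trace is refreshed after gating, so a cue that arrives with or
        # after the deforming signal is not credited as its predictor.
        self.short_trace *= self.trace_decay
        if cue:
            self.short_trace = 1.0
        return fires


def train(synapse, cue_first, trials=5, gap=8):
    """Present cue and deforming signal on adjacent steps, in the given order."""
    for _ in range(trials):
        first = (True, False) if cue_first else (False, True)
        second = (False, True) if cue_first else (True, False)
        synapse.step(*first)
        synapse.step(*second)
        for _ in range(gap):
            synapse.step(False, False)


forward, backward = CueSynapse(), CueSynapse()
train(forward, cue_first=True)     # cue precedes the deforming signal
train(backward, cue_first=False)   # cue follows the deforming signal
print("cue alone fires after forward pairing:", forward.step(True, False))
print("cue alone fires after backward pairing:", backward.step(True, False))
```

After forward pairing the cue alone is enough to fire the toy neuron, which is the anticipatory response. After backward pairing the weight stays near zero and the cue is ignored.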
Food Finding and Anticipatory Autopoiesis
A reduction in internal stores of energy and materials clearly impacts a homeostat within the autopoietic system. It sets up a drive to replenish as the response. The animal goes into a search mode of activity. Once food is found and consumed/digested balance is restored.
The cost of regaining balance is essentially the cost of conducting a search and consuming the food. The animal must move through its environment and thus diminish its stores even further. The longer it is searching the higher and more damaging the cost accrued. An unintelligent search could be conducted by either systematic coverage, say moving in an outward spiral from a starting center point, or a random search. Both approaches would eventually provide exhaustive coverage of the area but over extended time. A control system for doing systematic coverage search is problematic (e.g., something like a breadth-first search). A random search is guaranteed to cover the space but only in infinite time. Neither of these methods would be appropriate for sparsely distributed occurrences of food resources.
An intelligent search, on the other hand, uses associative (causal) adaptation to capture structural information about the environment. This comes in the form of cue events (non-deforming stimuli) that are defined as events which precede deforming ones. In this context the occurrence of food acts as a deforming event in that the sensory input directly from the food (say olfaction) is sent to flow-through synapses on neurons directly involved in invoking orientation toward the source and eventually feeding response. A cue event is some non-deforming stimulus, which occurs just prior to finding food. The synapse of this stimulus sensor onto the response activating neuron is potentiated by the cue event. This is followed shortly by encounter with the food stimulus which, acting through the flow-through synapse, gates the memory trace of the cue synapse into an intermediate or longer-term memory trace (Alkon, 1987; Mobus, 1994a). After several such co-encounters with the appropriate time orientation, the cue is learned and subsequent encounters with the cue will invoke the orientation reaction. The cue becomes a predictor of the location of food. And the search switches from undirected to directed mode. It is relatively easy to see how mechanisms of classical conditioning could be implicated in this wild behavior.
Various studies of foraging behavior in hunting animals (those that are faced with sparse, dynamic and uncertain distribution of food) have shown that the pattern of search has certain large-scale features (Garber & Hannon, 1993; Olton et al., 1981; Menzel & Wyers, 1981; Rashotte et al., 1987). These features are conserved across the phyla regardless of the details of any specific animal's techniques. An animal starts with an undirected search which may either result in the finding of food or finding of a previously learned cue. In the latter case, the animal switches to directed search, in which it preferentially follows the cue. By definition, the cue should lead to finding food most of the time. If it does not, eventually it must lose its status as a cue. The animal forgets the association and no longer responds to the event by orienting on it. On the other hand, if the cue reliably leads to food then the association is reinforced and the cue event maintains its status. This pattern is observed in animals as diverse as snails and cats.
The undirected search phase is often characterized by a mode of exploration that is not a random walk, but is not systematic either. Often the animal makes gross forward progress in some direction, but this is punctuated by excursions, left and right, which are neither periodic nor predictable statistically. At best they may be characterized as pink noise (or 1/f noise). Deviations from a straight line may have many small amplitude episodes, a few medium amplitude episodes and a very few large amplitude episodes. The assumed purpose of these deviations is to provide novelty in the search without allowing it to become random. The animal should not cover the same ground twice as would often happen in a random walk.
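The switch between search modes, and the way a cue gains or loses its status, can be sketched as two small functions. The names `update_association` and `search_mode`, the update rule, and the threshold are assumptions made for illustration and do not come from the studies cited.

```python
def update_association(strength, led_to_food, gain=0.2, loss=0.1):
    """Reinforce a cue that paid off; weaken one that did not (toy rule)."""
    if led_to_food:
        return strength + gain * (1.0 - strength)
    return strength * (1.0 - loss)


def search_mode(cue_detected, strength, threshold=0.3):
    """Directed search only when a sufficiently trusted cue is present."""
    if cue_detected and strength >= threshold:
        return "directed"
    return "undirected"


strength = 0.0
for _ in range(5):                 # the cue repeatedly precedes food
    strength = update_association(strength, led_to_food=True)
mode_when_reliable = search_mode(True, strength)

for _ in range(10):                # the cue stops leading to food
    strength = update_association(strength, led_to_food=False)
mode_when_unreliable = search_mode(True, strength)

print(mode_when_reliable, mode_when_unreliable)
```

In this sketch a cue that has paid off several times triggers directed search, and the same cue is ignored again once it has failed often enough for its strength to fall below the threshold.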
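One way to picture such a path is to drive the heading with an approximation of 1/f noise. The sketch below uses the Voss algorithm (summing random sources refreshed at octave-spaced intervals), which is a technique swapped in here for illustration and is not claimed to be the mechanism used by animals or by the robot described later. The function names, the maximum turn angle, and the step size are all assumptions.

```python
import math
import random


def voss_noise(n, octaves=5, seed=0):
    """Approximate 1/f noise: sum of sources refreshed every 2**k samples."""
    rng = random.Random(seed)
    sources = [rng.uniform(-1.0, 1.0) for _ in range(octaves)]
    samples = []
    for i in range(n):
        for k in range(octaves):
            if i % (2 ** k) == 0:
                sources[k] = rng.uniform(-1.0, 1.0)
        samples.append(sum(sources) / octaves)   # stays within [-1, 1]
    return samples


def wander(n=200, max_turn=math.pi / 3, step=1.0, seed=0):
    """Integrate a path whose heading deviates from straight ahead by noise."""
    x = y = 0.0
    path = [(x, y)]
    for deviation in voss_noise(n, seed=seed):
        heading = max_turn * deviation   # 0 means straight ahead
        x += step * math.cos(heading)
        y += step * math.sin(heading)
        path.append((x, y))
    return path


path = wander()
print("final position:", path[-1])
```

Because the heading never deviates more than `max_turn` from straight ahead, the path always makes gross forward progress, while the slowly refreshed sources produce occasional longer excursions to one side. The sketch does not by itself guarantee that ground is never revisited.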
Once a cue is detected, the animal orients on the cue and travels in the direction of the cue. This may be as simple as following an odor gradient by temporal difference, or as complex as tracking an image. Multiple cues or a chain of cues may need to be followed before finding the food.
The literature on artificial neural network (ANN) models of classical conditioning is extensive (c.f. Commons, et. al., 1991). With few exceptions, these models employ networks that were inspired by pattern learning. Such methods rely on gradient descent (or hill climbing) and convergence criteria. They derive from optimal pattern classifier or recognizer methods. They depend on the inherent stationarity of the statistical properties of a pattern — the relations between pattern elements over time. The whole notion of optimality is based on the often unmentioned assumption that the world is fundamentally ergodic. But is this a fair and reasonable assumption? In fact there is growing awareness among a number of theoreticians that the world is decidedly not ergodic, though it is obviously also not random (c.f., Kauffman, 1996).
The main question that must be addressed is: Is the environment (ambiance) of an agent stationary over the life of the agent? If it is, then pattern-learning-based approaches to implementing conditioning are possibly preferred since they generally provide a high level of confidence in the recognition/classification task. There is still the problem with the amount of time needed to learn associations. It can not be done in real time. But then once one agent is trained, in a stationary environment, that learning can be copied to other agents.
The real world (the natural world) does not seem to be stationary, which is a sufficient condition for non-ergodicity. Put in the framework of foraging, one can find examples of cue-food associations that fluctuate over the long run (but within the lifetime of a single individual). Cue events that signaled food availability for some period of time might become less reliable later. Real environments are themselves open to extraneous influences (witness the events affecting local ecosystems following El Niño). New cue events may supplant old ones. Or old food sources (prey or plants) may dwindle and newer or alternate sources may succeed them. This is precisely why an individual must be capable of learning and adapting to new situations. This raises serious problems for most ANN approaches to conditioned learning (indeed it is a problem for most machine learning approaches). Learning new relations tends to destroy the old ones. This would not be a problem if the new situation to be learned held stable for a long time and the old situation was never experienced again. However, this cannot be guaranteed. In fact, there are a number of cases in which an older cue-food association reverts. Animals display an ability to not completely forget old, deeply encoded associations (called savings in the conditioning literature); they are merely masked by newer association learning. The problem of destructive interference never seems to occur in living systems.
The adaptrode mechanism has been shown to demonstrate savings in conditioning simulations (Mobus, 1994a). This ability is the key to survival in a non-stationary world. Agents may retain a non-zero expectation of associated events over the course of what may turn out to be a short-term period when the association no longer seems to hold.
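The idea of savings can be illustrated with a deliberately simplified two-time-scale trace. This is a sketch under stated assumptions, not the published adaptrode equations: the update rule, the rate constants and the names update, run and trials_to_criterion are all hypothetical. A fast weight rises quickly with reinforcement and decays only down to a slow weight, which itself changes slowly. Setting a_slow to zero removes the slow trace and gives a single-time-scale baseline for comparison.

```python
def update(w_fast, w_slow, reinforced,
           a_fast=0.2, a_slow=0.01, d_fast=0.1, d_slow=0.001):
    """One time step of an assumed two-time-scale association weight."""
    if reinforced:
        w_fast += a_fast * (1.0 - w_fast)      # quick acquisition
        w_slow += a_slow * (w_fast - w_slow)   # slow consolidation
    else:
        w_fast -= d_fast * (w_fast - w_slow)   # decays only to the slow floor
        w_slow -= d_slow * w_slow              # the floor itself fades slowly
    return w_fast, w_slow


def run(schedule, a_slow=0.01):
    """Apply a sequence of reinforced (True) / unreinforced (False) steps."""
    w_fast = w_slow = 0.0
    for reinforced in schedule:
        w_fast, w_slow = update(w_fast, w_slow, reinforced, a_slow=a_slow)
    return w_fast, w_slow


def trials_to_criterion(w_fast, w_slow, criterion=0.8, a_slow=0.01):
    """Count reinforced trials needed to bring w_fast up to the criterion."""
    trials = 0
    while w_fast < criterion:
        w_fast, w_slow = update(w_fast, w_slow, True, a_slow=a_slow)
        trials += 1
    return trials


# Long training followed by a period in which the association fails.
history = [True] * 500 + [False] * 300
two_scale = run(history)               # retains a non-zero expectation
one_scale = run(history, a_slow=0.0)   # baseline: forgets almost completely

if __name__ == "__main__":
    print("two-scale weight after extinction:", two_scale[0])
    print("one-scale weight after extinction:", one_scale[0])
    print("relearning trials (two-scale):", trials_to_criterion(*two_scale))
    print("relearning trials (one-scale):",
          trials_to_criterion(*one_scale, a_slow=0.0))
```

After the unreinforced period the two-scale weight still sits well above zero, so relearning takes fewer trials than it does for the baseline, which must start over.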
MAVRIC: A Food Finding Robot
The first MAVRIC (Mobile Autonomous Vehicle for Research in Intelligent Control) robot was a home-brew Braitenberg vehicle (Braitenberg, 1984) which I built as part of my Ph.D. research. MAVRIC had a “brain” composed completely of adaptrode-based neurons (Mobus, 1994c, d). The objective of this project was to demonstrate the capacity to learn contingent environmental cues (lights and sounds) that lead to food (reward) or signal danger (punishment). I characterized MAVRIC as having the intelligence of a moronic snail and based the model on the work of Alkon et al. (Alkon, 1987; Alkon et al., 1990) on the marine nudibranch, Hermissenda crassicornis.
MAVRIC employed a central pattern generator (CPG) neural oscillator circuit which in open mode caused the robot to wander in the manner described above, in what one might call a “drunken-sailor walk” (Mobus, 1994d), shown in Fig. 2. Its brain also contained associator neurons for learning relations between rewarding or punishing events (equivalent to a homeostasis deforming factor) and the sensory stimulus events which preceded them. Output from these associators was routed to the CPG in a manner that caused MAVRIC to turn right or left, or go forward or in reverse, depending on whether the stimulus was to the left or right (or in front) of the robot and whether reward or punishment had been activated. The associator neurons thus caused the robot to approach stimuli that were associated with reward and flee from those associated with punishment. It learned the cue events and, in subsequent explorations, could use those cues to modulate its search pattern and follow stimulus gradients, moving up or down the gradient depending on whether it was associated with pleasure or pain.
Figure 2. Some typical search paths followed by the MAVRIC robot show the nature of a drunken-sailor walk. Each search is started from a given position (home). The robot never traverses the same path twice. At the same time, the search paths are not random: there is some general progress away from the home base and no doubling back. The smooth curving lines represent an envelope that contains about 80% of the search tracks. We suspect these lines diverge parabolically as the distance from 'Home' increases. This search path looks something like a continuous version of a randomized depth-first search in a graph. The control for this path generation comes from a central pattern generator neural circuit.
The major contribution of the MAVRIC work was to show how a robot could learn associations that changed over varying time scales without destructive interference. MAVRIC could learn a long-term association between light and a specific tone as meaning reward. It would then approach the light and follow the tone to reach the resource (food). Later we changed the association so that it represented punishment. That is, when the robot followed its learned behavior of approach it received a punishment result instead of the “usual” reward. In subsequent trials the robot avoided that light/tone combination. But since later encounters with the combination did not punish it (even a little), this short-term memory faded. Due to savings, the longer-term association and the behavior of approaching the combination eventually re-emerged, and the robot resumed its ability to search efficiently for reward. The amount of time needed to resume its behavior was far less than would have been required by a learning mechanism that suffers from destructive interference.
The combination of savings and forgetting in adaptrode-based neural architectures gave MAVRIC the capacity to succeed in finding resources and avoiding threats in a non-stationary environment.
Steps to a Formal Framework for Foraging Search
We are in the early stages of developing a formal framework in which to study foraging search. Search is, of course, a central part of the study of artificial intelligence. The difference between search as studied in that discipline and the approach suggested here is that in AI the search space is represented within a fixed computer memory as a graph (or tree). Either the vertices and edges of the graph are represented in full (explicitly), as in an adjacency matrix, or each vertex is generated from a fixed set of rules on an as-needed basis (implicitly). In the latter case, the graph is virtual. In the present context of a robot brain, only some small portion of the graph must be discovered in real time. In a non-stationary, dynamic environment there is no one single graph that can represent the environment. It is a difference in the perspective taken. The graph representation approach takes the perspective of a god-like observer, able to see the global picture or generate it as needed, and it assumes that the set of all possible world states is essentially fixed. The agent's perspective is a greatly reduced subset of the possible world, and indeed it is a substantial subset of the realized world at any given time.
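The two conventional representations can be contrasted in a short sketch. The four-vertex matrix, the grid rule and the function names are hypothetical and chosen only for illustration. Note that both versions presuppose a fixed world (a fixed matrix or a fixed generating rule), which is exactly the assumption a non-stationary environment violates.

```python
from collections import deque

# Explicit representation: every vertex and edge is stored up front.
# Hypothetical 4-vertex world with edges 0-1, 1-2 and 2-3.
ADJACENCY = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]


def explicit_neighbors(vertex):
    """Read neighbors straight out of the stored adjacency matrix."""
    return [u for u, linked in enumerate(ADJACENCY[vertex]) if linked]


def implicit_neighbors(cell, width=1000, height=1000):
    """Implicit (virtual) graph: neighbors of a grid cell are generated
    from a fixed rule only when they are asked for."""
    x, y = cell
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(cx, cy) for cx, cy in candidates
            if 0 <= cx < width and 0 <= cy < height]


def bfs_distance(start, goal, neighbors):
    """Breadth-first search; works with either neighbor function."""
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        node, dist = frontier.popleft()
        if node == goal:
            return dist
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None  # goal unreachable


if __name__ == "__main__":
    print(bfs_distance(0, 3, explicit_neighbors))
    print(bfs_distance((0, 0), (3, 2), implicit_neighbors))
```

The same search routine runs over both graphs. The implicit version never builds the million-cell grid, but it can do so only because the rule that generates neighbors never changes.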
In the formal approach we propose to model an environment by the distribution of resources and threats and the causal relations they have with neutral objects. We can analyze search effectiveness in terms of the cost model described above under conditions in which we vary the strength of causal relations in non-stationary ways. This framework is somewhat akin to the Optimal Foraging model used in the analysis of animal foraging (Collier & Rovee-Collier, 1981). It is not meant to completely represent actual search in the real world. It only seeks to provide a common framework from which comparisons of search can be made in a simulated non-stationary world.
Food for Thought
Finding sparsely distributed food while avoiding threats in a dynamic, complex and, most importantly, a non-stationary environment is the most fundamental requirement of an autopoietic system. The role of learning, memory and intelligence in adding anticipatory processing to an autopoietic core can be seen to improve the efficiency of autopoiesis and hence would be selected for in evolution. Greater efficiency in food-finding means there is more energy left over for mating, resulting in higher fecundity.
My central thesis then can be summed up thus: Intelligence is for finding resources (under the above conditions) while avoiding death. It is accomplished by encoding relations between events that do not themselves impact homeostasis and the causally related events that directly impact the homeostatic core of an individual. Such causal relations, captured in models implemented in neurally based graph structures, are the mechanism by which all other accessible spatio-temporal patterns are grounded in meaning. A neural control architecture for foraging search, then, constitutes the ‘engine’ of reasoning.
Now permit me a leap of conjecture. Start with a neural architecture that encodes patterns and relations between patterns (as memory traces) using the same multi-time domain method represented in the adaptrode. The architecture is hierarchical, that is, low-level features, collectively activated, activate higher-level constructs. The levels are set out in two-dimensional maps. Let us call such an architecture a cerebrum. Then consider a core foraging engine as represented primitively by MAVRIC's brain (effectively a limbic system) that receives inputs from both sensors and the cerebrum and provides output to motors and/or back to the cerebrum. The motor outputs to the cerebrum, rather than causing locomotion in the physical world, drive a focus-of-activation through the two-dimensional cortical structure in exactly the same way that MAVRIC moves through its environment. When unmodulated by encounters with cues, the activation pattern wanders in a drunken-sailor walk through the cortical map. As it does so, it activates patterns that have been encoded there. In this scheme, activation of a pattern corresponds to working memory insertion. The activated pattern, if it is causally associated with a motivational pattern, that is, if it is a cue, will activate the latter, bringing it to the level of awareness. In turn, the motivational pattern activates actual motor responses and the agent behaves. Searching through cerebral cortex, that is, thinking, is actually foraging, but in the space of encoded patterns.
A reasonable question is: How are these patterns related to motivation in the same way that food or danger is motivating for learning cues in physical space?
We know that as we develop we first learn patterns that are grounded in our homeostatic core (mediated by the limbic system). What visual objects, smells and sounds do we encounter just before we are fed, or our diapers changed? Those patterns are the objects in our world that restore us to balance (face and smell of mother) and they must be deeply encoded (under limbic control) in the cortex. As we grow older, our environment enlarges. What patterns do we encounter just prior to encountering the patterns associated with our homeostatic core? That place over there is home; that's where mommy is; that's where I will be taken care of. A chain of causal relations from an increasingly abstract world is created that leads back to the most fundamental core of our existence.
This process of expanding environments and forming layers of causal relations as we develop is essentially open-ended (recent discoveries of neurogenesis in mammalian brains lend added impetus to this conjecture). But at each layer we will find ourselves exposed to patterns that generate dissonance in some way. The environment, remember, is non-stationary on all time and space scales. Pattern dissonance cannot be avoided! This dissonance amounts to a problem. Some patterns do not fit easily into our growing model of the world. Because all patterns are ultimately grounded through the lower layers we experience what amounts to homeostatic imbalance and a resultant drive to find the resource that will restore that balance. The search for a solution (a set of nearly isomorphic patterns with nearly isomorphic causal relations) is a foraging activity conducted in mental space. At first it may consist of 1/f wandering (hypothesis: we can find central pattern generator circuits affecting pre-attentional processing in the cortex) before we have any clues from the environment of patterns. We experience this as the flux of seemingly random thoughts that rise to conscious awareness. Perhaps dreaming is a state of consciousness in which this kind of processing takes place. Sometimes we find ourselves engrossed in observing something in the external environment that doesn't seem connected to anything we are consciously working on.
But what are we looking for? What is a solution? Above, I called it “a set of nearly isomorphic patterns with nearly isomorphic causal relations”. By this I mean an analogous meta-pattern. I would suggest that these have object status in mental space. Patterns of concurrent sensory inputs (objects in the environment) are linked by neurons coding causal relations between patterns (temporal patterns: which objects interact with which). Together, sets of object patterns, along with causal relation encoding, constitute a meta-pattern. I suggest these will have the kinds of coherence characteristics that allow each of them to be manipulated as a conceptual whole, as if it were an object pattern itself. Call these meta-patterns concepts.
What about response and action — motor outputs? What form do these take in foraging for solutions? Consider a foraging animal that has learned to turn over rocks or dead leaves on the ground in search of grubs. The animal has to manipulate the environment as part of its search process (not just move from place to place). The rock is a cue. It follows the cue by picking up the rock. It changes the configuration of the cue and environment in the process of foraging. It develops manipulative facilities for accomplishing this. Now consider a concept as described above. It is just a set of encoded causal relations (spatial and temporal). Suppose we could recruit new neurons to code for modified relations between object patterns, or even to change the shape of an object in a form of object manipulation. That is, we could create new association encoding to see if such manipulations brought the meta-pattern concept closer to matching a problem. Thus motor outputs for manipulating concepts allow us to test variations on the theme of those concepts and discover new relations.
I submit that the same machinery needed to control foraging search in physical space can be reused to conduct search in mental space. Further, the manipulative capabilities needed for killing, routing or feeding can be reused to manipulate concepts, the patterns encoded in the brain. The hallmark of intelligence, the capacity to think, is grounded in the capacity to forage and stay alive.
References
Alkon, D.L., (1987). Memory Traces in the Brain, Cambridge, MA: Cambridge University Press.
Alkon, D.L., Blackwell, K.T., Barbour, G.S., Rigler, A.K. & Vogl, T.P., (1990). Pattern Recognition by an Artificial Network Derived from Biological Neuronal Systems. Biological Cybernetics: 62, pp 363-376.
Braitenberg, V. (1984). Vehicles: Experiments in Synthetic Psychology. Boston MA: The MIT Press.
Collier, G.H. & Rovee-Collier, C.K. (1981). A comparative analysis of optimal foraging behavior: laboratory simulations. in Kamil, A.C. & Sargent, T.D. (Eds.), Foraging Behavior, Chptr. 3, pp 39-76, New York: Garland STPM Press.
Commons, M.L., Grossberg, S. & Staddon, J.E.R., Editors (1991). Neural Network Models of Conditioning and Action. Hillsdale, NJ: Lawrence Erlbaum Associates.
Dubois, D.M. (1998). Introduction to Computing Anticipatory Systems. International Journal of Computing Anticipatory Systems, Vol. 2, pp 3-23, Liege, Belgium, CHAOS absl.
Garber, P.A. & Hannon, B., (1993). Modeling monkeys: a comparison of computer-generated and naturally occurring foraging patterns in two species of neotropical primates. International Journal of Primatology, 14:6, pp. 827-852
Kauffman, S.A. (1996). Investigations: The Nature of Autonomous Agents and the Worlds They Mutually Create. (a WWW resource), Santa Fe Institute.
Maturana, H.R. & Varela, F.J. (1980). Autopoiesis and Cognition. Boston, MA. D. Reidel Publishing Company.
Menzel, E.W. & Wyers, E.J. (1981). Cognitive aspects of foraging behavior. in Kamil, A.C. & Sargent, T.D. (Eds.), Foraging Behavior, Chptr. 16, pp. 355-377, New York: Garland STPM Press.
Mobus, G.E., (1994a). A Multi-time scale learning mechanism for neuromimic processing, Ph.D. Dissertation: University of North Texas, unpublished.
Mobus, G.E., (1994b). Toward a Theory of Learning and Representing Causal
Mobus, G.E. & Fisher, P.S., (1994c). MAVRIC's Brain, Proc. Seventh Intl. Conf: Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, pp 315-322, Gordon and Breach Science Publishers.
Mobus, G.E. & Fisher, P.S., (1999d). Foraging Search at the Edge of Chaos, in D.S. Levine, V. Brown & R. Shirey (Eds) Oscillations in Neural Networks, Hillsdale, NJ. Lawrence Erlbaum Associates.
Olton, D.S., Handelmann, G.E. & Walker, J.A., (1981). Spatial memory and food searching strategies. in Kamil, A.C. & Sargent, T.D. (Eds.), Foraging Behavior, Chptr. 15, pp. 333-354, New York: Garland STPM Press.
Rashotte, M.E., O'Connell, J.M. & Djuric, V.J. (1987). Mechanisms of signal-controlled foraging behavior. in Commons, M.L., Kacelnik, A. & Shettleworth, S.J. (Eds.), Quantitative Analysis of Behavior: Foraging, Chptr. 8, pp. 153-179, Hillsdale, NJ: Lawrence Erlbaum Associates.
Rosen, R. (1985). Anticipatory Systems. Oxford: Pergamon Press.