Initially an agent knows of only resources, threats, and neutral objects. Resources are assumed to be sparsely distributed in the set of local environments. Furthermore, object content and path structures are assumed to be dynamically altered on many different time scales. The agent affects this dynamic in that it consumes resources it finds (a resource may be regenerated at a later time). Thus, the agent will always be in a state of ignorance with respect to the actual location and timing of resources. However, we also assume that the dynamics of the space operate in accordance with law-like regularities or propensities of nature, such that there is some consistency to the operation of the agent's world. Therefore, an agent that can learn the causal relations between initially neutral objects and resource or threat objects can exploit the occurrence of the cues to find and consume the resources and avoid threats. An agent that can learn a chain of causal relations can effect a more directed search and improve its chances of finding resources.
Foraging search is an attempt to emulate animals that solve the above problem in hunting for food, water, mates, and other resources. The algorithm depends on the agent's ability to partition the set of neutral objects into cue objects and still-neutral objects, and to further partition the former into resource and threat cues. One must assume that there are sufficiently many objects causally related to the resources that an agent which successfully learns even a subset of these objects improves its chances of success in finding resources.
Algorithm
In the pseudocode below, keywords (IF, ELSE, etc.) are set in all caps, functions are given identifiers with initial capital letters, and variables (broadly interpreted) are in lowercase.
Forage
  DO forever
    WHILE hunger > threshold
      WHILE no cues or resources or threats in local environment
        Wander
        select path based on direction generated from Wander
        Decay cue memory traces
        Record local environment - path-associated objects - in short-term trace
        proceed to next local environment
      ENDWHILE no cues
      IF cue = threat-associated OR threat detected
        TurnAndRun
      ELSE IF cue = resource-associated
        select path leading to cue       // or suggested by location of cue
        Record local environment - path-associated objects - in short-term trace
      ELSE IF resource found
        Consume resource
        Learn cue chain                  // reinforce intermediate-term memory
      ELSE
        Decay cue memory traces
      ENDELSE
    ENDWHILE hungry
    Follow pheromone trail home
    IF home Learn cue chains             // reinforce long-term memory
    Conduct other activities             // internal loop
  ENDDO forever
This algorithm assumes that a mission is to be carried out iteratively. That is, a search for resources must be conducted periodically, conditioned by an internal motivational variable, here indicated as hunger. It is assumed that consumption of the resource will lower hunger and that the extent of lowering is coupled with some quality parameter of the resource.
The algorithm is an infinite loop. As the agent conducts other activities it depletes its store of energy (or some analogous variable level). When energy falls below some threshold (equivalently, hunger rises above it), the agent is motivated to leave its home and start searching for resources.
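This motivational cycle can be illustrated with a toy sketch. All names and constants here are invented for illustration, and it models only the hunger dynamics (other activities raising hunger until a foraging bout is triggered, consumption lowering it by resource quality), not cue learning or way-finding:

```python
import random

random.seed(0)  # make the sparse-encounter draws deterministic

class ToyAgent:
    """Toy model of the outer motivational loop only."""

    def __init__(self, threshold=0.5):
        self.hunger = 0.0
        self.threshold = threshold
        self.away_from_home = False

    def conduct_other_activities(self):
        # Internal loop: activity depletes energy, i.e. raises hunger.
        self.hunger += 0.2

    def forage(self, resources):
        """Search until hunger is back under threshold, then head home."""
        self.away_from_home = True
        while self.hunger > self.threshold:
            if resources and random.random() < 0.3:   # resources are sparse
                quality = resources.pop()             # consume one resource
                self.hunger = max(0.0, self.hunger - quality)
        self.away_from_home = False                   # follow trail home

agent = ToyAgent()
resources = [0.6, 0.8, 0.5, 0.9]      # quality parameter of each resource
for _ in range(4):
    agent.conduct_other_activities()  # hunger rises past the threshold
if agent.hunger > agent.threshold:
    agent.forage(resources)
```

After the run, the agent has consumed just enough to fall back below its threshold and has returned home, ready for the next iteration of the mission.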
It enters the third loop and stays there as long as it does not detect a cue object in its local environment. The Wander algorithm is described below.
Wander
  STATIC last_direction, last_deviation, last_sign, cumulative_deviation
  sign = exponDist(cumulative_deviation)   // sign = +1 (pos) for right of center,
                                           //        -1 (neg) for left
  IF sign <> last_sign
    cumulative_deviation = 0
    last_sign = sign
  ENDIF
  deviation = exponDist(last_deviation)    // low probability of high deviation magnitude
  last_deviation = deviation plus influence of any biases
                                           // biases are global variables
  cumulative_deviation = cumulative_deviation + deviation
  last_direction = last_direction + (last_sign * deviation)
Wander keeps track of the current direction (which may also be a net difference between two motor speed variables), the last deviation (change in direction or speed difference), a cumulative deviation in a given direction, and the sign of the previous deviations. As the agent wanders it will tend to curve off to one side or the other until the cumulative deviation in one direction (right or left) is high. As the latter increases, it increases the probability of changing signs and reversing direction. Overall, the chances of a large deviation are small, while those for a small deviation are large. Thus, the amount of change in direction from one call to Wander to the next is usually small and stochastic. This approximates a pink-noise, or 1/f power-law, distribution. The end result is the drunken-sailor walk described in [Mobus & Fisher, 1999]. Such a walk approximates a randomized depth-first search.
The drunken-sailor walk can be biased such that the path takes on low amplitude deviations -- becomes closer to a straight line. Or it may be biased toward the right or toward the left of center. These biases are handled by global variables that are set by recognition processes. So if an object on the left is "familiar", e.g. a potential cue, not yet elevated to the status of actual cue, the agent will preferentially be drawn in that direction. See the discussion of interfacing the central pattern generator-based version of Wander in the MAVRIC robot in [Mobus & Fisher, 1999].
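A minimal sketch of such a generator is shown below. It is a simplification, not the published implementation: the deviation magnitude is drawn from a fixed-mean exponential (the pseudocode conditions each draw on the previous deviation), and the mapping from cumulative deviation to sign-flip probability is an invented saturating function:

```python
import math
import random

class Wander:
    """Simplified drunken-sailor heading generator.  Small deviations are
    common, large ones rare (exponential tail), and a growing one-sided
    drift raises the probability of reversing direction.  The parameters
    mu and bias, and the flip-probability scale, are illustrative."""

    def __init__(self, mu=5.0, bias=0.0):
        self.mu = mu                    # mean deviation magnitude (degrees)
        self.bias = bias                # > 0 pulls right of center, < 0 left
        self.direction = 0.0            # current heading (last_direction)
        self.last_sign = 1              # +1 = right of center, -1 = left
        self.cumulative_deviation = 0.0

    def step(self):
        # The larger the one-sided drift, the likelier a sign reversal.
        p_flip = 1.0 - math.exp(-self.cumulative_deviation / (10.0 * self.mu))
        if random.random() < p_flip:
            self.last_sign = -self.last_sign
            self.cumulative_deviation = 0.0
        # Exponential distribution: low probability of a high magnitude.
        deviation = random.expovariate(1.0 / self.mu) + self.bias
        self.cumulative_deviation += max(deviation, 0.0)
        self.direction += self.last_sign * deviation
        return self.direction
```

Setting `bias` nonzero corresponds to the global bias variables set by the recognition processes: a familiar object on one side skews every deviation toward that side.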
Every potential cue recognizer is linked to every resource or threat associator. A memory trace is the relative strength of the link while a cue is active. Unlike traditional neural networks, the memory trace does not involve just one efficacy or weight value. Rather, the efficacy is based on short-term, intermediate-term, and long-term support terms as given in the adaptrode model [Mobus, 1994]. With each iteration, the strength of the cue-resource (or cue-threat) link is decayed according to the adaptrode formulation. This prevents transitory associations from being retained over longer times.
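The multi-timescale trace can be sketched roughly as follows. The gating structure follows the qualitative description above (shorter-term levels rise with cue activity and are floored by longer-term levels, which rise only when reward or confirmation gates them), but the constants and update rules are illustrative simplifications, not the equations of [Mobus, 1994]:

```python
class Adaptrode:
    """Three-timescale memory trace sketch.  w[0] (short-term) rises with
    input activity and decays quickly, but only down to w[1]
    (intermediate-term), which in turn is floored by w[2] (long-term).
    Longer terms rise only when a gating signal is present."""

    ALPHA = (0.5, 0.1, 0.02)    # rise rates, fast to slow (illustrative)
    DELTA = (0.2, 0.02, 0.001)  # decay rates, fast to slow (illustrative)

    def __init__(self):
        self.w = [0.0, 0.0, 0.0]

    def update(self, x, reward=0.0, confirm=0.0):
        w0, w1, w2 = self.w
        # Short-term support: driven by the cue input x in [0, 1],
        # decaying toward the intermediate-term floor.
        w0 += self.ALPHA[0] * x * (1.0 - w0) - self.DELTA[0] * (w0 - w1)
        # Intermediate-term: rises only when a reward (resource found)
        # gates it, decaying toward the long-term floor.
        w1 += self.ALPHA[1] * reward * (w0 - w1) - self.DELTA[1] * (w1 - w2)
        # Long-term: rises only on confirmation (e.g., successful return
        # home), decaying very slowly.
        w2 += self.ALPHA[2] * confirm * (w1 - w2) - self.DELTA[2] * w2
        self.w = [w0, w1, w2]
        return w0    # efficacy seen by the rest of the network
```

Calling `update(x)` with no reward simply decays the trace, which is what eliminates transitory associations; passing `reward` or `confirm` transfers the trace toward the longer-term supports.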
Detection of a cue associated with a harmful consequence (threat) results in a pre-programmed response called TurnAndRun. The agent executes a stylized escape behavior before returning to the Wander loop.
In the event that a resource cue is detected, the agent preferentially selects the path associated with the cue. This preference may be stochastic, with the probability of selection dependent on the strength of the adaptrode support. Once a path is selected, the agent records all objects associated with (in the neighborhood of) the selected path. These objects are potential cues. It then follows the selected path and, upon arriving at the next local environment, scans it.
Suppose the agent selects the central path. The trace representing the w3 association will be mildly reinforced in short-term memory before the agent proceeds along that path to the next node. The traces for w1, w15, and w16 will, as a result, be decayed somewhat more than that for w3. Upon arrival at the node (marked with the "k3" label), the agent scans the environment. In this case the node is a leaf, or terminal, node. It contains one of the search resources, the k3 label. Since only one of several possible resources is present (and the use of just one item indicates a less useful amount of resource, e.g., a low page salience measure for a Web page), the energy value of the node is low, but sufficient to transfer the memory trace of the w3 object into intermediate-term memory through the adaptrode reward reinforcement mechanism. The w3 object becomes a potential cue for k3 to the agent.
Since the node is a terminal, the agent backtracks to the central node in the figure. It scans the remaining paths. The path just taken is eliminated from further consideration since it has been visited and the resource consumed. Suppose the Wander algorithm selects the upper path, due to the influence of the weak but still present w1 object trace. It will find a richer set of resources (more energy) and will tend to boost the value of w1's trace, also transferring it into intermediate-term memory. Now both w1 and w3 have stronger potential cue status. By being in intermediate-term memory, both will be protected from too much decay. w15 and w16, on the other hand, will both be so decayed that they can be eliminated from memory altogether.
Having found both nodes to contain resources, the agent "consumed" them. In the case of a Web search agent, this means returning the URL to the user for further evaluation. The latter serves the purpose of providing confirmation to the adaptrode algorithm in the same way that the agent's successful return to its home (as in the case of the MAVRIC robot) does. The confirmation of value will cause the memory traces for w1 and w3 to be transferred to long-term memory if both nodes were deemed sufficiently valuable.
Upon return from the second node visited, the agent may or may not choose to take the remaining out-path. The entry path is also a valid out-path - for backtracking. By the time the agent has taken several out-paths and returned from terminal nodes (or box canyons) it has significantly altered its direction, so that the entry path becomes more probable as an exit path (the entry path, when used for exit, does not require scanning). Thus there is some likelihood that not all out-paths from a node will be selected, particularly if there are no cues or candidate cues associated with a path. In this way the algorithm differs from traditional (graph) search: there is no necessary attempt to do an exhaustive search of all reachable nodes.
The philosophy behind this approach is that in a truly vast space of possible nodes, it would be impossible and impractical for the agent to search everywhere. The possibility exists for the agent to get stuck for a longer-than-optimal time in a dense portion of the graph that does not contain any resource. This would cause the agent to lose excessive energy and forget possible cues. In fact, in one variation of the above algorithm, the agent gets more frustrated with each node it searches without finding a resource. As the level of frustration increases, the distribution parameter of the pink-noise generator (mu in an inverse exponential distribution) is adjusted such that the number of out-paths in a node that are likely to be searched is reduced. The agent then has a stronger bias toward backtracking, in order to cover a "broader" territory. In one sense, the agent starts approaching a more breadth-first kind of search.
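One way to realize this frustration variant, with an invented mapping from frustration to the distribution parameter mu, is:

```python
import random

def out_paths_to_sample(n_paths, frustration, base_mu=2.0):
    """Hypothetical sketch: rising frustration shrinks the mean (mu) of
    the exponential draw, so fewer out-paths of a node are sampled and
    backtracking (a broader, more breadth-first sweep) becomes more
    likely.  The 1/(1 + frustration) mapping is illustrative."""
    mu = base_mu / (1.0 + frustration)            # frustration >= 0
    k = int(random.expovariate(1.0 / mu)) + 1     # always try at least one
    return min(k, n_paths)
```

A calm agent (frustration near zero) samples around `base_mu` out-paths per node on average; a highly frustrated one almost always tries just one before backtracking.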
In the above scenario for Figure 1, the objects w1 and w3 were learned as possible cues. The learning, however, given only one instance of a connection between them and the various resource objects, is weak. And even though the traces are advanced to intermediate- and even long-term memory, the low levels would still allow them to decay over many non-occurrences of the (temporally ordered) conjunction of the cues and resources.
What must happen is that the agent encounter additional such co-occurrences within some reasonable time (in the case of the Web agent this might mean within the next several thousand page hits). Each such encounter further reinforces the strength of connection between the cue and the resource recognizer. Eventually, only those cues which prove useful in predicting the discovery of resources will be retained for the very long term.
As an agent builds a set of reliable cue predictors, its search efficiency increases. That is, it becomes better at finding resources (and avoiding threats) by virtue of using its cue-based way-finding method to locate resources. This, of course, depends upon the richness of unique objects and causal relations between them and the resources (threats). If there are too few objects causally related to the resources, and if the distribution of resources and cues is sparse, then this approach will not prove effective. However, when one considers the richness of the real natural world, it seems quite reasonable to suspect that foraging search will provide a useful method for agent search.