MAVRIC - Experimental Setup

Motivation

We are interested in how animals, particularly hunters, find stochastically and sparsely distributed food (resources) in a vast, dynamic and nonstationary world. It has been our contention that this problem is paradigmatic: it represents the most fundamental behavior required of an intelligent agent. It therefore behooves the biomimetics and animatics community to explore this behavior more diligently. The MAVRIC work is directed toward this study.

The task set before MAVRIC is to find resource objects in a large, open world while avoiding dangerous threats. Our working hypothesis is that animals learn to associate certain causally correlated cue events with the nearness of the sought object. The agent can then orient itself toward the cue so as to improve its chances of finding the resource. Similarly, an agent must learn cue events that are associated with threats (things which cause pain) so that it may use those cues to orient and move away from the threat. Prior to learning the agent is naive about the causal nature of the environment and will search using a novelty-generating mechanism. During this period the agent is also in danger of falling victim to threats. It is incumbent on the agent to learn quickly.

Every time the agent encounters a resource, it needs to learn the cue events that immediately preceded the resource event. However, it cannot learn such events indiscriminately, since some of them may be incidental and not truly causally connected with the presence of a resource. Thus learning must be somewhat gradual. A relation should be learned deeply only after multiple occurrences, so that the cue can be considered reliable. The adaptrode learning mechanism is well suited to this need. It provides for fast encoding, in a short-term trace, with longer-term encoding following only after numerous co-encounters of cue followed by resource. This means that the agent will begin behaving by following presumptive cues after only one or two encounters. It will, however, forget cue pretenders that fail to be reinforced over time.
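The fast-encode, slow-consolidate, forget-the-pretender behavior described above can be illustrated with a toy two-level trace. This is only a sketch and not the adaptrode equations themselves; the function names and all rate constants (fast, slow, short_decay, long_decay) are assumptions chosen for illustration.

```python
def update_traces(short, long, reinforced,
                  fast=0.5, slow=0.1, short_decay=0.3, long_decay=0.02):
    """Advance a toy two-level memory trace by one event (hypothetical rates)."""
    if reinforced:
        # Cue was followed by a resource: the short-term trace jumps quickly,
        # while the long-term trace only creeps toward it.
        short += fast * (1.0 - short)
        long += slow * max(short - long, 0.0)
    else:
        # No reinforcement: the short-term trace fades fast, the long-term slowly.
        short *= 1.0 - short_decay
        long *= 1.0 - long_decay
    return short, long


def run(events):
    """Apply a sequence of reinforced (True) / unreinforced (False) events."""
    short, long = 0.0, 0.0
    for reinforced in events:
        short, long = update_traces(short, long, reinforced)
    return short, long


# A cue pretender: reinforced once, then never again.
pretender = run([True] + [False] * 20)
# A reliable cue: reinforced on ten consecutive encounters.
reliable = run([True] * 10)
```

In this sketch a single encounter already produces a strong short-term trace, but only the repeatedly reinforced cue ends up with a substantial long-term trace.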

A necessary characteristic of a cue that makes it a reliable predictor of the presence of a resource (aside from being consistent) is that it will provide a stimulus to the agent a short time interval prior to the agent detecting the actual resource itself. That is, the cue must be detectable at a greater distance than the resource in order to be useful. In our experimental setup we use two major stimulus sources to provide both the resource (and threat) stimulus and the cue stimulus. These are light and sound.

Stimuli

The two stimulus sources are implemented in battery-powered, standalone towers that are radio activated. These towers carry either a speaker or a light source (Figure 1). We have ten of each type. These can be set out on the gym floor (see below) in groups to create an object having two, three or four features (three sound and one light).

Figure 1. Two types of stimulus towers are used to create the environment for MAVRIC, a sound tower (left) and a light tower. Both types are under radio control. Both towers provide for intensity (loudness or brightness) control, and the sound towers also support frequency (tone) selection.

MAVRIC can detect brighter-than-background light sources at a long distance (about six to eight meters). Sounds are given a shorter detection range by turning a tower's sound on only when the robot is within the intended range. In Figure 2 blue circles represent neutral tones (simulated odors), green represents the food tone, yellow a light and red represents a poison source. Colored annular rings around each source represent the gradient fields of the stimulus emitted by each tower. MAVRIC must learn to maneuver using these gradients. The figure depicts MAVRIC's movement through the search space (purple line = history, green line = best path). Note how the towers can be grouped to provide the featural information MAVRIC needs to disambiguate situations.

Figure 2. Each tower emits stimuli that have a gradient field. Light can be detected from the farthest distance. MAVRIC learns to negotiate the various fields in seeking food and avoiding poison.

All towers are under the control of a 'world' computer program, which monitors the robot and activates/deactivates sound towers according to the robot's position. By stepping the volume on towers up or down we can enhance the gradient effect detectable by MAVRIC's fairly crude sensors. The world computer also keeps a history of the robot's movement and generates (specifies) the location and combination of new towers as the world scrolls by (see below). A camera, mounted on a walkway above the main gymnasium floor, is the input to the world program. The world program and the robot brain program run on two completely separate computers. The world reacts to MAVRIC in terms of controlling the field dynamics. At the same time MAVRIC reacts to the world through its physical sensing and behavior.
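The volume stepping performed by the world program might look something like the sketch below. It is an illustration under stated assumptions, not the actual world program: the function name, the 3 m sound range and the five volume levels are all hypothetical values, since the text gives no figures for them.

```python
import math


def tower_volume(robot_xy, tower_xy, max_range_m=3.0, levels=5):
    """Return a discrete volume level for one sound tower.

    0 means the tower is switched off; `levels` is the loudest setting.
    max_range_m and levels are assumed values for illustration only.
    """
    distance = math.dist(robot_xy, tower_xy)
    if distance >= max_range_m:
        # Robot is outside the intended range: keep the tower silent.
        return 0
    # Nearer robot -> louder tower, quantized into discrete steps so that
    # crude sensors still see a clear gradient.
    return levels - int(distance / max_range_m * levels)


# Robot position as reported by the overhead camera (metres, hypothetical).
robot = (4.0, 6.0)
towers = {"food": (4.0, 7.5), "neutral": (10.0, 10.0)}
volumes = {name: tower_volume(robot, xy) for name, xy in towers.items()}
```

Each time the camera reports a new robot position, the world program would recompute the levels and radio any changes to the towers.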

Space

The space that MAVRIC searches is essentially an infinite plain. This is accomplished by using a gymnasium floor as the task arena. A fifteen meter by fifteen meter square in the center of the floor forms the actual search area in which MAVRIC operates. Surrounding this square are eight additional squares representing the potential areas into which MAVRIC could wander. Figure 3 shows an experimental arrangement. The central (green) square is the operating area for the robot. Towers are distributed in all areas that are potential paths for the robot (four additional green areas). Blue areas represent those which MAVRIC has wandered through previously. The nine areas represent the world that MAVRIC could have sensory access to while it is moving in the central area.

Figure 3. MAVRIC's operating environment (45 m X 45 m) represents the accessible world for the robot. The robot only operates in the central square. The text explains what happens when MAVRIC comes to an edge.

Whenever MAVRIC reaches an edge of the central square the brain program is halted (the robot's state is preserved in 'hibernation'). All squares are shifted toward MAVRIC's history (some history drops off the grid) and new potential world is generated. The robot is moved to the contralateral position in the central square - as if it were just entering the square - oriented appropriately. In essence, the world is scrolled around the robot and new world is continually generated in the potential paths that it could take. In this fashion, MAVRIC is unconstrained by artificial boundaries. This does require the research assistants to hustle to set up the towers, however!
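The scrolling step can be sketched as a shift of a three-by-three grid of squares. This is a simplified illustration, not the world program itself: the grid representation, the edge labels, the coordinate convention (x east, y north, both 0 to 15 m inside the central square) and the new_cell callback are assumptions, and the sketch generates a full new row or column rather than only the squares on potential paths.

```python
SIDE_M = 15.0  # side of the central square, from the text


def scroll_world(grid, exit_edge, robot_xy, new_cell):
    """Scroll a 3x3 grid (list of rows, row 0 = north) around the robot.

    exit_edge is "N", "S", "E" or "W" (hypothetical labels). new_cell is a
    callback that generates the contents of one newly exposed square.
    Returns the shifted grid and the robot's contralateral re-entry position.
    """
    x, y = robot_xy
    if exit_edge == "E":
        # Old east column becomes the centre; west history drops off.
        return [row[1:] + [new_cell()] for row in grid], (0.0, y)
    if exit_edge == "W":
        return [[new_cell()] + row[:-1] for row in grid], (SIDE_M, y)
    if exit_edge == "N":
        # Old north row becomes the centre; south history drops off.
        return [[new_cell() for _ in range(3)]] + grid[:-1], (x, 0.0)
    if exit_edge == "S":
        return grid[1:] + [[new_cell() for _ in range(3)]], (x, SIDE_M)
    raise ValueError(f"unknown edge: {exit_edge!r}")


grid = [["a", "b", "c"],
        ["d", "e", "f"],
        ["g", "h", "i"]]
# Robot leaves through the east edge at y = 7 m.
shifted, robot = scroll_world(grid, "E", (SIDE_M, 7.0), lambda: "new")
```

The robot's heading is left unchanged by the move, which corresponds to entering the square 'oriented appropriately'.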

Naive Search

When MAVRIC is first started it is ignorant of which stimuli will be associated with the resource tone and which might be associated with the poison tone. During this phase of its life it conducts a naive search of the environment. Its brain will react automatically to obtain a resource or escape from a threat if it accidentally runs into either situation while it wanders through the environment. The wandering behavior is the drunken sailor walk described in Mobus & Fisher (1994). So long as the robot has not formed intermediate-term memories of cue events that might lead to resources (or help it avoid threats), it will continue this wandering behavior, simply reacting to the primary stimuli of food or poison.
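For readers who want a feel for naive wandering, the sketch below uses a plain correlated random walk as a stand-in. It is not the drunken sailor walk generator of Mobus & Fisher (1994), whose mechanism is not described here; the function name, step length and turn limit are assumptions for illustration.

```python
import math
import random


def wander_step(x, y, heading, rng, step_m=0.2, max_turn_rad=0.5):
    """Take one step of a correlated random walk (stand-in for naive search).

    step_m and max_turn_rad are hypothetical values.
    """
    # Perturb the current heading a little, then move one step along it.
    heading += rng.uniform(-max_turn_rad, max_turn_rad)
    x += step_m * math.cos(heading)
    y += step_m * math.sin(heading)
    return x, y, heading


rng = random.Random(0)   # seeded so the walk is repeatable
pose = (7.5, 7.5, 0.0)   # start in the middle of the central square
path = [pose]
for _ in range(50):
    pose = wander_step(*pose, rng)
    path.append(pose)
```

Because each heading is a small perturbation of the previous one, the path meanders rather than jittering in place, which lets the robot cover ground while it has no cues to follow.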

Learning

Each time MAVRIC encounters a resource (or threat), it forms a short-term memory trace of any other sensory stimuli that immediately preceded the encounter. It does not form memories of co-occurring events or subsequent events due to its temporal bias (see Adaptrode processing). The short-term memory traces are effectively eligibility traces for what may eventually become a long-term memory. Thus any stimulus which immediately preceded the resource stimulus event is eligible to become a cue.

It takes MAVRIC about three or four such events, if the co-occurrences are consistent, to begin forming a sufficiently deep memory trace of the relationship such that it begins reacting to the cue as if it were the resource (or threat) itself. With additional co-occurrences the memory trace is further strengthened. However, there are some additional details that are needed to explain learning completely.

We will take a closer look at the sequence of events which support learning a cue stimulus for food. When MAVRIC comes into contact (sonar short-range detection) with an object it is mildly repelled by the object (though close contact actually activates the sense of pain) unless it is simultaneously detecting the food tone. In the latter case, MAVRIC assumes the object is food and halts to 'feed'. Let's assume that prior to encountering the object, MAVRIC detected a light and some neutral tone (odor). Once the food tone (odor) is detected, MAVRIC starts encoding the associations between light and the neutral tone in short-term memory traces. Upon halting, and as long as the food tone sounds, MAVRIC will 'assimilate' the food. That is, it will increment a register representing the contents of a simulated stomach. The feeding act will cause this value to increase and provide a form of reward feedback to the 'Seek' associator neuron (see MAVRIC Brain Specification). This feedback gates, marginally, the short-term trace into an intermediate-term trace. Over a longer time scale, the accumulation of food in the stomach causes an increase in 'Energy' through a process of simulated digestion. As energy increases, this provides yet further feedback (confirmation) to the 'Seek' neuron, gating the eligibility traces from intermediate-term into long-term traces, again marginally. In other words, MAVRIC gets several levels of evaluative feedback, over longer time scales, to control the degree to which the association (eligibility) traces are gated into a long-term form. If not much food is gained by the feeding process (the tone is not left on for very long, representing a poor food source) then not much gating will occur, so the trace will not be very strong. This is the right result, since the presumptive cue stimuli might not provide predictive information about 'rich' food sources. The agent does not want to follow cues to poor food sources if other cues (to rich sources) would produce a better result.
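The two gating stages described above can be sketched numerically as follows. This is a toy illustration and not the 'Seek' neuron as specified in the MAVRIC Brain Specification: the function name, the normalized stomach_gain and energy_gain signals (0 to 1) and the two rate constants are assumptions.

```python
def gate_traces(short, mid, long, stomach_gain, energy_gain,
                mid_rate=0.2, long_rate=0.1):
    """Gate a short-term trace into intermediate- and long-term traces.

    stomach_gain and energy_gain are hypothetical reward signals in [0, 1];
    mid_rate and long_rate are assumed constants that keep each gating
    step marginal.
    """
    # Feeding reward gates part of the short-term trace into the
    # intermediate-term trace.
    mid += mid_rate * stomach_gain * max(short - mid, 0.0)
    # The later rise in energy gates part of the intermediate-term trace
    # into the long-term trace.
    long += long_rate * energy_gain * max(mid - long, 0.0)
    return mid, long


# Same short-term trace, different food quality.
rich_mid, rich_long = gate_traces(0.8, 0.0, 0.0, stomach_gain=1.0, energy_gain=1.0)
poor_mid, poor_long = gate_traces(0.8, 0.0, 0.0, stomach_gain=0.1, energy_gain=0.1)
```

With identical short-term traces, the poor food source leaves a much weaker intermediate-term and long-term trace than the rich one, which is the behavior the paragraph above argues for.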

Thus MAVRIC requires several conditions to be met to 'learn' the relationships between stimuli and the availability of food (or presence of a threat). First, there must be one or more stimulus events which precede the food stimulus by some small time increment. Second, the food intake must be 'adequate', that is, sufficient to confirm the presumptive cue(s) as predictive. Third, the co-occurrences must be repeated, that is, reinforced over time. Only after several such events will the robot begin to treat the cue as a reliable predictor.

Cue-based Search

Once reliable cues have been learned, MAVRIC changes its search process. When not being stimulated in some field, MAVRIC wanders as in the naive search mode. But once it enters the field of a reliable cue stimulus, MAVRIC will follow the cue with the expectation that it will lead to a resource. Thus MAVRIC adopts something like a heuristic search procedure when cues are available. With cue-based search, MAVRIC should become more successful at finding resources (without succumbing to threats) than when it was naive.
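The switch between wandering and cue following can be sketched as a simple decision rule. In MAVRIC this behavior arises from the neural controller rather than from an explicit rule, so the code below is only an illustration; the function name, the dictionary layout and the reliability threshold are assumptions, and only resource cues are handled.

```python
def choose_action(detected, cue_strength, threshold=0.5):
    """Pick between wandering and following a learned resource cue.

    detected maps each currently sensed cue to its bearing in radians;
    cue_strength maps cue names to learned long-term trace values.
    The names and the threshold are hypothetical.
    """
    # Keep only the sensed cues whose learned strength marks them as reliable.
    reliable = {name: bearing for name, bearing in detected.items()
                if cue_strength.get(name, 0.0) >= threshold}
    if not reliable:
        # No reliable cue field: fall back to naive wandering.
        return ("wander", None)
    # Follow the most strongly learned cue currently in range.
    best = max(reliable, key=lambda name: cue_strength[name])
    return ("follow", reliable[best])


strength = {"light": 0.8, "tone_a": 0.2}
outside_field = choose_action({"tone_a": 1.0}, strength)
inside_field = choose_action({"light": 0.3, "tone_a": 1.0}, strength)
```

Here the weakly learned tone alone leaves the robot wandering, while the well-learned light makes it turn toward the light's bearing.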

Of course, quite a bit depends on our assumptions regarding the density and distribution of resource 'patches' in the environment. In actuality, we are modelling natural distributions of food patches, which tend to be distributed in what can be described as a fractal manner. That is, there may be large-scale aggregations of patches, and these might be aggregated into even larger patches. Figure 4 shows two patches, somewhat closely spaced, with considerable empty space otherwise.

Figure 4. Resources may be fractally distributed, with small patches aggregated into larger patches. MAVRIC operates in the central square. See text for explanation of search task.

In this scenario, MAVRIC should discover and exploit the nearer patch (note food tone - green circle - may actually be sounded when MAVRIC touches any of the combined light/sound objects) before moving on to the farther patch. We are interested in modelling the neural controls that would cause MAVRIC to explore the near neighborhood of the exploited patch before going back to a complete wander behavior. This kind of behavior, observed in real animals, ensures the most efficient exploitation of a region.

We will soon be setting these tasks for MAVRIC to tackle. Stay tuned to this page for updates on results as we run the experiments!

This material is based upon work supported by the National Science Foundation under Grant No. IIS-9907102.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.