The task set before MAVRIC is to find resource objects in a large, open world while avoiding dangerous threats. Our working hypothesis is that animals learn to associate certain causally correlated cue events with the nearness of the sought object. The agent can then orient itself toward the cue so as to improve its chances of finding the resource. Similarly, an agent must learn cue events that are associated with threats (things that cause pain) so that it may use those cues to orient and move away from the threat. Prior to learning, the agent is naive about the causal nature of the environment and will search using a novelty-generating mechanism. During this period the agent is also in danger of falling victim to threats. It is incumbent on the agent to learn quickly.
Every time the agent encounters a resource, it needs to learn the cue events that immediately preceded the resource event. However, it cannot learn such events indiscriminately, since some may be incidental and not truly causally connected with the presence of a resource. Thus learning must be somewhat gradual: a relation should be encoded deeply, and the cue considered reliable, only after multiple occurrences. The adaptrode learning mechanism is well suited to this need. It provides fast encoding in a short-term trace, with longer-term encoding following only after numerous co-encounters of cue followed by resource. This means that the agent will begin following presumptive cues after only one or two encounters. It will, however, forget cue pretenders that fail to be reinforced over time.
A necessary characteristic of a cue that makes it a reliable predictor of the presence of a resource (aside from being consistent) is that it provides a stimulus to the agent a short time before the agent detects the actual resource itself. That is, the cue must be detectable at a greater distance than the resource in order to be useful. In our experimental setup we use two major stimulus sources, light and sound, to provide both the resource (and threat) stimuli and the cue stimuli.
MAVRIC can detect brighter-than-background light sources at a long distance (about six to eight meters). A shorter detection range for sounds is simulated by turning a sound on only when the robot is within the intended range. In Figure 2, blue circles represent neutral tones (simulated odors), green the food tone, yellow a light, and red a poison source. Colored annular rings around each source represent the gradient fields of the stimulus emitted by each tower. MAVRIC must learn to maneuver using these gradients. The figure depicts MAVRIC's movement through the search space (purple line = history, green line = best path). Note how the towers can be grouped to provide the featural information MAVRIC needs to disambiguate situations.
It takes MAVRIC about three or four co-occurrences of cue and resource (or threat), if they are consistent, to begin forming a sufficiently deep memory trace of the relationship such that it begins reacting to the cue as if it were the resource (or threat) itself. With additional co-occurrences the memory trace is further strengthened. However, some additional details are needed to explain learning completely.
We will take a closer look at the sequence of events that supports learning a cue stimulus for food. When MAVRIC comes into contact (sonar short-range detection) with an object, it is mildly repelled by the object (though close contact actually activates the sense of pain) unless it is simultaneously detecting the food tone. In the latter case, MAVRIC assumes the object is food and halts to 'feed'. Let's assume that prior to encountering the object, MAVRIC detected a light and some neutral tone (odor). Once the food tone (odor) is detected, MAVRIC starts encoding the associations between light and the neutral tone in short-term memory traces. Upon halting, and as long as the food tone sounds, MAVRIC will 'assimilate' the food. That is, it will increment a register representing the contents of a simulated stomach. The feeding act causes this value to increase and provides a form of reward feedback to the 'Seek' associator neuron (see MAVRIC Brain Specification). This feedback gates, marginally, the short-term trace into an intermediate-term trace. Over a longer time scale, the accumulation of food in the stomach causes an increase in 'Energy' through a process of simulated digestion. As energy increases, it provides yet further feedback (confirmation) to the 'Seek' neuron, gating the eligibility traces from intermediate-term into long-term traces, again marginally. In other words, MAVRIC gets several levels of evaluative feedback, over successively longer time scales, to control the degree to which the association (eligibility) traces are gated into a long-term form. If not much food is gained by the feeding process (the tone is not left on for very long, representing a poor food source), then not much gating will occur and the trace will not be very strong. This is the right result, since the presumptive cue stimuli might not provide predictive information about 'rich' food sources.
The agent does not want to follow cues to poor food sources if other cues (to rich sources) would produce a better result.
Thus MAVRIC requires several conditions to be met to 'learn' the relationships between stimuli and the availability of food (or presence of a threat). First, there must be one or more stimulus events that precede the food stimulus by some small time increment. Second, the food intake must be 'adequate', that is, sufficient to confirm the presumptive cue(s) as predictive. Third, the co-occurrences must be repeated, that is, reinforced over time. Only after several such events will the robot begin to treat the cue as a reliable predictor.
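The multi-level gating described above can be illustrated with a small simulation. What follows is only a minimal sketch under stated assumptions, not the adaptrode equations or MAVRIC's actual code: the class CueTrace, the function run_encounters, all rate constants, and the rule that scales energy feedback with feeding duration are hypothetical and chosen purely for illustration. The sketch keeps a short-term, an intermediate-term, and a long-term trace for one cue, lets feeding feedback gate the first into the second, and lets energy feedback gate the second into the third.

```python
from dataclasses import dataclass


@dataclass
class CueTrace:
    """Three-level memory trace for one cue (illustrative sketch only)."""
    short: float = 0.0  # short-term trace
    mid: float = 0.0    # intermediate-term trace
    long: float = 0.0   # long-term trace
    # Hypothetical rate constants, all assumed to lie in [0, 1].
    rise: float = 0.5
    gate_mid: float = 0.2
    gate_long: float = 0.1
    decay_short: float = 0.3
    decay_mid: float = 0.05

    def step(self, cue: float, feeding: float = 0.0,
             energy_gain: float = 0.0) -> None:
        """Advance one time step; all inputs are expected in [0, 1]."""
        # Fast encoding: the cue drives the short-term trace toward 1,
        # after which it decays back toward the intermediate-term level.
        self.short += self.rise * cue * (1.0 - self.short)
        self.short -= self.decay_short * (self.short - self.mid)
        # Feeding feedback gates part of the short-term trace into the
        # intermediate-term trace, which decays toward the long-term level.
        self.mid += self.gate_mid * feeding * (self.short - self.mid)
        self.mid -= self.decay_mid * (self.mid - self.long)
        # Energy feedback gates part of the intermediate-term trace into
        # the long-term trace.
        self.long += self.gate_long * energy_gain * (self.mid - self.long)


def run_encounters(n_encounters: int, feed_steps: int,
                   digest_steps: int = 20) -> CueTrace:
    """Simulate repeated cue-then-food encounters for a single cue."""
    trace = CueTrace()
    # Assumption: a longer food tone (richer source) yields more energy.
    energy = min(1.0, feed_steps / 10.0)
    for _ in range(n_encounters):
        trace.step(cue=1.0)                      # cue precedes the food tone
        for _ in range(feed_steps):
            trace.step(cue=1.0, feeding=1.0)     # halted and 'feeding'
        for _ in range(digest_steps):
            trace.step(cue=0.0, energy_gain=energy)  # simulated digestion
    return trace


rich = run_encounters(n_encounters=4, feed_steps=10)
poor = run_encounters(n_encounters=4, feed_steps=1)
unrewarded = run_encounters(n_encounters=4, feed_steps=0)
```

In this sketch a cue that precedes a rich source ends with a deeper long-term trace than one that precedes a poor source, and a cue that is never followed by feeding leaves no intermediate-term or long-term trace at all, which mirrors the three conditions listed above.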
Of course, quite a bit depends on our assumptions regarding the density and distribution of resource 'patches' in the environment. We are modelling natural distributions of food patches, which tend to be distributed in what can be described as a fractal manner. That is, there may be large-scale aggregations of patches, and these might themselves be aggregated into even larger patches. Figure 4 shows two patches, somewhat closely spaced, with considerable empty space otherwise.
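One simple way to picture such a nested distribution is a recursive placement in which each cluster centre spawns a few tighter sub-clusters. The following is a hedged sketch only: the function clustered_patches and its parameters (levels, children, spread, shrink) are hypothetical names and values, not the layout procedure used in our experiments.

```python
import random


def clustered_patches(levels, children, spread, center=(0.0, 0.0),
                      shrink=0.3, rng=None):
    """Return patch positions arranged as clusters of clusters.

    levels   -- depth of nesting (0 returns just the centre)
    children -- sub-clusters spawned per cluster
    spread   -- half-width of the region used at the top level
    shrink   -- factor by which the spread is reduced at each level
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the layout is repeatable
    if levels == 0:
        return [center]
    points = []
    for _ in range(children):
        # Place a sub-cluster centre near the current centre ...
        cx = center[0] + rng.uniform(-spread, spread)
        cy = center[1] + rng.uniform(-spread, spread)
        # ... and fill it recursively with a tighter spread.
        points.extend(clustered_patches(levels - 1, children,
                                        spread * shrink, (cx, cy),
                                        shrink, rng))
    return points


patches = clustered_patches(levels=3, children=3, spread=10.0)
```

With three levels and three children per level this yields 27 patch positions, grouped into tight triplets that are themselves grouped into larger clusters, with comparatively empty space between the top-level clusters.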
We will soon be setting these tasks for MAVRIC to tackle. Stay tuned to this page for updates on results as we run the experiments!
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.