Natural environments are not closed worlds. The immediate (read: accessible) environment of the agent is itself embedded in, and interacts with, a larger environment. In turn, that environment is embedded in a still larger one. While this regress is not necessarily infinite in any absolute sense (i.e., the Universe may be closed), so far as the agent is concerned it is effectively so, due to the limits of accessibility in time. Environmental interactions that took place in a prior time period on the periphery of the agent's immediate environment can alter relationships that the agent has already learned. The agent's knowledge is thus rendered less useful and certainly suboptimal.
Adding to the complexity of real worlds, the time scales of these indeterminate changes are themselves indeterminate. Anything from catastrophe to subtle, long-term change can ensue, depending on the dynamics of the interaction and the spatial scale involved. An earthquake, resulting from eons of pressure buildup in the tectonic plates, can alter the landscape in an instant. Changes in solar radiation due to sunspot activity can cause colder or warmer seasons over many years. Animals, our best examples of autonomous agents in real-world environments, need to adapt to a wide range of changing conditions to survive.
As pointed out, machine learning has traditionally been pursued under the closed world assumption. In large part this pursuit was motivated by simple expediency: closed worlds are subject to tractable mathematical analysis. One can offer proofs that a given algorithm produces a claimed result. The systems are aimed at stationary targets. The problem has been that when these same systems are aimed at the shifting targets of the real world, they fail to produce the promised results.
One problem that faces many researchers who have attempted to address this issue is the destructive interference of new information with old knowledge. In effect, continuous learning causes a "washing out" of old knowledge. For example, suppose a relationship between A and B has been learned such that the occurrence of A generates a response to B. Should this relationship be altered in the extant world at some later time, the learner would have to acquire whatever new relationship might have developed, say between C and B. In order to do so, many learning systems, particularly symbolic-based representation systems, need either to forget the older association or to archive it, indexed by some time stamp. In the former case, the new knowledge requires the obliteration of the old. In the latter case, the system may be quickly overwhelmed by the growth of the knowledge base. Additionally, recall, say for reasoning purposes, is burdened by the extra processing overhead associated with temporal indexing.
The adaptrode overcomes these problems by incorporating a multi-stage, differential memory trace mechanism that allows the simultaneous encoding of associations that are separated in time scale. It records not only short-term memory traces but intermediate and long-term traces as well. These traces are cascaded and precedence-ordered, from short-term to long-term. Coupled with the adaptrode's normal decay method, this means that traces that are not reinforced over time will fade: rapidly in the short-term trace, more slowly in intermediate traces and very slowly in long-term traces. Thus, the adaptrode can maintain a very long memory trace of an association that once was the rule but has not been sustained. This trace is a faint shadow compared to other short-term traces that might compete for processing resources. From the standpoint of lifelong learning, this makes it possible for the adaptrodes in a network to retain old knowledge without interfering with new knowledge.
This capacity is extremely important. It is not infrequent that an association which has held for a long time ceases to hold, owing to short-term changes in the environment, for the duration of those changes. Subsequently, the longer-term rule once again becomes true. That is, the environment may undergo short-term or transitory changes that are not, technically speaking, noise. Such changes create a need to learn new associations in order to survive. On the other hand, the factors which wrought such a change could eventually revert to the prior condition. The older relationships would again become the norm and the older association knowledge would become operative again. For systems that suffer from destructive interference, the learner would have to start learning anew each time these sorts of transitory changes occurred.
A brain is not a tabula rasa. Though malleable, it is not to be molded into any old shape. It is organized, genetically, to be sensitive to specific kinds of patterns at specific times during the development of the juvenile. Learning in animals, and, according to the latest findings of neuroscience, in humans as well, is guided largely by the meaning of the patterns encountered. Such meaning is rooted (or grounded) in physiological needs, the homeostatic milieu of the body. It is fixed by evolution and underlies a surprising amount of rational thinking [see "Descartes' Error" by Antonio Damasio]. Thus not just anything is learned. What is learned is linked, through numerous levels of association to be sure, to what is important to the biology of the learner. This is called semantic-driven learning. Evolutionarily speaking, it has been a blindingly obvious success. It would be a worthy model for autonomous agent learning.
A network of adaptrode-based neuron-like processing elements can be constructed which learns associations based on semantics. That is, the network can extract meaningful associations from the seeming chaos of patterns impinging on the learning agent. This is accomplished by gating the influence of the short-term memory trace on the intermediate-term trace with a correlation term derived from a second signal source. The correlation is not simple, however. The second signal must arrive some small time lapse after the onset of the primary signal. If the second signal arrives before the primary or even simultaneously, no intermediate-term encoding ensues. The secondary signal comes from a special input to the neuron-like processing element. This signal conveys meaning to the learner. It signals some factor in the environment which is deemed, a priori, to require a response. In animals such a signal constitutes a stimulus-response circuit, a hard-wired, genetically determined condition-action pair. The adaptrode's primary signal, however, comes from what might be called "free sense" transducers. Signals from such transducers (note that the source could be another neuron) do not convey any necessary meaning. They simply inform the neuron about states of the relevant environment at any given instant.
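The ordering rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `gate_open_ticks`, the `min_lag` parameter and the use of binary, tick-based signals are all hypothetical, and the sketch covers a single episode rather than the adaptrode's actual gating dynamics.

```python
def gate_open_ticks(primary, secondary, min_lag=1):
    """Return the ticks at which intermediate-term encoding could occur.

    primary, secondary: equal-length sequences of 0/1 values, one per tick.
    min_lag: assumed minimum number of ticks by which the secondary signal
             must trail the onset of the primary signal.
    """
    primary_onset = None   # tick at which the primary signal first appeared
    blocked = False        # set once the secondary arrives too early
    open_ticks = []
    for t, (p, s) in enumerate(zip(primary, secondary)):
        if p and primary_onset is None:
            primary_onset = t
        if s:
            if primary_onset is None or t - primary_onset < min_lag:
                # Secondary arrived before, or together with, the primary.
                blocked = True
            elif not blocked:
                open_ticks.append(t)
    return open_ticks
```

With a primary signal of `[0, 1, 1, 1, 0]`, a secondary signal that lags by one tick opens the gate, while one that arrives simultaneously or earlier never does.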
The adaptrode provides what Damasio calls a convergence point for these signals: one an arbitrary world-state variable, the other a world-state variable with direct consequences for the agent; the one arriving early, the other lagging slightly. The result of this temporal offset, if it happens reliably over longer time scales, is that the primary signal takes on the same significance as the secondary. In other words, the primary, "free sense" signal comes to represent the onset of the secondary, meaningful one. It effectively becomes a predictor of the meaningful signal. An important survival advantage can easily be seen in this scheme. Prediction of an impending consequential event gives the learner a head start in responding. The learner becomes proactive with respect to the consequence. Early action can reduce the costs incurred by merely reacting to stimuli.
In other words, bookmark objects age. How can these objects be managed in such a way that old, unused or unimportant items can be readily identified and made candidates for disposal? One simple approach is to attach an adaptrode to each URL. Each time the page is viewed, the primary input to the adaptrode is excited. Frequent viewing would result in non-zero short-term trace values. Periodic surveys of these traces on all adaptrodes would quickly reveal zero-valued traces. But the adaptrode goes beyond this simplistic scheme. Suppose that the viewing of the page results in a click-through to another page or resource (e.g., an FTP file). This would be a clear indication that the page was meaningful in some sense. Thus a secondary signal to the adaptrode would gate short-term memory into an intermediate-term trace. The adaptrode would remember the bookmarked page as a predictor that something useful might be obtained by the user.
A survey of non-zero intermediate-term traces would provide a list of candidate pages for a further stage of memory encoding. A user could periodically be asked to rate the bookmarked page explicitly for its value. The rating would be the basis for a final gating of the intermediate-term trace into long-term memory. An adaptrode in which the long-term trace is non-zero is effectively permanent, by virtue of frequency of use (which keeps the magnitudes of the short-term and intermediate-term traces high) and the long decay time of long-term traces.
Memory traces do fade, however. Over time, even the long-term trace will decay toward zero. Unless a page is frequented over the time scale into which it is encoded, its adaptrode will eventually have a zero-valued short-term trace. It will become a candidate for removal from the list. The user could, of course, be given the option to save the page URL anyway, or perhaps archive it. But the manager agent will have done its duty in keeping track of what the user actually does with the resources she has squirreled away.
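The bookmark scheme can be sketched roughly as follows. This is a simplified illustration and not the adaptrode equations themselves: each trace here simply decays toward zero at its own rate, and the class name `BookmarkTrace`, the rate constants, the `eps` threshold and the example bookmark labels are all hypothetical choices made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class BookmarkTrace:
    """Three memory traces for one bookmarked URL (simplified, assumed rule)."""
    short: float = 0.0  # raised by every page view
    mid: float = 0.0    # raised only when a view leads to a click-through
    long: float = 0.0   # raised only by an explicit user rating

    def view(self, clicked_through=False, alpha=0.5, gate_rate=0.3):
        # Primary input: each view pushes the short-term trace toward 1.
        self.short += alpha * (1.0 - self.short)
        if clicked_through:
            # Secondary signal: gate short-term memory into the intermediate trace.
            self.mid += gate_rate * max(self.short - self.mid, 0.0)

    def rate(self, rating, gate_rate=0.3):
        # Explicit rating in [0, 1]: gate the intermediate trace into long-term memory.
        self.long += gate_rate * rating * max(self.mid - self.long, 0.0)

    def tick(self, decay=(0.2, 0.02, 0.002)):
        # Unreinforced traces fade: fast, slower, slowest.
        self.short *= 1.0 - decay[0]
        self.mid *= 1.0 - decay[1]
        self.long *= 1.0 - decay[2]


def removal_candidates(traces, eps=1e-3):
    """Bookmarks whose traces have all faded to (near) zero."""
    return sorted(u for u, t in traces.items() if max(t.short, t.mid, t.long) < eps)


def demo():
    traces = {"rarely-used": BookmarkTrace(), "often-used": BookmarkTrace()}
    traces["rarely-used"].view()                      # one view, no click-through
    for _ in range(5):
        traces["often-used"].view(clicked_through=True)
        for t in traces.values():
            t.tick()
    for _ in range(60):                               # a long idle period
        for t in traces.values():
            t.tick()
    return removal_candidates(traces)
```

In this toy run only the bookmark that was viewed once, without a click-through, ends up on the removal list; the other is kept alive by its slowly decaying intermediate trace.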
And that is the key to agent usefulness. Embodied in the memory traces of the adaptrodes attached to those many bookmarked URLs is the pattern of actual user habit/thinking. The agent has learned a great deal about the user's needs by observing the user's behavior. It can make informed suggestions to the user that would be actually useful rather than intrusive.
Bookmarks are just one example of persistent objects that will depend on the user's needs. There are obviously many more examples of such objects that do now, or will soon, populate cyberspace. Adaptrodes and agents based on this learning mechanism represent a way to add true intelligence - that is, semantically based reasoning - to the management of such objects.
There is a long tradition in animal learning theory of modelling the memory trace of a stimulus using the exponentially weighted moving average (EWMA) method. This method produces a close approximation of data obtained from animal experiments (so-called signature data) for short time scales, but fails to model longer-term memory phenomena very well. The adaptrode evolved from attempts to extend the EWMA method to account for these longer-term phenomena.
Mathematically, EWMA can be stated:
Eq. 1
Where s(t) is the signal strength at time t; w(t) is the memory trace
variable and alpha is a constant between 0 and 1.
Equation 1 can be rearranged:
Eq. 2
The EWMA model of a memory trace suffers from the fact that it falls
off as fast as it rises. It works well to represent the dynamics of a memory
trace over a short span of time, but cannot provide a longer-term trace.
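A minimal sketch of an EWMA trace, assuming the standard textbook form w(t+1) = w(t) + alpha * (s(t) - w(t)), which is algebraically the same as alpha * s(t) + (1 - alpha) * w(t). The time indexing and the value of alpha used here are assumptions and may differ from the equations intended above.

```python
def ewma_trace(signal, alpha=0.3, w_init=0.0):
    """Return the EWMA memory trace w for a sequence of signal values s."""
    w = w_init
    trace = []
    for s in signal:
        # Move w a fraction alpha of the way toward the current signal.
        w = w + alpha * (s - w)
        trace.append(w)
    return trace


# Five ticks of unit input followed by five ticks of silence.
signal = [1.0] * 5 + [0.0] * 5
trace = ewma_trace(signal)
```

Because the same constant alpha governs both directions, the gap between the trace and its target shrinks by the factor (1 - alpha) on every tick whether the trace is rising or falling. The five silent ticks therefore undo the trace as quickly as the five active ticks built it.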
The adaptrode model introduces two "fixes" which improve the retention
of a memory trace while maintaining the leaky integrator characteristics
of the EWMA. The basic adaptrode equation is:
Eq. 3
The first of the two innovations here is the use of a distinct decay factor, different from alpha. This factor, delta, is generally much smaller than alpha, thus providing for an extended, though still exponential, decay of the trace variable. The second innovation is the use of a shunting factor to bound the growth of w(t). Figure 1 shows a comparison between the adaptrode trace and an EWMA trace. A unit pulse signal is received with each time tick from t = 5 to t = 9. After that the input is held at zero until t = 24. A single pulse is received at that time tick.
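The comparison can be approximated numerically. The sketch below assumes one plausible form of the single-level update, w(t+1) = w(t) + alpha * s(t) * (w_max - w(t)) - delta * w(t), in which (w_max - w(t)) plays the role of the shunting factor and delta is the separate decay constant. That form, the bound `w_max` and the parameter values are assumptions made for illustration, not a transcription of Eq. 3.

```python
def ewma_step(w, s, alpha):
    # EWMA: the same constant alpha controls both rise and fall.
    return w + alpha * (s - w)


def adaptrode_step(w, s, alpha, delta, w_max=1.0):
    # Assumed form: growth is shunted by (w_max - w), decay uses the smaller delta.
    return w + alpha * s * (w_max - w) - delta * w


def compare(n_ticks=30, alpha=0.3, delta=0.02):
    # Unit pulses on ticks 5..9, silence, then a single pulse on tick 24.
    signal = [1.0 if 5 <= t <= 9 or t == 24 else 0.0 for t in range(n_ticks)]
    w_e = w_a = 0.0
    ewma, adapt = [], []
    for s in signal:
        w_e = ewma_step(w_e, s, alpha)
        w_a = adaptrode_step(w_a, s, alpha, delta)
        ewma.append(w_e)
        adapt.append(w_a)
    return signal, ewma, adapt


signal, ewma, adapt = compare()
```

With these illustrative constants, the EWMA trace has all but vanished by tick 23, just before the late pulse, whereas the assumed adaptrode trace still retains most of its peak value and remains within the bounds 0 and w_max throughout.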

An example is the signal impinging on a particular hair cell in the ear, tuned to respond to a relatively narrow bandwidth. Owing to the chaotic distribution in space and time of sounds in which this band is a component, the hair cell will be activated in a sporadic and episodic fashion. The hair cell is the source of auditory neural signals that are the basis of sound learning and recognition in the brain. Neurons must be able to encode memory traces of these signals in such a way as to accommodate the episodic/sporadic nature of real-world events.
A third innovation of the adaptrode model, based on the biological properties
of real synapses (see below) is the use of multiple stages to encode traces
over increasingly extended time scales. The basic adaptrode equation is
used recursively but with adjustments to the alpha and delta parameters.
Equation 4 introduces a new index, k, which runs from 0 to some desired
number of time scales or domains less 1 (L).
Eq. 4
Where:
Signals s_k are the secondary inputs to the adaptrode, converging on the primary signal through the cascade of trace variables. This is the correlation factor mentioned above. Its role will be better explained below.
Here I want to focus on the dynamics of the memory traces themselves as they evolve over time with various primary input signals. The secondary signals will be set to 1.0 and fixed. The basic model generated by Equation 4 is shown in Figure 2. Here a three-level adaptrode is stimulated over 5 time units starting at time tick 1 (top trace rising for five ticks).
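A rough sketch of a three-level cascade with the secondary signals clamped at 1.0 follows. It assumes that each level k grows toward the level above it (w_max for level 0) and decays toward the level below it (zero for the last level), that is, w_k(t+1) = w_k(t) + alpha_k * s_k * (w_(k-1)(t) - w_k(t)) - delta_k * (w_k(t) - w_(k+1)(t)). This recursive form and the parameter values are assumptions chosen to match the verbal description, not a transcription of Eq. 4.

```python
def adaptrode_step(w, s0, gates, alphas, deltas, w_max=1.0):
    """Advance all levels by one tick; w[0] is the fastest trace."""
    levels = len(w)
    new_w = []
    for k in range(levels):
        ceiling = w_max if k == 0 else w[k - 1]       # level k grows toward level k-1
        floor = w[k + 1] if k + 1 < levels else 0.0   # and decays toward level k+1
        drive = s0 if k == 0 else gates[k - 1]        # primary input or gating signal
        new_w.append(w[k]
                     + alphas[k] * drive * (ceiling - w[k])
                     - deltas[k] * (w[k] - floor))
    return new_w


def run(signal, alphas=(0.3, 0.05, 0.01), deltas=(0.1, 0.01, 0.001)):
    w = [0.0, 0.0, 0.0]
    gates = (1.0, 1.0)  # secondary signals set to 1.0 and fixed
    trace = []
    for s in signal:
        w = adaptrode_step(w, s, gates, alphas, deltas)
        trace.append(tuple(w))
    return trace


# Quiet at tick 0, input on ticks 1..5, then 40 ticks of silence.
signal = [0.0] + [1.0] * 5 + [0.0] * 40
trace = run(signal)
```

Under these assumptions w0 peaks on the last tick of input and then relaxes toward w1 rather than toward zero, while w1 and w2 rise and fall progressively more slowly.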

To see the relevance of this effect we need to look at how the adaptrode responds to multiple input episodes over time. The actual output of an adaptrode filter, and hence its effect on the system in which it is embedded, is related to the value of w0 in a straightforward way. Let r(t+1) be the response of an adaptrode at time step t+1. Then:
Eq. 5
The response of an adaptrode is thus dominated either by the value of w0, if the input is active (in the current examples we are dealing with binary inputs of either one or zero, but the arguments here hold for continuous values between zero and one as well), or by an exponentially decaying value of r itself. The trace of r (not shown) effectively follows that of w0, but falls off more steeply - it is not bounded from below by w1.
With the understanding that the response of the adaptrode is dependent on that of w0, we can now look at the effects of multiple input episodes on the dynamics of that response. Figure 3 shows the memory traces of a three-level adaptrode similar to that in Figure 2, but with two input episodes separated by some interval of time. The peak response of the adaptrode is approximated by that of w0. As can be seen in the figure, the second response is at a slightly higher value than that of the first episode. It is clear in the figure that this is due to the effects of the longer-term memory traces of w1 and w2. It is this dynamic that accounts for the learning taking place in the adaptrode. The unit is learning to respond more strongly as a function of its input history. One can see that not only is the peak higher in the second response, but that the initial response in the second episode is higher than that in the first.
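One reading of this description, for binary inputs only, is sketched below. The either/or rule and the decay constant `kappa` are assumptions made for illustration; the original response equation may differ in detail.

```python
def response_trace(inputs, w0_trace, kappa=0.5):
    """Response r per tick, given binary inputs and the matching w0 values."""
    r = 0.0
    out = []
    for s, w0 in zip(inputs, w0_trace):
        if s > 0:
            r = w0                   # input active: the response follows w0
        else:
            r = (1.0 - kappa) * r    # input silent: r decays exponentially on its own
        out.append(r)
    return out


# Hypothetical w0 values: rising during two active ticks, then sagging slowly.
r = response_trace([1, 1, 0, 0], [0.3, 0.5, 0.45, 0.4])
```

Here r tracks w0 while the input is on, then drops well below it once the input stops, since nothing holds r up the way w1 holds up w0.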

Most learning situations, to be discussed below, are associative in nature. That is, the system encodes a correlation-based association between two or more signals. However, there are also non-associative adaptive responses that play an important role in circuit dynamics. In practice the gating signals, s1, s2, etc., are not clamped at 1.0, but rather are modulated based on prior stage activity. For example, s1 may be switched on (1.0) while the adaptrode response (Eq. 5) is above some threshold value. Similarly, s2 can be switched on while w0 is above w1 by some small epsilon.
Briefly, causal relation encoding is the basis for perception of order in the world. It is fundamentally important for agents to be able to predict the occurrence of semantically important events based on causal relations (Granger, 1969) with otherwise non-meaningful events, as discussed above under the topic of Semantic-driven Associative Learning.
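The two example gating rules just given can be written down directly. The function name `gate_signals` and the default values for the threshold and epsilon are hypothetical; only the two conditions themselves come from the text.

```python
def gate_signals(response, w0, w1, threshold=0.5, epsilon=0.05):
    """Return (s1, s2) for the current tick from prior-stage activity."""
    # s1 is switched on while the adaptrode response is above the threshold.
    s1 = 1.0 if response > threshold else 0.0
    # s2 is switched on while w0 exceeds w1 by more than a small epsilon.
    s2 = 1.0 if w0 > w1 + epsilon else 0.0
    return s1, s2
```

For instance, `gate_signals(0.7, 0.6, 0.2)` switches both gates on, while `gate_signals(0.2, 0.21, 0.2)` leaves both off.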
The general model is summarized in Figure 4 and described here.

In the figure, variables of interest are contained in square brackets. Thin arrows indicate signals (flows of ions or molecules) which increase a value or slow its decay. Heavy black arrows represent slow decrement processes such as the removal of calcium ions from the cytosol. The dashed arrows represent feedback loops that act principally by down-modulating the slow decrement processes (slowing them down even more). Associated with all of the arrows are rate constants (not shown in this figure). Both rate constants of increase (thin arrows) and constants of decay or active removal (heavy arrows) are associated with these processes.
In the following description it is important to recognize that this is my interpretation of the model as expounded by Alkon. I have attempted to extract from his model those factors and their relationships which seem to me to be the important essence of an adaptive process - namely the multiple time scales over which the processes operate and the feedback loops. Any distortion of the biological model as Alkon presents it is entirely due to my interpretation. The interested reader is encouraged to take a look at Alkon's work directly for complete clarification.
Following the time-course of events, a series of APs arrives at the synapse labelled "CS", triggering the flux of ions and raising the EPSP. Calcium ions accumulate in the interior of the compartment. At the same time the elevated EPSP opens a "gate" (exact mechanism unknown, but it is thought to involve the concentration of calcium prior to the rise in the EPSP) if and only if there has been no previous influx of calcium from another compartment (shown above in the figure as the "unconditionable synapse"). Calcium ions from the influx at the conditionable synapse start to accumulate, initiating a cascade of biophysical processes which affect the outflux of potassium ions. In the short run, the concentration of calcium ions itself has a down-modulating effect on the potassium outflux. As potassium is retained, the membrane potential becomes more positive with respect to the exterior than it would have been with just the influx of sodium and calcium ions.
Over a somewhat longer time scale, the continued or increased concentration of calcium ions (from the unconditionable synaptic compartment) up-modulates the phosphorylation of potassium ion channel proteins (thus further inhibiting the outflux of potassium) by several processes (e.g., "KaM Kinase II" and others in the diagram). In the event that the second source of calcium is absent, due either to the gate being closed or the "US" signal not arriving, the compartment concentration of calcium ions is reduced by active and passive removal mechanisms so that the longer-term effects on the potassium outflux are minimized. In this case the system returns to a restored state and the memory trace represented by the calcium ion concentration decays to effectively zero.
If, on the other hand, the concentration of calcium ions is longer lived due to the secondary influx, even longer-term processes are set in motion. For example, the molecule protein kinase C (PKC) moves from the cytosol into the cell membrane, where its effectiveness as a phosphorylator increases. This translocation of a molecule is presumably a slower process - slow to build and slow to decay.
All of these events, if they occur, contribute to the continued elevation of the EPSP above its normal resting level. A burst of APs gives rise to a transient increase in the EPSP known as post-tetanic potentiation (PTP). If the longer-term processes are initiated, the effect is not transitory, in the sense that the elevated EPSP may last for several minutes to hours. This is an intermediate-term potentiation (ITP). Finally, if the calcium ion concentration is sustained long enough to effect the translocation of PKC, the elevated EPSP may last for days. This model corresponds with the phenomenon of long-term potentiation (LTP) [cf. Paul Kelly's Mechanisms Regulating Synaptic Plasticity in Brain].
The model also involves even longer-term processes such as the increase in protein synthesis (perhaps of ion channel components) that takes place in the cell body (shown as a second compartment under the synaptic compartment and separated by a dashed line). Protein synthesis and transport operate over much longer time scales (as compared with the real-time events associated with the arrival of APs). Additionally, protein denaturing and removal are relatively slow processes, so that the accumulation of protein factors that affect the efficacy of the synapse must be long-term.
Finally, Alkon raises the possibility of really long-term processes which involve the activation of some genes due to second messenger systems activated by the long-term location of PKC in the membrane. Such activation involves the integration of presumably different proteins or other structural components which can physically alter the morphology of dendritic compartments. Admittedly this area is speculative, but some evidence for it has been reported.
Clearly the role of multi-time-scale processes in affecting the responsiveness of synapses is an important element in memory traces that extend in time.
The adaptrode is a computational model of principles seen in the biological model. It is not an exact homological mapping from biology to computation. There are several dynamical models of integrate-and-fire neurons involving considerable details of ion channels that are meant to be close representations of real neurons. The adaptrode is not that kind of model. What I have sought to capture is the computational essence of adaptivity, and to apply it to the case of synaptic processing. Nonetheless, some homology is in evidence.
For example, the response of the adaptrode (Eq. 5) is roughly equivalent to the EPSP (roughly because it does not itself stay elevated after fall-off in input signal). The concentration of calcium ions is approximated by the level-0 weight, w0. It can also be seen that the other biophysical processes that contribute to extending the reduced outflux of potassium ions are represented in higher level weights. Each weight in the vector of a single adaptrode has both increase (alpha) and decrement (delta) constants corresponding to the kinetic constants associated with the biophysical processes.