A neural-network mathematical model that places greater emphasis than prior such models on some of the temporal aspects of real neural physical processes has been proposed as a basis for massively parallel, distributed algorithms that learn dynamic models of possibly complex external processes by means of learning rules that are local in space and time. The algorithms could be made to perform such functions as recognizing and predicting words in speech and objects depicted in video images. The approach embodied in this model is said to be “hardware-friendly” in the following sense: the algorithms would be amenable to execution by special-purpose computers implemented as very-large-scale integrated (VLSI) circuits that would operate at relatively high speeds and low power demands.
A substantial amount of background information is needed to give meaning to a brief summary of the present neural-network model:
- The dynamic model to be learned by the present neural-network model is of a type denoted an internal model or predictor. In the simplest terms, an internal model is a set of equations that predicts future measurements on the basis of past and current ones. Internal models have been used in controlling industrial plants and machines (including robots).
- One conclusion drawn from Pavlov’s famous experiments is that reinforcers of learning (basically, rewards and punishments) become progressively less effective at causing adaptation of behavior as their predictability grows during the course of learning. The difference between the actual occurrence and the prediction of the reinforcer is denoted the reinforcement prediction error signal. In what is known in the art as the temporal-difference model (TD model) of Pavlovian learning, the reinforcement prediction error signal is used to learn a reinforcement prediction signal. The desired reinforcement prediction signal is defined as a weighted sum of future reinforcement signals in which reinforcements are progressively discounted with increasing time into the future. The reinforcement prediction error signal progressively decreases during learning as the reinforcement prediction signal approaches the desired reinforcement prediction signal.
- Algorithms based on the TD model (“TD algorithms”) have been analyzed and shown to be implementations of a variant of dynamic programming. Machine-learning studies have shown that TD algorithms are powerful means of solving reinforcement learning problems that involve delayed reinforcement. Examples of such problems include board games, which involve delayed rewards (winning) or punishments (losing).
- Midbrain dopamine neurons are so named because they modulate the activity levels of other neurons by releasing the neurotransmitter dopamine. The cell bodies of these neurons are located in the brain stem, and their axons project to many brain areas. The activities of dopamine neurons have been found to be strikingly similar to the reinforcement prediction error signal of the TD model.
- Real neural signals include spikelike pulses whose times of occurrence are significant. The term “biological spike coding” essentially denotes the temporal labeling of such pulses in real neurons or in a neural-network model.
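The TD model described in the background list can be illustrated with the textbook tabular TD(0) rule. This is a minimal sketch, assuming discrete time steps following a stimulus, a single reward delivered at a fixed step, and illustrative values for the discount factor gamma and the learning rate alpha; the function names `run_trial` and `train` are hypothetical. The prediction V(t) for step t is learned toward the discounted sum r(t) + γ r(t+1) + γ² r(t+2) + …, using the prediction error δ(t) = r(t) + γ V(t+1) − V(t). It is a conventional, non-spiking TD learner, not the spiking network of the present model.

```python
"""Minimal tabular TD(0) sketch of Pavlovian conditioning.

Assumptions (illustrative, not from the brief): discrete time steps after a
stimulus, one reward at a fixed step, discount gamma, learning rate alpha.
"""
from typing import List, Tuple


def run_trial(values: List[float], reward_time: int,
              reward: float = 1.0, gamma: float = 0.9,
              alpha: float = 0.2) -> List[float]:
    """Run one trial, updating predictions in place; return per-step errors."""
    n = len(values)
    errors = []
    for t in range(n):
        r = reward if t == reward_time else 0.0       # reinforcement signal r(t)
        next_v = values[t + 1] if t + 1 < n else 0.0  # prediction V(t+1); 0 after trial end
        delta = r + gamma * next_v - values[t]        # reinforcement prediction error
        values[t] += alpha * delta                    # local update of the prediction V(t)
        errors.append(delta)
    return errors


def train(n_steps: int = 10, reward_time: int = 6, trials: int = 500,
          gamma: float = 0.9,
          alpha: float = 0.2) -> Tuple[List[float], List[List[float]]]:
    """Repeat identical trials; return learned predictions and per-trial errors."""
    values = [0.0] * n_steps
    history = []
    for _ in range(trials):
        history.append(run_trial(values, reward_time, gamma=gamma, alpha=alpha))
    return values, history


if __name__ == "__main__":
    values, history = train()
    print("error at reward, first trial:", history[0][6])
    print("error at reward, last trial: ", history[-1][6])
    print("learned predictions:", [round(v, 3) for v in values])
```

Run as a script, the error at the reward step is 1.0 on the first trial and shrinks toward zero over later trials, echoing the observation that reinforcers lose effectiveness as they become predictable. The learned prediction at step t approaches gamma ** (reward_time - t) for steps up to the reward, which is the discounted weighted sum of future reinforcements.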
This concludes the background information.
The present neural-network model incorporates biological spike coding along with some basic principles of synaptic learning in the cortex of the human brain. Under the model’s learning rule, synaptic weights are adapted when presynaptic and postsynaptic spikes occur within short time windows of each other. In simplified terms, for a given synapse and time window, the synaptic strength is increased in the long term if the presynaptic spike precedes the postsynaptic spike, or decreased in the long term if the presynaptic spike follows the postsynaptic spike. This learning rule has been shown to minimize prediction errors, indicating that the neural network learns an optimal dynamic model of the external process.
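The timing-dependent rule described above can be sketched with the widely used pair-based spike-timing-dependent plasticity (STDP) form, in which the weight change decays exponentially with the interval between presynaptic and postsynaptic spikes. The exponential window shape, the amplitudes, time constants, cutoff `window`, and weight bounds, and the name `stdp_update` are all assumptions for illustration; the brief does not give the model's actual window functions, and this sketch does not reproduce the analysis linking the rule to prediction-error minimization.

```python
"""Pair-based spike-timing-dependent plasticity (STDP) sketch.

Assumptions (illustrative, not from the brief): exponential timing windows,
the amplitudes and time constants below, a hard cutoff `window`, and weight
bounds [w_min, w_max]. Spike times are in milliseconds.
"""
import math
from typing import Iterable


def stdp_update(w: float, pre_times: Iterable[float], post_times: Iterable[float],
                a_plus: float = 0.01, a_minus: float = 0.012,
                tau_plus: float = 20.0, tau_minus: float = 20.0,
                window: float = 50.0, w_min: float = 0.0,
                w_max: float = 1.0) -> float:
    """Return the new weight after summing updates over all pre/post spike pairs."""
    post = list(post_times)  # materialize so it can be iterated once per presynaptic spike
    dw = 0.0
    for t_pre in pre_times:
        for t_post in post:
            dt = t_post - t_pre
            if 0.0 < dt <= window:       # pre precedes post: long-term potentiation
                dw += a_plus * math.exp(-dt / tau_plus)
            elif -window <= dt < 0.0:    # pre follows post: long-term depression
                dw -= a_minus * math.exp(dt / tau_minus)
            # pairs outside the window (or exactly coincident) cause no change
    return min(w_max, max(w_min, w + dw))  # clip to the assumed weight bounds


if __name__ == "__main__":
    print("pre before post:", stdp_update(0.5, [10.0], [15.0]))
    print("pre after post: ", stdp_update(0.5, [15.0], [10.0]))
    print("outside window: ", stdp_update(0.5, [0.0], [100.0]))
```

With these settings a presynaptic spike 5 ms before a postsynaptic spike strengthens the synapse, the reverse order weakens it, and pairs more than 50 ms apart leave the weight unchanged. Every update depends only on spike times at the one synapse, which is the sense in which such rules are local in space and time.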
This work was done by Tuan Duong, Vu Duong, and Roland Suri of Caltech for NASA’s Jet Propulsion Laboratory.
In accordance with Public Law 96-517, the contractor has elected to retain title to this invention. Inquiries concerning rights for its commercial use should be addressed to:
Innovative Technology Assets Management
Mail Stop 202-233
4800 Oak Grove Drive
Pasadena, CA 91109-8099
Refer to NPO-41691, volume and number of this NASA Tech Briefs issue, and the page number.