A neural-network mathematical model that, relative to prior such models, places greater emphasis on the temporal aspects of real neural processes has been proposed as a basis for massively parallel, distributed algorithms that learn dynamic models of possibly complex external processes by means of learning rules that are local in space and time. Such algorithms could be made to perform such functions as recognition and prediction of words in speech and of objects depicted in video images. The approach embodied in this model is said to be “hardware-friendly” in the following sense: the algorithms would be amenable to execution by special-purpose computers implemented as very-large-scale integrated (VLSI) circuits that would operate at relatively high speed and low power.
A brief summary of the present neural-network model requires a substantial amount of background information:
- A dynamic model to be learned by the present neural-network model is of a type denoted an internal model or predictor. In simplest terms, an internal model is a set of equations that predicts future measurements on the basis of past and current ones. Internal models have been used in controlling industrial plants and machines (including robots).
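The internal-model idea above can be illustrated with a minimal sketch: an online predictor that forecasts the next measurement from a few past ones and adapts its weights from each prediction error. The function name, the least-mean-squares (LMS) rule, and all parameter values here are illustrative assumptions, not part of the model described in this article; they serve only to show what “predicting future measurements from past and current ones with a learning rule local in time” can look like.

```python
import numpy as np

def lms_predictor(measurements, order=3, lr=0.01):
    """Illustrative one-step internal model (an assumption, not the
    article's algorithm): predict the next measurement from the previous
    `order` measurements, then adapt the weights from the prediction
    error.  The update uses only quantities available at the current
    step, i.e., it is local in time."""
    w = np.zeros(order)
    errors = []
    for t in range(order, len(measurements)):
        past = measurements[t - order:t]      # recent measurement history
        prediction = w @ past                 # predicted next measurement
        error = measurements[t] - prediction  # prediction error
        w += lr * error * past                # LMS weight adaptation
        errors.append(error)
    return w, errors
```

On a predictable signal such as a sinusoid, the prediction errors shrink as the weights adapt, which is the defining behavior of an internal model.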
- One of the conclusions drawn from Pavlov’s famous experiments was that reinforcers of learning (basically, rewards and punishments) become progressively less effective at causing adaptation of behavior as their predictability grows during the course of learning. The difference between the actual occurrence of the reinforcer and its prediction is denoted the reinforcement prediction error signal. In what is known in the art as the temporal-difference model (TD model) of Pavlovian learning, the reinforcement prediction error signal is used to learn a reinforcement prediction signal. The desired reinforcement prediction signal is defined as a weighted sum of future reinforcement signals, wherein reinforcements are progressively discounted with increasing time into the future. The reinforcement prediction error signal progressively decreases during learning as the reinforcement prediction signal approaches the desired one.
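In the standard discrete-time formulation of the TD model (a common textbook form, assumed here rather than quoted from this article), the desired prediction at time t is the discounted sum of future reinforcements, sum over k of gamma^k * r(t+k) with discount factor 0 < gamma < 1, and the prediction error is delta = r + gamma * V(next) - V(current). A single learning update can then be sketched as follows; the function and variable names are hypothetical.

```python
def td_update(V, state, next_state, reward, gamma=0.9, lr=0.1):
    """One temporal-difference update (textbook TD(0) form, an assumed
    formulation).  The TD error compares the received reinforcement plus
    the discounted next prediction against the current prediction; the
    current prediction is then nudged to shrink that error."""
    td_error = reward + gamma * V[next_state] - V[state]
    V[state] += lr * td_error
    return td_error
```

Note the Pavlovian interpretation: once V[state] already anticipates the discounted reinforcement, td_error is near zero and the reinforcer no longer drives adaptation.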
- Algorithms based on the TD model (“TD algorithms”) have been analyzed and shown to be implementations of a variant of dynamic programming. Machine-learning studies have shown that TD algorithms are powerful means of solving reinforcement learning problems that involve delayed reinforcement. Examples of such problems include board games, which involve delayed rewards (winning) or punishments (losing).
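The power of TD algorithms on delayed-reinforcement problems can be seen in a toy sketch (my own illustrative setup, not drawn from this article): states are visited left to right along a chain, and reinforcement arrives only at the far end, analogous to a board game rewarded only at win or loss. Repeated TD updates propagate the delayed reward backward, so that early states acquire appropriately discounted predictions.

```python
def learn_chain(n_states=5, episodes=200, gamma=0.9, lr=0.1):
    """Hypothetical delayed-reinforcement example: states 0..n_states-1
    are traversed left to right each episode, and a reinforcement of 1.0
    arrives only on leaving the final state.  TD updates propagate the
    delayed reward backward through the predictions."""
    V = [0.0] * (n_states + 1)   # V[n_states] is a terminal dummy, fixed at 0
    for _ in range(episodes):
        for s in range(n_states):
            reward = 1.0 if s == n_states - 1 else 0.0
            delta = reward + gamma * V[s + 1] - V[s]  # TD error
            V[s] += lr * delta                         # prediction update
    return V[:n_states]
```

After training, the predictions rise toward the rewarded end (approximately gamma raised to the number of remaining steps), even though no state except the last ever receives reinforcement directly — the dynamic-programming character of the TD iteration.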
- Midbrain dopamine neurons are so named because they modulate the activity of other neurons by means of the neurotransmitter dopamine. The cell bodies of dopamine neurons are located in the brain stem, and their axons project to many brain areas. The activities of dopamine neurons have been found to be strikingly similar to the prediction error signals of the TD model.
- Real neural signals include spikelike pulses, the times of occurrence of which are significant. The term “biological spike coding” denotes, essentially, temporal labeling of such pulses in real neurons or in a neural-network model.