One popular method for training ANNs is the generalized delta rule (Rumelhart, Hinton, & Williams, 1986). With this learning rule, patterns are repeatedly presented to the network, and the network's actual responses are compared to the responses that are desired for these patterns. This comparison involves computing an error term, which can be used to modify the pattern of connectivity in the network in such a way that the network's responses become more and more correct.
The mathematics underlying the generalized delta rule requires that the activation function, used by processing units to compute their internal activity from their net input, be monotonic. This means that when the total signal going into a processor is increased, then the processor's internal activity should never decrease. As a result, most ANNs are based upon a sigmoid-shaped activation function such as the logistic. Adopting Ballard's (1986) terminology, we call these standard units integration devices.
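The training cycle described above can be illustrated with a minimal sketch, assuming a single logistic integration device trained on a small linearly separable problem (logical AND). The function names, learning rate, and number of epochs are hypothetical choices made for illustration and are not taken from Rumelhart, Hinton, and Williams (1986).

```python
import math

def logistic(net):
    # Monotonic, sigmoid-shaped activation: output never decreases as net grows.
    return 1.0 / (1.0 + math.exp(-net))

def respond(weights, bias, x):
    # Net input is the weighted sum of the incoming signals plus a bias.
    net = bias + sum(w * xi for w, xi in zip(weights, x))
    return logistic(net)

def train_delta_rule(patterns, targets, epochs=2000, rate=0.5):
    # Hypothetical helper: repeatedly present patterns and adjust the weights.
    weights = [0.0] * len(patterns[0])
    bias = 0.0
    for _ in range(epochs):
        for x, t in zip(patterns, targets):
            out = respond(weights, bias, x)
            error = t - out                      # desired minus actual response
            delta = error * out * (1.0 - out)    # error scaled by the logistic slope
            weights = [w + rate * delta * xi for w, xi in zip(weights, x)]
            bias += rate * delta
    return weights, bias

patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]  # logical AND (assumed example problem)
weights, bias = train_delta_rule(patterns, targets)
for x, t in zip(patterns, targets):
    print(x, t, round(respond(weights, bias, x), 3))
```

The slope term out * (1 - out) is always positive for the logistic, which is one way to see why the rule relies on a monotonic activation function: the sign of the weight change is determined by the error alone.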
However, there are reasons to believe that neurons would be better modeled with a nonmonotonic activation function (e.g., Ballard, 1986). At the cellular level, many neurons behave as if they are tuned to respond to a narrow range of signals (e.g., a narrow range of light wavelengths, a narrow range of spatial frequencies). If the signal is either lower than or higher than this range, then the cell does not respond. The kind of activation function required to model this type of behaviour is a bell-shaped function, such as a Gaussian. Following Ballard's terminology again, we call a processor that uses such an activation function a value unit. Much of the current work at the BCP began with our attempts to train networks of value units.
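The contrast between the two kinds of processor can be made concrete with a short sketch. The particular Gaussian used here, exp(-pi * (net - mu)^2), is an assumed form chosen for illustration; the text above specifies only that the function is bell-shaped. The parameter name mu (the net input at which the unit responds maximally) is likewise an assumption.

```python
import math

def logistic(net):
    # Integration device: activity rises monotonically with net input.
    return 1.0 / (1.0 + math.exp(-net))

def gaussian(net, mu=0.0):
    # Value unit (assumed form): activity peaks at net == mu and falls
    # toward zero when the net input is either lower or higher than mu.
    return math.exp(-math.pi * (net - mu) ** 2)

nets = [-2.0, -1.0, 0.0, 1.0, 2.0]
logistic_out = [logistic(n) for n in nets]
gaussian_out = [gaussian(n) for n in nets]

for n, lo, go in zip(nets, logistic_out, gaussian_out):
    print(f"net={n:+.1f}  integration device={lo:.3f}  value unit={go:.6f}")
```

In this sketch the integration device's output keeps increasing across the whole range, whereas the value unit responds strongly only near mu, mirroring the tuned behaviour described above.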
If value units are trained with the standard version of the generalized delta rule, then practical learning does not occur. Instead, the network learns to turn its output units off for every pattern. This is because the mathematics of the generalized delta rule assumes a monotonic activation function, which value units do not have. To overcome this problem, we derived an elaborated version of the generalized delta rule (Dawson & Schopflocher, 1992a). For this new learning rule, network error is defined not only in terms of the difference between desired and actual network responses, but also in terms of heuristic information that is used to ensure that the network attempts to turn its output units "on" for some patterns.
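One way to see why a heuristic term helps is sketched below. The sketch assumes a Gaussian activation exp(-pi * (net - mu)^2) and an error of the form C = 1/2 (T - O)^2 + 1/2 T (net - mu)^2, in which the second term penalizes net input far from mu whenever the desired response T is "on". This form is an assumption made for illustration and should be checked against Dawson and Schopflocher (1992a); the function names are hypothetical.

```python
import math

def gaussian(net, mu=0.0):
    # Assumed value unit activation.
    return math.exp(-math.pi * (net - mu) ** 2)

def gaussian_slope(net, mu=0.0):
    # Derivative of the assumed Gaussian with respect to net input.
    return -2.0 * math.pi * (net - mu) * gaussian(net, mu)

def standard_signal(target, net, mu=0.0):
    # Negative gradient of 1/2 (T - O)^2 with respect to net input.
    return (target - gaussian(net, mu)) * gaussian_slope(net, mu)

def elaborated_signal(target, net, mu=0.0):
    # Adds the negative gradient of the assumed heuristic term 1/2 T (net - mu)^2.
    return standard_signal(target, net, mu) - target * (net - mu)

# A unit that should be "on" (T = 1) but whose net input is far from mu.
far_net = 3.0
print("standard:  ", standard_signal(1.0, far_net))
print("elaborated:", elaborated_signal(1.0, far_net))
```

Under these assumptions, the standard signal all but vanishes when the net input lies in the flat tail of the Gaussian, so a unit that is "off" receives almost no pressure to change. The heuristic term supplies a signal that pulls the net input back toward mu, and it contributes nothing when the desired response is "off".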
The original motivation for developing a learning rule for the value unit architecture was the desire to increase the biological relevance of ANNs (e.g., Dawson, Shamanski & Medler, 1993). However, after this learning rule was derived, we soon discovered that value units had many algorithmic advantages over traditional ANNs built with integration devices and trained with the standard version of the generalized delta rule.
First, value units learn to solve complicated (i.e., linearly nonseparable) problems much faster than do standard networks (Dawson & Schopflocher, 1992a; Dawson, Schopflocher, Kidd & Shamanski, 1992; Medler & Dawson, 1994; Shamanski, Dawson & Berkeley, 1994). This is because the new definition of error that was created in the value unit learning rule incorporates a heuristic component that is not used in the standard learning rule.
Second, value units appear to generalize what they have learned on a training set to new patterns better than standard networks do (e.g., Shamanski, Dawson & Berkeley, 1994).
Third, value unit networks appear to be better than integration device networks at scaling their performance up to larger versions of the same problem (e.g., Medler & Dawson, 1994).
Fourth, the value unit learning rule permits the training of hybrid networks, in which some processors have a nonmonotonic activation function, while other processors have a monotonic activation function (Dawson & Schopflocher, 1992; Dawson, Schopflocher, Kidd & Shamanski, 1992). The ability to build hybrid networks is attractive, because there is a growing awareness among neuroscientists that brain structure is highly heterogeneous (e.g., Getting, 1989).
Fifth, value unit networks have an interesting property that makes them much easier to interpret (i.e., much easier to look inside and determine how they are actually working) than other kinds of connectionist systems.