Foundations Of Cognitive Science

Gradient Descent Learning

An artificial neural network that uses threshold devices as output units (e.g., the perceptron (Rosenblatt, 1962)) is typically trained with the delta rule. Under this rule, the change in a connection weight is defined by the product of the output unit's error and the activity being sent through the other end of the connection. Threshold devices use the discontinuous step function to compute activity. If this function is approximated by a continuous function, such as the logistic equation, then the activation function has a derivative everywhere and a more advanced type of learning can be defined. In this more advanced learning, weights are changed in such a way that network error is decreased as rapidly as possible. In other words, the learning rule moves the network down the steepest slope in error space. This is called gradient descent learning. The delta rule can be converted into gradient descent learning by multiplying output unit error by the first derivative of the output unit's activation function (Rumelhart, Hinton & Williams, 1986). Gradient descent learning is the most common approach to training modern connectionist networks that use integration devices or value units as processors (Dawson, 2004).
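
The difference between the two rules can be illustrated with a minimal sketch of a single logistic output unit. This is an illustration under stated assumptions, not code from any of the cited sources: the function names, the learning rate, the number of epochs, and the choice of the logical AND problem as training data are all hypothetical. The only change from the delta rule is that the error is multiplied by the derivative of the logistic function, which for an output o equals o(1 - o).

```python
import math

def logistic(net):
    """Continuous approximation of the step function."""
    return 1.0 / (1.0 + math.exp(-net))

def unit_output(weights, bias, inputs):
    """Activity of one logistic output unit for one input pattern."""
    net = bias + sum(w * x for w, x in zip(weights, inputs))
    return logistic(net)

def sum_squared_error(patterns, weights, bias):
    """Network error summed over all training patterns."""
    return sum((target - unit_output(weights, bias, inputs)) ** 2
               for inputs, target in patterns)

def train_gradient_descent(patterns, learning_rate=0.5, epochs=5000):
    """Train one logistic unit; learning_rate and epochs are assumed values."""
    weights = [0.0] * len(patterns[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in patterns:
            output = unit_output(weights, bias, inputs)
            error = target - output
            # Delta rule term (error) times the first derivative of the
            # logistic activation function, output * (1 - output).
            delta = error * output * (1.0 - output)
            # Weight change: scaled delta times the activity sent
            # through the other end of the connection.
            weights = [w + learning_rate * delta * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * delta
    return weights, bias

# Hypothetical training set: logical AND of two binary inputs.
AND_PATTERNS = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

if __name__ == "__main__":
    w, b = train_gradient_descent(AND_PATTERNS)
    print("error after training:", sum_squared_error(AND_PATTERNS, w, b))
```

Dropping the factor output * (1.0 - output) from delta recovers the plain delta rule described above; keeping it makes each weight change follow the slope of the squared-error surface.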


  1. Dawson, M. R. W. (2004). Minds And Machines: Connectionism And Psychological Modeling. Malden, MA: Blackwell Pub.
  2. Rosenblatt, F. (1962). Principles Of Neurodynamics. Washington: Spartan Books.
  3. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel Distributed Processing (Vol. 1, pp. 318-362). Cambridge, MA: MIT Press.
(Added January 2010)