Foundations Of Cognitive Science

Supervised Learning

In cognitive science, most networks reported in the literature are not self-organizing and are not structured via unsupervised learning.  Instead, they are networks that are instructed to mediate a desired input-output mapping.  This is accomplished via supervised learning.  In supervised learning, it is assumed that the network has an external teacher.  The network is presented with an input pattern and produces a response to it.  The teacher compares the response generated by the network to the desired response, usually by calculating the amount of error associated with each output unit.  The teacher then provides this error as feedback to the network.  A learning rule uses the error feedback to modify the weights in such a way that the next time this pattern is presented, the amount of error that the network produces will have decreased.
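
To make this cycle concrete, the sketch below (in Python, using the NumPy library) shows one presentation of a pattern to a network of linear output units.  The function name, the learning rate, and the choice of the delta rule as the learning rule are illustrative assumptions, not details taken from the text above.

import numpy as np

def supervised_step(weights, pattern, desired, learning_rate=0.1):
    # The network produces a response to the input pattern.
    response = weights @ pattern
    # The teacher compares the obtained response to the desired response,
    # computing the error associated with each output unit.
    error = desired - response
    # The learning rule (here, the delta rule) uses the error feedback to
    # modify the weights so that the error to this pattern decreases.
    weights = weights + learning_rate * np.outer(error, pattern)
    return weights, error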

A variety of learning rules, including the delta rule (Rosenblatt, 1958, 1962; Stone, 1986; Widrow, 1962; Widrow & Hoff, 1960) and the generalized delta rule (Rumelhart, Hinton, & Williams, 1986), are supervised learning rules that work by correcting network errors.  This kind of learning involves repeated presentation of a set of input-output pattern pairs, called a training set.  Ideally, with enough presentations of the training set, the amount of error produced for each member of the training set becomes negligible, and the network can be said to have learned the desired input-output mapping.  Because these techniques require many presentations of a set of patterns for learning to be completed, they have sometimes been criticized as examples of slow learning (Carpenter, 1989).
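
As a rough illustration of such training, the sketch below repeatedly presents a small, made-up training set to a network of linear output units and applies the delta rule after each pattern, stopping only when the total squared error over the training set falls below a tolerance.  The training data, learning rate, tolerance, and stopping criterion are assumptions chosen to show the many-epochs character of this kind of learning, not values from the literature cited above.

import numpy as np

def train(training_set, n_inputs, n_outputs, learning_rate=0.1,
          tolerance=0.01, max_epochs=10000):
    # Repeatedly present every input-output pair until the total squared
    # error over the training set is negligible.
    weights = np.zeros((n_outputs, n_inputs))
    for epoch in range(max_epochs):
        total_error = 0.0
        for pattern, desired in training_set:
            response = weights @ pattern                     # network's response
            error = desired - response                       # teacher's feedback
            weights += learning_rate * np.outer(error, pattern)  # delta-rule update
            total_error += float(error @ error)
        if total_error < tolerance:                          # mapping has been learned
            return weights, epoch + 1
    return weights, max_epochs

# Hypothetical training set: a two-input, one-output linear mapping.
training_set = [
    (np.array([1.0, 0.0]), np.array([0.5])),
    (np.array([0.0, 1.0]), np.array([-0.5])),
    (np.array([1.0, 1.0]), np.array([0.0])),
]
weights, epochs = train(training_set, n_inputs=2, n_outputs=1)
print(f"Learned after {epochs} epochs; weights = {weights}")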

References:

  1. Carpenter, G. A. (1989). Neural network models for pattern recognition and associative memory. Neural Networks, 2, 243-257.
  2. Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386-408.
  3. Rosenblatt, F. (1962). Principles of Neurodynamics. Washington, DC: Spartan Books.
  4. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533-536.
  5. Stone, G. O. (1986). An analysis of the delta rule and the learning of statistical associations. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel Distributed Processing (Vol. 1, pp. 444-459). Cambridge, MA: MIT Press.
  6. Widrow, B. (1962). Generalization and information storage in networks of ADALINE "neurons". In M. C. Yovits, G. T. Jacobi, & G. D. Goldstein (Eds.), Self-Organizing Systems 1962 (pp. 435-461). Washington, DC: Spartan Books.
  7. Widrow, B., & Hoff, M. E. (1960). Adaptive switching circuits. Institute of Radio Engineers, Western Electronic Show and Convention, Convention Record, Part 4, 96-104.

(Added November 2010)
