•Assume that the perceptron uses a sigmoid activation function
•Calculus can be used to derive a gradient descent rule that adjusts the weights so as to move downhill in error space as quickly as possible
•This calculus is possible only because the sigmoid is differentiable; it serves as a smooth, continuous approximation of the threshold function, whose derivative is zero everywhere it is defined
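The idea above can be sketched in code: a single sigmoid unit trained by gradient descent on the squared error, using the fact that the sigmoid's derivative is o·(1 − o). The toy dataset (logical OR), learning rate, and epoch count below are illustrative assumptions, not part of the original notes.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy task: learn logical OR with one sigmoid unit
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

w = [0.0, 0.0]   # weights (assumed initial values)
b = 0.0          # bias
lr = 0.5         # learning rate (assumed value)

for epoch in range(5000):
    for x, t in data:
        o = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        # For squared error E = 0.5*(t - o)^2, the chain rule gives
        # dE/dw_i = -(t - o) * o * (1 - o) * x_i, using sigmoid' = o(1 - o).
        delta = (t - o) * o * (1 - o)
        w[0] += lr * delta * x[0]   # step downhill in error space
        w[1] += lr * delta * x[1]
        b += lr * delta

# After training, rounding the unit's output recovers the OR targets
predictions = [round(sigmoid(w[0] * x[0] + w[1] * x[1] + b)) for x, _ in data]
print(predictions)
```

Note that the update rule only exists because the sigmoid is differentiable; with a hard threshold, o(1 − o) would be zero everywhere the derivative is defined and no gradient signal would flow.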