•**Assume that the perceptron uses a sigmoid activation function**

•**Calculus can be used to determine a gradient descent rule that moves the network downhill in error space as fast as possible**

•**The calculus is possible only because the sigmoid is a continuous, differentiable approximation of the threshold function**
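
The points above can be illustrated with a minimal sketch of gradient descent for a single sigmoid unit. It assumes a squared-error measure E = ½(t − o)², which the slide does not specify. Under that assumption the chain rule gives the weight update Δwᵢ = η (t − o) o (1 − o) xᵢ, where the factor o(1 − o) is the derivative of the sigmoid. The names `sigmoid`, `train`, `predict`, `sum_squared_error`, the learning rate, the epoch count, and the AND data set are all hypothetical choices made for illustration.

```python
import math


def sigmoid(z):
    """Continuous, differentiable approximation of the threshold function."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Output of a single sigmoid unit for input vector x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sum_squared_error(w, b, samples):
    """Assumed error measure: E = 1/2 * sum over samples of (t - o)^2."""
    return 0.5 * sum((t - predict(w, b, x)) ** 2 for x, t in samples)


def train(samples, lr=0.5, epochs=5000):
    """Per-sample gradient descent on the assumed squared error."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, t in samples:
            o = predict(w, b, x)
            # Chain rule: -dE/dnet = (t - o) * o * (1 - o),
            # where o * (1 - o) is the sigmoid's derivative.
            delta = (t - o) * o * (1.0 - o)
            # Step downhill in error space: w_i += lr * delta * x_i.
            w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
            b += lr * delta
    return w, b


# Illustrative data: logical AND, which a single unit can separate.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w, b = train(AND_SAMPLES)
for x, t in AND_SAMPLES:
    print(x, t, round(predict(w, b, x), 3))
```

With a hard threshold in place of `sigmoid`, the derivative would be zero almost everywhere and undefined at the step, so the `delta` term would carry no usable gradient. That is the sense in which the calculus depends on the sigmoid.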