Why Does It Work?
• The gradient descent rule treats error as a surface
• It tries to find the lowest point of this surface
• It uses calculus to find the steepest slope downhill from its current location on the surface
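
The idea in the bullets above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the one-dimensional error surface E(w) = (w - 3)^2, the starting point, the learning rate, and the names `error`, `slope`, and `gradient_descent` are all hypothetical and do not come from the slides.

```python
# Hypothetical error surface with a single weight w.
# Assumed for illustration: E(w) = (w - 3)^2, whose lowest point is at w = 3.
def error(w):
    return (w - 3.0) ** 2


def slope(w):
    # Calculus gives the slope of the surface at w: dE/dw = 2 * (w - 3).
    return 2.0 * (w - 3.0)


def gradient_descent(w, learning_rate=0.1, steps=100):
    # Repeatedly take a small step downhill, against the slope.
    for _ in range(steps):
        w = w - learning_rate * slope(w)
    return w


w_start = 0.0
w_final = gradient_descent(w_start)
print(f"error at start: {error(w_start):.4f}")
print(f"error at end:   {error(w_final):.4f}")
```

Each step moves the weight against the slope, so the error shrinks until the weight settles near the lowest point of this assumed surface. On a real error surface with many weights, the slope becomes a gradient vector and the surface may have more than one low point.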