
Deep Learning Pre-Requisites

  • Gradient descent requires knowledge of the gradient of your cost function (e.g. MSE)
  • Mathematically we need the first partial derivative of the cost with respect to all the inputs
    • This is hard and inefficient if you just throw calculus at the problem
  • Reverse-mode autodiff can be used
    • Optimized for many inputs + few outputs (like a neuron)
    • Computes all partial derivatives in # of outputs + 1 graph traversals
    • Still fundamentally a calculus trick - it's complicated but it works
    • This is what TensorFlow uses (see the sketch after this list)
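
A minimal sketch of how this looks in practice, assuming TensorFlow 2.x: GradientTape records the forward pass, a single backward traversal (reverse-mode autodiff) yields the partial derivatives of an MSE cost, and a gradient descent step applies them. The toy data, learning rate, and step count are illustrative, not from the course.

```python
# Reverse-mode autodiff driving gradient descent (assumes TensorFlow 2.x).
import tensorflow as tf

# Toy data roughly following y = 3x + 2
x = tf.constant([[1.0], [2.0], [3.0], [4.0]])
y = tf.constant([[5.1], [7.9], [11.2], [13.8]])

w = tf.Variable(0.0)  # weight to learn
b = tf.Variable(0.0)  # bias to learn

learning_rate = 0.05
for step in range(500):
    with tf.GradientTape() as tape:
        # Forward pass: prediction and MSE cost
        y_pred = w * x + b
        mse = tf.reduce_mean(tf.square(y_pred - y))
    # One backward graph traversal gives all the partial derivatives
    dw, db = tape.gradient(mse, [w, b])
    # One gradient descent step: move against the gradient
    w.assign_sub(learning_rate * dw)
    b.assign_sub(learning_rate * db)

print(w.numpy(), b.numpy())  # should end up near 3 and 2
```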

Softmax

  • Used for classification
    • Given a score for each class
    • It produces a probability of each class
    • The class with the highest probability is the answer you get.
    • With one score \(z_j = \theta_j^T x\) per class: \[\sigma(z)_j = \frac{e^{z_j}}{\sum_k e^{z_k}}\]
    • x is a vector of input values; each theta_j is a vector of weights for class j
    • In the two-class case this reduces to the logistic form \[h_\theta(x) = \frac{1}{1+\exp(-\theta^T x)}\] (see the example after this list)
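
A minimal NumPy sketch of softmax as described above: made-up scores for three classes are converted to probabilities that sum to 1, and argmax picks the most probable class. The scores and the max-subtraction step (a common numerical-stability trick) are illustrative details, not from the course.

```python
import numpy as np

def softmax(scores):
    # Subtract the max score before exponentiating to avoid overflow
    exps = np.exp(scores - np.max(scores))
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0, 0.1])   # one score per class
probs = softmax(scores)

print(probs)             # ~[0.659, 0.242, 0.099] -- sums to 1
print(np.argmax(probs))  # index of the most probable class (0 here)
```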

In review

  • Gradient descent is an algorithm for minimizing error over multiple steps (a worked sketch follows this list)
  • Autodiff is a calculus trick for efficiently finding the gradients that gradient descent needs
  • Softmax is a function for choosing the most probable classification given a score for each class
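
To tie the review together, here is the same toy problem as the TensorFlow sketch above, but with the two partial derivatives of the MSE cost worked out by hand (the "throw calculus at the problem" approach, feasible only because the model is tiny). The toy data, learning rate, and step count are illustrative assumptions.

```python
import numpy as np

# Toy data roughly following y = 3x + 2
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([5.1, 7.9, 11.2, 13.8])

theta = np.zeros(2)        # theta[0] = intercept, theta[1] = slope
learning_rate = 0.05

for step in range(500):
    error = theta[0] + theta[1] * x - y
    # Hand-derived partial derivatives of MSE = mean(error^2)
    grad = np.array([2 * np.mean(error), 2 * np.mean(error * x)])
    theta -= learning_rate * grad  # step downhill along the negative gradient

print(theta)  # converges to roughly [2.1, 2.9] for this data
```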