Deep Learning Details

Backpropagation

  • How do you train an MLP's weights? How does it learn?
  • Backpropagation, or more specifically gradient descent using reverse-mode autodiff.
  • For each training step (a minimal NumPy sketch follows this list):
    • Compute the output error
    • Compute how much each neuron in the previous hidden layer contributed to that error
    • Back-propagate that error in a reverse pass
    • Tweak the weights to reduce the error using gradient descent
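
A minimal sketch of one such training step in NumPy, for a single-hidden-layer network with a sigmoid activation and a squared-error loss (the layer sizes, learning rate, and toy data are illustrative assumptions, not something from these notes):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data and a tiny MLP: 3 inputs -> 4 hidden (sigmoid) -> 1 output (linear)
    X = rng.normal(size=(8, 3))
    y = rng.normal(size=(8, 1))
    W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
    lr = 0.1  # learning rate (arbitrary choice)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Forward pass
    h = sigmoid(X @ W1 + b1)         # hidden activations
    out = h @ W2 + b2                # network output

    # 1. Compute the output error (gradient of 0.5 * squared error w.r.t. out)
    err = out - y

    # 2./3. Reverse pass: how much each weight and hidden neuron contributed
    dW2 = h.T @ err
    db2 = err.sum(axis=0)
    dh = (err @ W2.T) * h * (1 - h)  # back-propagate through the sigmoid
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0)

    # 4. Tweak the weights to reduce the error (gradient descent)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2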

Activation functions

  • Step functions don't work with gradient descent - there is no gradient.
    • Mathematically, their derivative is zero everywhere except at the step, so there is nothing useful to descend.
  • Alternatives
    • Logistic function
    • Hyperbolic tangent function
    • Exponential linear unit (ELU)
    • ReLU function (Rectified Linear Unit)
  • ReLU is common. Fast to compute and works well.
    • Also, "Leaky ReLU", "Noisy ReLU"
    • ELU can sometimes lead to faster learning, though (simple NumPy versions of these functions follow).
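
For reference, these activations are simple element-wise functions; a NumPy sketch (the leaky-ReLU slope of 0.01 and the ELU alpha of 1.0 are common defaults assumed here, not values from the notes):

    import numpy as np

    def logistic(z):                 # aka sigmoid; squashes to (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    def tanh(z):                     # hyperbolic tangent; squashes to (-1, 1)
        return np.tanh(z)

    def relu(z):                     # Rectified Linear Unit
        return np.maximum(0.0, z)

    def leaky_relu(z, alpha=0.01):   # small slope for negative inputs instead of zero
        return np.where(z > 0, z, alpha * z)

    def elu(z, alpha=1.0):           # Exponential Linear Unit; smooth for negative inputs
        return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))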

Optimization functions

  • There are faster-converging optimizers than plain gradient descent (a Keras sketch follows this list)
    • Momentum optimization
      • Introduces a momentum term to the descent, so it speeds up while the slope is steep and slows down as things start to flatten out.
    • Nesterov Accelerated Gradient
      • A small tweak on momentum optimization - it computes the gradient slightly ahead in the direction of the momentum, not at the current position.
    • RMSProp
      • Adapts the learning rate per parameter to help keep the steps pointed toward the minimum
    • Adam
      • Adaptive moment estimation - momentum + RMSProp combined
      • Popular choice today, easy to use
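
In Keras all of these are drop-in choices at compile time; a minimal sketch, assuming TensorFlow/Keras is installed (the model shape and learning rates are illustrative, not values from the notes):

    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(20,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(1),
    ])

    # Plain SGD, SGD with momentum, Nesterov momentum, RMSProp, and Adam
    sgd      = keras.optimizers.SGD(learning_rate=0.01)
    momentum = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
    nesterov = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True)
    rmsprop  = keras.optimizers.RMSprop(learning_rate=0.001)
    adam     = keras.optimizers.Adam(learning_rate=0.001)  # popular, easy default

    model.compile(optimizer=adam, loss="mse")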

Avoiding Overfitting

  • With thousands of weights to tune, overfitting is a problem
  • Early stopping: stop training when validation performance starts dropping
  • Dropout - randomly ignore, say, 50% of all neurons at each training step
    • Works surprisingly well
    • Forces your model to spread out its learning (a Keras sketch follows this list)
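
Both ideas map onto standard Keras pieces; a minimal sketch, assuming TensorFlow/Keras is installed (the 50% dropout rate, patience of 5 epochs, and toy data are illustrative assumptions):

    import numpy as np
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(20,)),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dropout(0.5),   # randomly ignore 50% of these neurons each training step
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

    # Early stopping: halt once the validation loss stops improving
    early_stop = keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=5, restore_best_weights=True)

    X_train = np.random.rand(200, 20)   # toy data, just to make this runnable
    y_train = np.random.rand(200, 1)
    model.fit(X_train, y_train, validation_split=0.2,
              epochs=100, callbacks=[early_stop], verbose=0)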

Tuning your topology

  • Trial & error is one way
    • Evaluate a smaller network with fewer neurons in the hidden layers
    • Evaluate a larger network with more layers
      • Try reducing the size of each layer as you progress - form a funnel (see the sketch after this list)
  • More layers can yield faster learning
  • Or just use more layers and neurons than you need, and don't worry about it because you use early stopping
  • Use "model zoos" of published, pre-trained architectures as a starting point