Deep Learning Details

Backpropagation

  • How do you train an MLP's weights? How does it learn?
  • Backpropagation, or more specifically gradient descent using reverse-mode autodiff.
  • For each training step (a minimal NumPy sketch follows this list):
    • Compute the output error
    • Compute how much each neuron in the previous hidden layer contributed to that error
    • Back-propagate that error in a reverse pass
    • Tweak the weights to reduce the error using gradient descent
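
A minimal sketch of one such training step in NumPy, for a single-hidden-layer network with a sigmoid activation and a squared-error loss (the layer sizes, learning rate, and toy data are illustrative assumptions, not something from these notes):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data and a tiny MLP: 3 inputs -> 4 hidden (sigmoid) -> 1 output (linear)
    X = rng.normal(size=(8, 3))
    y = rng.normal(size=(8, 1))
    W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
    lr = 0.1  # learning rate (arbitrary choice)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Forward pass
    h = sigmoid(X @ W1 + b1)         # hidden activations
    out = h @ W2 + b2                # network output

    # 1. Compute the output error (gradient of 0.5 * squared error w.r.t. out)
    err = out - y

    # 2./3. Reverse pass: how much each weight and hidden neuron contributed
    dW2 = h.T @ err
    db2 = err.sum(axis=0)
    dh = (err @ W2.T) * h * (1 - h)  # back-propagate through the sigmoid
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0)

    # 4. Tweak the weights to reduce the error (gradient descent)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2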

Activation functions

  • Step functions don't work with gradient descent - there is no gradient.
    • Mathematically, their derivative is zero everywhere except at the step, so there is nothing useful to descend.
  • Alternatives
    • Logistic function
    • Hyperbolic tangent function
    • Exponential linear unit (ELU)
    • ReLU function (Rectified Linear Unit)
  • ReLU is common. Fast to compute and works well.
    • Also, "Leaky ReLU", "Noisy ReLU"
    • ELU can sometimes lead to faster learning, though (simple NumPy versions of these functions follow).
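
For reference, these activations are simple element-wise functions; a NumPy sketch (the leaky-ReLU slope of 0.01 and the ELU alpha of 1.0 are common defaults assumed here, not values from the notes):

    import numpy as np

    def logistic(z):                 # aka sigmoid; squashes to (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    def tanh(z):                     # hyperbolic tangent; squashes to (-1, 1)
        return np.tanh(z)

    def relu(z):                     # Rectified Linear Unit
        return np.maximum(0.0, z)

    def leaky_relu(z, alpha=0.01):   # small slope for negative inputs instead of zero
        return np.where(z > 0, z, alpha * z)

    def elu(z, alpha=1.0):           # Exponential Linear Unit; smooth for negative inputs
        return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))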

Optimization functions

  • There are faster-converging optimizers than plain gradient descent (a Keras sketch follows this list)
    • Momentum optimization
      • Introduces a momentum term to the descent, so it speeds up while the slope is steep and slows down as things start to flatten out.
    • Nesterov Accelerated Gradient
      • A small tweak on momentum optimization - it computes the gradient slightly ahead in the direction of the momentum, not at the current position.
    • RMSProp
      • Adapts the learning rate per parameter to help keep the steps pointed toward the minimum
    • Adam
      • Adaptive moment estimation - momentum + RMSProp combined
      • Popular choice today, easy to use
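
In Keras all of these are drop-in choices at compile time; a minimal sketch, assuming TensorFlow/Keras is installed (the model shape and learning rates are illustrative, not values from the notes):

    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(20,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(1),
    ])

    # Plain SGD, SGD with momentum, Nesterov momentum, RMSProp, and Adam
    sgd      = keras.optimizers.SGD(learning_rate=0.01)
    momentum = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
    nesterov = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True)
    rmsprop  = keras.optimizers.RMSprop(learning_rate=0.001)
    adam     = keras.optimizers.Adam(learning_rate=0.001)  # popular, easy default

    model.compile(optimizer=adam, loss="mse")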

Avoiding Overfitting

  • With thousands of weights to tune, overfitting is a problem
  • Early stopping: stop training when validation performance starts dropping
  • Dropout - randomly ignore, say, 50% of all neurons at each training step
    • Works surprisingly well
    • Forces your model to spread out its learning (a Keras sketch follows this list)
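
Both ideas map onto standard Keras pieces; a minimal sketch, assuming TensorFlow/Keras is installed (the 50% dropout rate, patience of 5 epochs, and toy data are illustrative assumptions):

    import numpy as np
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(20,)),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dropout(0.5),   # randomly ignore 50% of these neurons each training step
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

    # Early stopping: halt once the validation loss stops improving
    early_stop = keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=5, restore_best_weights=True)

    X_train = np.random.rand(200, 20)   # toy data, just to make this runnable
    y_train = np.random.rand(200, 1)
    model.fit(X_train, y_train, validation_split=0.2,
              epochs=100, callbacks=[early_stop], verbose=0)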

Tuning your topology

  • Trial & error is one way
    • Evaluate a smaller network with fewer neurons in the hidden layers
    • Evaluate a larger network with more layers
      • Try reducing the size of each layer as you progress - form a funnel (see the sketch after this list)
  • More layers can yield faster learning
  • Or just use more layers and neurons than you need, and don't worry about it because you use early stopping
  • Use "model zoos" of published, pre-trained architectures as a starting point