Tuning Neural Networks: Learning Rate and Batch Size Hyperparameters
Learning Rate
- Neural networks are trained by gradient descent (or similar means)
- We start at some random point and iteratively sample different solutions (weights), seeking to minimize some cost function over many epochs
- The learning rate controls how far apart these samples are, i.e., how large a step each update takes
Effect of Learning Rate
- Too high a learning rate means you might overshoot the optimal solution
- Too small a learning rate will take too long to find the optimal solution
- Learning rate is an example of a hyperparameter
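To make the step-size intuition concrete, here is a minimal sketch in plain Python (the function name, cost function, and learning-rate values are illustrative assumptions, not from the original) of gradient descent on the toy cost f(w) = w^2, showing how the learning rate scales each update:

```python
# Gradient descent on the toy cost f(w) = w**2, whose gradient is 2*w.
# The learning rate scales how far each update moves the weight.
def gradient_descent(learning_rate, steps=50, start=5.0):
    w = start
    for _ in range(steps):
        grad = 2 * w                      # gradient of the cost at the current weight
        w = w - learning_rate * grad      # step size is proportional to the learning rate
    return w

print(gradient_descent(0.01))   # too small: after 50 steps, still far from the minimum at w = 0
print(gradient_descent(0.1))    # moderate: converges very close to 0
print(gradient_descent(1.1))    # too large: every step overshoots, and w diverges
```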
Batch Size
- How many training samples are used for each gradient update within an epoch (see the sketch after this list)
- Somewhat counterintuitively:
- Smaller batch sizes can work their way out of "local minima" more easily
- Batch sizes that are too large can end up getting stuck in the wrong solution
- Because the training data is randomly shuffled at each epoch, this can show up as very inconsistent results from run to run
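The role of batch size and per-epoch shuffling is easiest to see in a mini-batch stochastic gradient descent loop. Below is a minimal NumPy sketch (the data, variable names, and hyperparameter values are illustrative assumptions) that fits a simple linear model:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))                     # 1,000 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=1000)  # noisy linear targets

def minibatch_sgd(X, y, batch_size=32, learning_rate=0.05, epochs=20):
    w = np.zeros(X.shape[1])
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)                 # random shuffle at the start of each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            grad = 2 * Xb.T @ (Xb @ w - yb) / len(Xb)  # mean-squared-error gradient on this batch
            w -= learning_rate * grad              # one weight update per batch
    return w

print(minibatch_sgd(X, y, batch_size=32))          # should land near [2.0, -1.0, 0.5]
```

Smaller batches give noisier gradient estimates, which is exactly the randomness that can jostle the weights out of a poor local minimum; larger batches give smoother but less exploratory updates.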
To Recap
- Small batch sizes tend to not get stuck in local minima
- Large batch sizes can converge on the wrong solution at random
- Large learning rates can overshoot the correct solution
- Small learning rates increase training time
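In practice, both hyperparameters are simply arguments passed to the training framework. As a hedged sketch, here is where they would typically be set in a Keras workflow (the model architecture, dummy data, and specific values are placeholders, not from the original):

```python
import numpy as np
import tensorflow as tf

X_train = np.random.rand(1000, 20)   # dummy data for illustration
y_train = np.random.rand(1000, 1)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(20,)),
    tf.keras.layers.Dense(1),
])

# The learning rate is a property of the optimizer...
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01), loss='mse')

# ...while the batch size (and number of epochs) is passed to fit().
model.fit(X_train, y_train, epochs=10, batch_size=32)
```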