
Ensemble learning

  • Random forests were an example of ensemble learning
  • Ensemble learning just means we use multiple models to try to solve the same problem, and let them vote on the results (see the voting sketch after this list)

  • Random Forests uses bagging (bootstrap aggregating) to implement ensemble learning.

    • Many models are built by training on randomly-drawn subsets of the data (see the bagging sketch below)
  • Boosting is an alternate technique where each subsequent model in the ensemble gives extra weight to the data points mis-classified by the previous model (see the boosting sketch below)
  • A bucket of models trains several different models using the training data and picks the one that works best with the test data (see the sketch below)
  • Stacking runs multiple models at once on the data and combines their results (see the stacking sketch below)
    • This is how the Netflix Prize was won.
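
A minimal sketch of the voting idea, assuming scikit-learn and its built-in iris dataset as stand-ins (the notes don't specify a library, dataset, or model choices): three unrelated models each cast a vote, and the majority wins.

```python
# Hard-voting ensemble: three different models vote on each prediction.
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # illustrative dataset, not from the notes
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier()),
        ("nb", GaussianNB()),
    ],
    voting="hard",  # each model gets one vote; the majority class wins
)
print(cross_val_score(ensemble, X, y, cv=5).mean())
```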
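
A bagging sketch under the same assumptions: each tree is trained on a bootstrap sample drawn with replacement, which is essentially what a random forest does (minus the per-split feature sampling).

```python
# Bagging: train many trees, each on a bootstrap sample of the data, and vote.
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
bagged_trees = BaggingClassifier(
    DecisionTreeClassifier(),  # base model, cloned once per bootstrap sample
    n_estimators=100,
    bootstrap=True,  # draw each training subset with replacement
)
print(cross_val_score(bagged_trees, X, y, cv=5).mean())
```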
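
For boosting, a sketch using AdaBoost (one common boosting algorithm; the notes don't name a specific one): models are fit in sequence, and each round re-weights the training points so previously mis-classified points count for more.

```python
# Boosting: fit models sequentially, up-weighting mis-classified points each round.
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
boosted = AdaBoostClassifier(n_estimators=100)  # default base model is a decision stump
print(cross_val_score(boosted, X, y, cv=5).mean())
```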
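
A bucket-of-models sketch, with an arbitrary trio of candidates (illustrative choices, not from the notes): train them all, keep only the winner.

```python
# Bucket of models: train several candidates, keep whichever scores best on held-out data.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

candidates = [LogisticRegression(max_iter=1000), DecisionTreeClassifier(), GaussianNB()]
best = max(candidates, key=lambda m: m.fit(X_train, y_train).score(X_test, y_test))
print(best)  # the single winning model is what gets used
```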
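
And a stacking sketch: the base models all run on the same data, and a second-level model learns how to combine their predictions rather than taking a simple vote.

```python
# Stacking: a final model learns how to combine the base models' predictions.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
stacked = StackingClassifier(
    estimators=[("rf", RandomForestClassifier()), ("nb", GaussianNB())],
    final_estimator=LogisticRegression(max_iter=1000),  # learns to weight the base models
)
print(cross_val_score(stacked, X, y, cv=5).mean())
```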

Advanced ensemble learning

  • Bayes Optimal Classifier
    • Theoretically the best - but always impractical
  • Bayesian Parameter Averaging
    • Attempts to make BOC practical - but it's still misunderstood, susceptible to overfitting, and often outperformed by the simpler bagging approach
  • Bayesian Model Combination
    • Tries to address all of those problems
    • But in the end, it's about the same as using cross-validation to find the best combination of models (see the sketch below)
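
A sketch of that practical alternative, assuming scikit-learn with illustrative models and a tiny grid of vote weights (none of it from the notes): score a few weighted combinations of models with cross-validation and keep the best one.

```python
# Pick the best-scoring weighted combination of models via cross-validation.
from itertools import product

from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
models = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier()),
    ("nb", GaussianNB()),
]

best_score, best_weights = 0.0, None
for weights in product([1, 2], repeat=len(models)):  # small grid of vote weights
    clf = VotingClassifier(models, voting="soft", weights=list(weights))
    score = cross_val_score(clf, X, y, cv=5).mean()
    if score > best_score:
        best_score, best_weights = score, weights
print(best_weights, best_score)
```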