K-fold cross validation to avoid overfitting

Python notebook: https://github.com/daviskregers/data-science-recap/blob/main/23-k-fold-cross-validation.ipynb

  • One way to further protect against overfitting is K-fold cross validation:
    • Split your data into K randomly assigned segments (folds)
    • Reserve one segment as your test data
    • Train on the remaining K-1 segments combined and measure performance against the reserved test segment
    • Repeat so that each segment serves as the test set exactly once
    • Take the average of the K r-squared scores
  • This prevents you from overfitting to a single train/test split (see the sketch after this list)
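Below is a minimal sketch of this procedure using scikit-learn's cross_val_score with a shuffled KFold splitter. The synthetic regression dataset and the LinearRegression model are placeholder assumptions for illustration, not taken from the linked notebook:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data -- a stand-in for your own dataset
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)

model = LinearRegression()

# K=5 with shuffle=True assigns rows to segments at random; each fold
# is held out once as the test set while the model trains on the
# remaining 4 folds, and scoring='r2' yields one r-squared per fold.
kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=kf, scoring="r2")

print("Per-fold r-squared:", scores)
print("Mean r-squared:", scores.mean())
```

Averaging across the K folds gives a more stable performance estimate than any single train/test split, and the spread of the per-fold scores hints at how sensitive the model is to the particular data it was trained on.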