Data cleaning and normalization

Python notebook: https://github.com/daviskregers/data-science-recap/blob/main/24-cleaning-data.ipynb

  • In practice, much of your time as a data scientist goes into preparing and cleaning data. Common problems to watch for:
    • Outliers
    • Missing data
    • Malicious data
    • Erroneous data
    • Irrelevant data
    • Inconsistent data
    • Formatting
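
A minimal sketch of how a few of these problems might be handled in plain Python. The records, the `parse_age` and `clean` helpers, and the median-based outlier rule are all assumptions made for illustration; they are not taken from the linked notebook, and the right thresholds depend on your data.

```python
import statistics

# Hypothetical raw records (name, age as free-form text), invented for this example.
raw = [
    ("alice", "34"),
    ("Bob ", " 29"),    # stray whitespace and mixed case (formatting)
    ("carol", ""),      # missing value
    ("dave", "abc"),    # erroneous value
    ("ALICE", "34"),    # same person, inconsistent formatting
    ("erin", "31"),
    ("frank", "27"),
    ("grace", "450"),   # outlier
]


def parse_age(text):
    """Return the age as an int, or None if the field is empty or not a number."""
    text = text.strip()
    return int(text) if text.isdigit() else None


def clean(records, max_deviations=3.0):
    """Normalize names, drop unusable rows, then drop ages far from the median."""
    parsed = {}
    for name, age_text in records:
        age = parse_age(age_text)
        if age is None:
            continue  # missing or erroneous: skip the row
        parsed[name.strip().lower()] = age  # normalized name also removes duplicates

    ages = list(parsed.values())
    median = statistics.median(ages)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median([abs(a - median) for a in ages])
    if mad == 0:
        return parsed  # no spread to measure against, so keep everything
    return {
        name: age
        for name, age in parsed.items()
        if abs(age - median) <= max_deviations * mad
    }


cleaned = clean(raw)
print(cleaned)
```

Dropping rows is only one option. Depending on the problem, it may be better to impute missing values or to keep outliers and investigate them first.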

Garbage in, garbage out

  • Look at your data, examine it.
  • Question your results.
    • Always do this, not only when you get a result you don't like.
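
One way to make "look at your data" routine is to print a quick summary before any analysis. The sketch below assumes a single numeric column where `None` marks a missing value. The `summarize` helper and the income figures are hypothetical.

```python
import statistics


def summarize(values):
    """Summarize a numeric column; None marks a missing value.

    Assumes at least one value is present.
    """
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
    }


# Hypothetical yearly incomes, invented for this example.
incomes = [27000, 31000, None, 29000, 34000, 1000000000]

report = summarize(incomes)
for key, value in report.items():
    print(f"{key}: {value}")
```

A mean far above the median, as here, suggests that a few extreme values dominate the average and deserve a closer look before you trust any result built on it.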