Data cleaning and normalization
Python notebook: https://github.com/daviskregers/data-science-recap/blob/main/24-cleaning-data.ipynb
- In practice, much of your time as a data scientist will be spent preparing and cleaning your data, dealing with problems such as:
- Outliers
- Missing data
- Malicious data
- Erroneous data
- Irrelevant data
- Inconsistent data
- Formatting
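Below is a minimal sketch of how a few of these problems might be handled with pandas. It uses a small made-up table (the column names and values are hypothetical), normalizes inconsistent category spellings, fixes number formatting, drops rows with missing values, and filters outliers with the interquartile-range (IQR) rule. The 1.5 times IQR threshold is a common convention, not something these notes prescribe, and dropping rows is only one option for missing data (imputation is another).

```python
import pandas as pd

# Hypothetical raw data illustrating several of the problems listed above.
raw = pd.DataFrame({
    "country": ["US", "usa", " U.S. ", "UK", "uk", "US", "UK", "US"],
    "price": ["10.5", "11", "$12", "9.8", None, "10.2", "11.1", "1000"],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Inconsistent data: trim whitespace, unify case, drop dots, map aliases.
    df["country"] = (
        df["country"].str.strip().str.upper().str.replace(".", "", regex=False)
        .replace({"USA": "US"})
    )
    # Formatting: remove currency symbols and convert to numbers.
    # Anything that still cannot be parsed becomes NaN.
    df["price"] = pd.to_numeric(
        df["price"].str.replace("$", "", regex=False), errors="coerce"
    )
    # Missing data: simply drop incomplete rows in this sketch.
    df = df.dropna(subset=["price"])
    # Outliers: keep values within 1.5 * IQR of the first and third quartiles.
    q1, q3 = df["price"].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return df[mask].reset_index(drop=True)

cleaned = clean(raw)
print(cleaned)
```

Here the row with no price is dropped and the 1000 price is filtered as an outlier, leaving only "US" and "UK" as country values. Whether an extreme value is an error or a genuine observation is a judgment call that depends on the data.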
Garbage in, garbage out
- Look at your data and examine it closely.
- Question your results.
- Do this every time, not only when you get a result you don't like.
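A minimal sketch of a routine first look at a pandas DataFrame, run before trusting any analysis built on it. The helper name `quick_audit` and the web-log-like sample are hypothetical. It reports the shape, missing values per column, exact duplicate rows, and the most frequent values in each non-numeric column, which can reveal things like a single client dominating the data, placeholder values, or inconsistent spellings.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def quick_audit(df: pd.DataFrame, top_n: int = 3) -> dict:
    """Collect basic facts worth eyeballing before analyzing df (hypothetical helper)."""
    return {
        "shape": df.shape,
        # Count of missing values in each column.
        "missing": df.isna().sum().to_dict(),
        # Number of rows that exactly duplicate an earlier row.
        "duplicates": int(df.duplicated().sum()),
        # Most frequent values in each non-numeric column; one dominant value
        # (e.g. a single IP address) can distort aggregate results.
        "top_values": {
            col: df[col].value_counts().head(top_n).to_dict()
            for col in df.columns
            if not is_numeric_dtype(df[col])
        },
    }

# Hypothetical web-log-like sample.
logs = pd.DataFrame({
    "ip": ["1.2.3.4", "1.2.3.4", "1.2.3.4", "5.6.7.8", None],
    "page": ["/home", "/home", "/home", "/about", "/home"],
})
report = quick_audit(logs)
for key, value in report.items():
    print(key, value)
```

On this sample, the audit shows that three of the five requests come from one IP address and two rows are exact duplicates. Both are worth questioning before concluding that "/home" is the most popular page.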