Feature Engineering and Curse of Dimensionality
What is feature engineering?
- Applying your knowledge of the data - and the model you're using - to create better features to train your model with.
- Which features should I use?
- Do I need to transform these features in some way?
- Should I create new features from the existing ones?
- You can't just throw in raw data and expect good results
- This is the art of machine learning; where expertise is applied
- "Applied machine learning is basically feature engineering" - Andrew Ng
The curse of dimensionality
- Too many features can be a problem - leads to sparse data
- Every feature is a new dimension
- Much of feature engineering is selecting the features most relevant to the problem at hand
- This often is where domain knowledge comes into play
- UNsupervised dimensionality reduction techniques can also be employed to distill many features into fewer features