MLlib introduction
Some capabilities
- Feature extraction
  - Term frequency / inverse document frequency (TF-IDF), useful for search
- Basic statistics
  - Chi-squared test, Pearson or Spearman correlation, min, max, mean, variance
- Linear regression, logistic regression
- Support Vector Machines
- Naive Bayes classifier
- Decision trees
- K-means clustering
- Principal component analysis, singular value decomposition
- Recommendations using alternating least squares
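A minimal plain-Python sketch of TF-IDF, assuming the documents are already tokenized. It uses the smoothed IDF formula given in the Spark documentation, idf = log((m + 1) / (df + 1)), where m is the number of documents and df is the number of documents containing the term. MLlib itself computes term frequencies with `HashingTF` (which hashes terms into a fixed-size vector) and weights them with `IDF`; the function name `tf_idf` and the dict-based output here are hypothetical and only illustrate the arithmetic.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    TF is the raw count of the term in the document. IDF uses the
    smoothed formula log((m + 1) / (df + 1)), with m = number of
    documents and df = number of documents containing the term.
    """
    m = len(docs)
    # Document frequency: each term is counted at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((m + 1) / (d + 1)) for t, d in df.items()}
    # Weight = term count in this document * inverse document frequency.
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]


docs = [
    ["spark", "mllib", "spark"],
    ["spark", "sql"],
    ["mllib", "statistics"],
]
weights = tf_idf(docs)
for w in weights:
    print(w)
```

A term that appears in every document gets idf = log(1) = 0 and so contributes nothing, while rarer terms such as "sql" are weighted up. That is what makes TF-IDF useful for ranking search results.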
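MLlib exposes these correlations through `Statistics.corr`, with `method` set to "pearson" or "spearman". To show what the two methods compute, here is a plain-Python sketch (not the MLlib implementation; the helper names are hypothetical). Pearson measures linear association. Spearman is Pearson applied to the ranks of the values, with tied values given their average rank.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [v * v for v in x]
print(pearson(x, y))   # about 0.98: the relationship is not linear
print(spearman(x, y))  # 1.0 up to rounding: the relationship is monotonic
```

Comparing the two on y = x² shows the difference: the ranks agree perfectly, so Spearman is 1, while a straight line fits only approximately, so Pearson is slightly below 1.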
Special MLlib Data Types
- Vector (dense or sparse)
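- LabeledPoint (a label plus a feature vector, used for supervised learning)
- Rating (user, product, rating; used for alternating least squares recommendations)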
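In MLlib's Python API these types live in `pyspark.mllib.linalg` (`Vectors.dense`, `Vectors.sparse`), `pyspark.mllib.regression.LabeledPoint` and `pyspark.mllib.recommendation.Rating`. So that it runs without Spark, the sketch below uses hypothetical plain-Python stand-ins with the same field names (`size`/`indices`/`values`, `label`/`features`, `user`/`product`/`rating`). It is not the MLlib code. A dense vector is represented simply as a list of floats.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class SparseVector:
    """Hypothetical stand-in: stores only the non-zero entries."""
    size: int
    indices: List[int]
    values: List[float]

    def to_dense(self) -> List[float]:
        # Expand to a full list, filling unlisted positions with 0.0.
        dense = [0.0] * self.size
        for i, v in zip(self.indices, self.values):
            dense[i] = v
        return dense


@dataclass
class LabeledPoint:
    """Hypothetical stand-in: a label plus a (dense or sparse) feature vector."""
    label: float
    features: Union[List[float], SparseVector]


@dataclass
class Rating:
    """Hypothetical stand-in: one user's rating of one product."""
    user: int
    product: int
    rating: float


dense = [1.0, 0.0, 0.0, 3.0]
sparse = SparseVector(4, [0, 3], [1.0, 3.0])  # same vector, zeros omitted
point = LabeledPoint(1.0, dense)              # e.g. label 1.0 = positive class
r = Rating(user=7, product=42, rating=4.5)
print(sparse.to_dense() == dense)
```

The sparse form pays off when most entries are zero, as with TF-IDF vectors over a large vocabulary: it stores two entries here instead of four.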