MLlib introduction

Some capabilities

  • Feature extraction
    • Term frequency / inverse document frequency (TF-IDF), useful for search
  • Basic statistics
    • Chi-squared test, Pearson or Spearman correlation, min, max, mean, variance
  • Linear regression, logistic regression
  • Support Vector Machines
  • Naive Bayes Classifier
  • Decision trees
  • K-means clustering
  • Principal component analysis, singular value decomposition
  • Recommendations using alternating least squares
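
To make the TF-IDF item above concrete, here is a minimal plain-Python sketch of the idea. It does not use Spark or the MLlib API. The function name `tf_idf`, the sample documents, and the smoothed IDF form log((N + 1) / (df + 1)) are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document (illustrative helper)."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency (assumed form for this sketch).
    idf = {term: math.log((n_docs + 1) / (count + 1)) for term, count in df.items()}
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({term: tf[term] * idf[term] for term in tf})
    return weights


# Hypothetical toy corpus, already tokenized.
docs = [["spark", "mllib", "spark"], ["spark", "search"], ["search", "engine"]]
scores = tf_idf(docs)
```

A term that appears in few documents ("mllib") can outweigh a term that is repeated but common across the corpus ("spark"), which is why the measure is useful for ranking search results.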

Special MLlib Data Types

  • Vector (dense or sparse)
  • LabeledPoint
  • Rating
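
The shape of these three types can be sketched in plain Python. The definitions below are hypothetical stand-ins and not the MLlib classes themselves. The field names (`label`, `features`, `user`, `product`, `rating`) follow the names MLlib uses, and the helper `sparse_to_dense` is an assumption added for illustration.

```python
from collections import namedtuple

# Stand-ins for illustration only; the real types live in Spark MLlib.
LabeledPoint = namedtuple("LabeledPoint", ["label", "features"])
Rating = namedtuple("Rating", ["user", "product", "rating"])


def sparse_to_dense(size, indices, values):
    """Expand a sparse (size, indices, values) vector into a dense list."""
    dense = [0.0] * size
    for index, value in zip(indices, values):
        dense[index] = value
    return dense


# A dense vector stores every entry, including zeros.
dense_vec = [1.0, 0.0, 0.0, 3.0]
# A sparse vector stores only its length plus the non-zero positions and values.
sparse_vec = (4, [0, 3], [1.0, 3.0])

# A labeled point pairs a numeric label with a feature vector (used for supervised learning).
point = LabeledPoint(label=1.0, features=dense_vec)
# A rating is a (user, product, rating) triple (used for recommendations).
rating = Rating(user=1, product=42, rating=4.5)
```

The sparse form saves memory when most entries are zero, as is typical for TF-IDF feature vectors.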