Trains a Naive Bayes model given an RDD of (label, features)
pairs.
The model type can be set to either Multinomial NB (http://tinyurl.com/lsdw6p) or Bernoulli NB (http://tinyurl.com/p7c96j6). Multinomial NB handles discrete count data and is selected by setting the model type to "multinomial". For example, it can be used with word counts or TF-IDF vectors of documents. Bernoulli NB models the presence or absence (0-1) of each feature. By making every vector a 0-1 vector and setting the model type to "bernoulli", the model fits and predicts as Bernoulli NB.
RDD of (label, array of features)
pairs. Every vector should be a frequency
vector or a count vector.
The smoothing parameter
The type of NB model to fit, taken from the enumeration NaiveBayesModels: either multinomial or bernoulli.
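The difference between the two model types can be illustrated with a minimal, library-free sketch. It uses the textbook smoothed estimates for each variant and is not taken from the actual implementation; the function name `fit_naive_bayes` and its parameters `smoothing` and `model_type` are hypothetical names chosen for this example, and the input is a plain list rather than an RDD.

```python
import math
from collections import defaultdict


def fit_naive_bayes(pairs, smoothing=1.0, model_type="multinomial"):
    """Hypothetical helper: fit smoothed log priors and log likelihoods.

    pairs is a list of (label, features) tuples with equal-length features.
    """
    if model_type not in ("multinomial", "bernoulli"):
        raise ValueError("model_type must be 'multinomial' or 'bernoulli'")

    n_features = len(pairs[0][1])
    doc_counts = defaultdict(int)
    feature_sums = defaultdict(lambda: [0.0] * n_features)

    # Aggregate per-label document counts and per-feature sums.
    for label, features in pairs:
        if model_type == "bernoulli" and any(v not in (0, 1) for v in features):
            raise ValueError("Bernoulli NB expects 0-1 vectors")
        doc_counts[label] += 1
        sums = feature_sums[label]
        for j, value in enumerate(features):
            sums[j] += value

    total_docs = sum(doc_counts.values())
    n_labels = len(doc_counts)
    log_priors, log_likelihoods = {}, {}

    for label, n_docs in doc_counts.items():
        log_priors[label] = (math.log(n_docs + smoothing)
                             - math.log(total_docs + n_labels * smoothing))
        sums = feature_sums[label]
        if model_type == "multinomial":
            # Normalize by the total feature count observed for the label.
            denom = sum(sums) + smoothing * n_features
        else:
            # Normalize by the number of documents; two outcomes per feature.
            denom = n_docs + 2.0 * smoothing
        log_likelihoods[label] = [math.log(s + smoothing) - math.log(denom)
                                  for s in sums]
    return log_priors, log_likelihoods


# Assumed toy data: three 0-1 vectors over three features.
data = [(0.0, [1, 0, 1]), (0.0, [1, 1, 0]), (1.0, [0, 1, 1])]
priors_b, theta_b = fit_naive_bayes(data, smoothing=1.0, model_type="bernoulli")
priors_m, theta_m = fit_naive_bayes(data, smoothing=1.0, model_type="multinomial")
```

In this sketch the multinomial likelihoods for a label sum to 1 across features, while each Bernoulli likelihood is an independent per-feature probability of presence.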
Trains a Naive Bayes model given an RDD of (label, features)
pairs.
This overload fits the default Multinomial NB (http://tinyurl.com/lsdw6p), which handles discrete count data. For example, after converting documents into TF-IDF vectors, it can be used for document classification.
RDD of (label, array of features)
pairs. Every vector should be a frequency
vector or a count vector.
The smoothing parameter
Trains a Naive Bayes model given an RDD of (label, features)
pairs.
This overload fits the default Multinomial NB (http://tinyurl.com/lsdw6p), which handles discrete count data. For example, after converting documents into TF-IDF vectors, it can be used for document classification.
This version of the method uses a default smoothing parameter of 1.0.
RDD of (label, array of features)
pairs. Every vector should be a frequency
vector or a count vector.
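To show what a smoothing parameter of 1.0 does in practice, here is a small sketch of additive smoothing for a single class under the multinomial model. It assumes the textbook formula, not the library's code, and `smoothed_log_likelihoods` is a hypothetical name used only for illustration.

```python
import math


def smoothed_log_likelihoods(feature_counts, smoothing=1.0):
    """Hypothetical helper: additive smoothing over one class's feature counts."""
    total = sum(feature_counts)
    n_features = len(feature_counts)
    return [math.log(count + smoothing) - math.log(total + smoothing * n_features)
            for count in feature_counts]


# Assumed toy counts: the second feature never occurs in this class.
counts = [3, 0, 1]
probabilities = [math.exp(v) for v in smoothed_log_likelihoods(counts)]
# With smoothing 1.0 these are 4/7, 1/7 and 2/7, so the unseen feature
# still receives a nonzero probability.
```

Without smoothing, the unseen feature would have probability zero, and its logarithm would be undefined.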
Top-level methods for calling Naive Bayes.