Naive Bayes (RapidMiner Studio Core)

Synopsis

This operator generates a Naive Bayes classification model.

Description

A Naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem (from Bayesian statistics) with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be 'independent feature model'. In simple terms, a Naive Bayes classifier assumes that the presence (or absence) of a particular feature (i.e. attribute) of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even if these features depend on each other or upon the existence of the other features, a Naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.
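
Under this assumption, each possible label value is scored by multiplying the probability of the label with the conditional probabilities of the observed attribute values, and the label value with the highest score is predicted:

    P(label | x1, ..., xn) ∝ P(label) * P(x1 | label) * ... * P(xn | label)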

The advantage of the Naive Bayes classifier is that it only requires a small amount of training data to estimate the means and variances of the variables necessary for classification. Because independent variables are assumed, only the variances of the variables for each label need to be determined and not the entire covariance matrix.
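
As a rough illustration of this point (not RapidMiner's internal code, and with made-up attribute values), estimating a per-label mean and variance for a single numerical attribute is all that is needed for that attribute:

    # Rough illustration (not RapidMiner's internal code): with the independence
    # assumption, a per-label mean and variance is estimated for each numerical
    # attribute on its own; no covariance matrix is needed.
    from statistics import mean, pvariance

    # hypothetical (attribute value, label) pairs for one numerical attribute
    examples = [(64, "yes"), (68, "yes"), (69, "yes"), (72, "no"), (85, "no")]

    parameters = {}
    for label in ("yes", "no"):
        values = [value for value, lab in examples if lab == label]
        parameters[label] = (mean(values), pvariance(values))

    print(parameters)  # {'yes': (mean, variance), 'no': (mean, variance)}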

Input

  • training set (Data Table)

    The input port expects an ExampleSet. It is the output of the Select Attributes operator in our example process. The output of other operators can also be used as input.

Output

  • model (Model)

    The Naive Bayes classification model is delivered from this output port. This classification model can now be applied on unseen data sets for prediction of the label attribute.

  • example set (Data Table)

    The ExampleSet that was given as input is passed to the output through this port without any changes. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

  • laplace_correction: This is an expert parameter. This parameter indicates if Laplace correction should be used to prevent high influence of zero probabilities. There is a simple trick to avoid zero probabilities: we can assume that our training set is so large that adding one to each count that we need would only make a negligible difference in the estimated probabilities, yet would avoid the case of zero probability values. This technique is known as Laplace correction. Range: boolean
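
As a rough sketch of the idea (the exact smoothing used by RapidMiner may differ), adding one to every count keeps an attribute value that never occurs with a given label from producing a zero probability:

    # Rough sketch of Laplace correction for a nominal attribute; the exact
    # smoothing used by RapidMiner may differ.
    def conditional_probability(count, label_total, n_attribute_values, laplace=True):
        if laplace:
            return (count + 1) / (label_total + n_attribute_values)
        return count / label_total

    # In the standard 'Golf' sample data, Outlook = overcast never occurs together
    # with label = no (0 out of 5 examples), so the uncorrected estimate is zero.
    print(conditional_probability(0, 5, 3, laplace=False))  # 0.0
    print(conditional_probability(0, 5, 3, laplace=True))   # 0.125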

Tutorial Processes

Working of Naive Bayes

The Retrieve operator is used to load the 'Golf' data set. The Select Attributes operator is applied on it to select just the Outlook and Wind attributes. This is done to simplify the understanding of this Example Process. The Naive Bayes operator is applied on it and the resulting model is applied on the 'Golf-testset' data set. The same two attributes of the 'Golf-testset' data set were selected before application of the Naive Bayes model. A breakpoint is inserted after the Naive Bayes operator. Run the process and see the distribution table in the Results Workspace. We will use this distribution table to explain how Naive Bayes works. Hit the Run button again to continue with the process.

Let us see how the first and last examples of the 'Golf-testset' data set were predicted by Naive Bayes. Note that 9 out of 14 examples of the training set had label = yes, thus the prior probability of label = yes is 9/14. Similarly, the prior probability of label = no is 5/14.

Note that in the testing set, the attributes of the first example are Outlook = sunny and Wind = false. Naive Bayes performs this calculation for all possible label values and selects the label value with the maximum calculated probability.

Calculation for label = yes

Find the product of the following:

  • prior probability of label = yes (i.e. 9/14)
  • value from the distribution table when Outlook = sunny and label = yes (i.e. 0.223)
  • value from the distribution table when Wind = false and label = yes (i.e. 0.659)

Thus the answer = 9/14 * 0.223 * 0.659 = 0.094

Calculation for label = no

Find the product of the following:

  • prior probability of label = no (i.e. 5/14)
  • value from the distribution table when Outlook = sunny and label = no (i.e. 0.581)
  • value from the distribution table when Wind = false and label = no (i.e. 0.397)

Thus the answer = 5/14 * 0.581 * 0.397 = 0.082

As the value for label = yes is the maximum of all possible label values, label is predicted to be yes.
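
The same arithmetic can be reproduced in a few lines of Python. This is only an illustration of the calculation above; the probability values are the ones quoted from the distribution table, and nb_score is a hypothetical helper, not part of RapidMiner:

    # Illustration of the calculation above. The probability values are the
    # ones quoted from the distribution table; nb_score is a hypothetical
    # helper, not part of RapidMiner.
    priors = {"yes": 9 / 14, "no": 5 / 14}
    distribution = {
        "yes": {("Outlook", "sunny"): 0.223, ("Outlook", "rain"): 0.331,
                ("Wind", "false"): 0.659, ("Wind", "true"): 0.333},
        "no":  {("Outlook", "sunny"): 0.581, ("Outlook", "rain"): 0.392,
                ("Wind", "false"): 0.397, ("Wind", "true"): 0.589},
    }

    def nb_score(example, label):
        score = priors[label]
        for attribute_and_value in example.items():
            score *= distribution[label][attribute_and_value]
        return score

    first = {"Outlook": "sunny", "Wind": "false"}
    print(nb_score(first, "yes"))  # approximately 0.094
    print(nb_score(first, "no"))   # approximately 0.082, so label = yes wins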

Similarly, let us have a look at the last example of the 'Golf-testset' data set. Note that in the testing set, in this last example Outlook = rain and Wind = true. Naive Bayes again performs the calculation for all possible label values and selects the label value with the maximum calculated probability.

Calculation for label = yes

Find the product of the following:

  • prior probability of label = yes (i.e. 9/14)
  • value from the distribution table when Outlook = rain and label = yes (i.e. 0.331)
  • value from the distribution table when Wind = true and label = yes (i.e. 0.333)

Thus the answer = 9/14 * 0.331 * 0.333 = 0.071

Calculation for label = no

Find the product of the following:

  • prior probability of label = no (i.e. 5/14)
  • value from the distribution table when Outlook = rain and label = no (i.e. 0.392)
  • value from the distribution table when Wind = true and label = no (i.e. 0.589)

Thus the answer = 5/14 * 0.392 * 0.589 = 0.082

As the value for label = no is the maximum of all possible label values, label is predicted to be no.
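
Using the same sketch as above, the last test example gives the opposite result:

    last = {"Outlook": "rain", "Wind": "true"}
    print(nb_score(last, "yes"))  # approximately 0.071
    print(nb_score(last, "no"))   # approximately 0.082, so label = no wins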

Now run the process again, but this time uncheck the laplace correction parameter. You can see that, as Laplace correction is no longer used to avoid zero probabilities, there are numerous zeroes in the distribution table.