Naive Bayes Algorithm

Supriya S K
3 min read · May 1, 2021

Hello folks! In this article, we will dive deep into the Naive Bayes algorithm, a supervised learning algorithm based on applying Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features given the value of the class variable.

Why Naive Bayes?

  1. It is a very simple algorithm for classification problems compared to other classification algorithms.
  2. It is also fast: predicting labels with it takes less time than with many other classification algorithms, and its performance is good on both the training and test data sets.

To understand the algorithm, we first need to take a look at the basics, so let us start with probability.

Probability is defined as how likely an event is to occur.

P=Number of observations in favour of the event / Total number of observations
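
For example, the probability of rolling a 4 with a fair six-sided die is 1/6, since exactly one of the six equally likely outcomes favours the event.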

Assumptions:

The fundamental Naive Bayes assumption is that each feature makes an independent and equal contribution to the outcome. In other words, there should be no correlation between the features, and each feature should have equal importance in the formation of the classification model.

Bayes Theorem:

It describes the probability of an event based on prior knowledge of conditions that might be related to the event. For example, suppose I want to know whether it will rain today. Bayes’ theorem estimates the chance of rain today based on previously available data. It does not explain why it rains, but it is helpful in predicting whether it will rain or not! Likewise, it can be used in various situations such as spam filtering, sentiment analysis, and so on.

P(A | B) = P(B | A) · P(A) / P(B)

where A and B are events, P(A | B) is the likelihood of event A occurring given that B is true, P(B | A) is the likelihood of event B occurring given that A is true, and P(A) and P(B) are the independent probabilities of A and B.
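
To see the formula in action, suppose it rains on 10% of days (P(Rain) = 0.1), 80% of rainy days start out cloudy (P(Cloudy | Rain) = 0.8), and 40% of all days start out cloudy (P(Cloudy) = 0.4). Then P(Rain | Cloudy) = (0.8 × 0.1) / 0.4 = 0.2, i.e. a cloudy morning doubles the chance of rain. (These numbers are made up purely to illustrate the formula.)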

Types of Naive Bayes classifier:

There are three Naive Bayes classifiers; let us look at each of them in detail.

1. Gaussian Naive Bayes

In Gaussian Naive Bayes, the data is assumed to follow a normal distribution. When plotted, this gives a bell-shaped curve that is symmetric about the mean of the feature values.

The likelihood of the features is assumed to be Gaussian, so the conditional probability is given by:

P(xi | y) = (1 / √(2πσy²)) · exp(−(xi − μy)² / (2σy²))

where μy and σy² are the mean and variance of feature xi estimated for class y from the training data.
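
To make this concrete, here is a minimal sketch assuming scikit-learn; the iris dataset and the 70/30 split are illustrative choices, not something from this article:

```python
# Minimal Gaussian Naive Bayes sketch (assumes scikit-learn is installed).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = GaussianNB()   # assumes each feature is normally distributed within each class
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```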

2. Multinomial Naive Bayes:

It implements the naive Bayes algorithm for multinomially distributed data, and is one of the two classic naive Bayes variants used in text classification.

The distribution is parametrized by vectors θy=(θy1,…,θyn) for each class y, where n is the number of features (in text classification, the size of the vocabulary) and θyi is the probability P(xi∣y) of feature i appearing in a sample belonging to class y.

The parameters θy are estimated by a smoothed version of maximum likelihood, i.e. relative frequency counting:

θ̂yi = (Nyi + α) / (Ny + αn)

where Nyi is the number of times feature i appears in samples of class y in the training set, Ny = Σi Nyi is the total count of all features for class y, and α ≥ 0 is the smoothing parameter (α = 1 is called Laplace smoothing).
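
Here is a minimal sketch assuming scikit-learn’s CountVectorizer and MultinomialNB; the toy documents and labels are invented purely for illustration:

```python
# Minimal Multinomial Naive Bayes sketch for text classification.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["free prize money", "meeting at noon", "win free cash", "project status update"]
labels = ["spam", "ham", "spam", "ham"]

vectorizer = CountVectorizer()        # word counts -> multinomial features
X = vectorizer.fit_transform(docs)

model = MultinomialNB(alpha=1.0)      # alpha=1.0 is Laplace smoothing
model.fit(X, labels)

print(model.predict(vectorizer.transform(["free money now"])))  # likely 'spam'
```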

3. Bernoulli Naive Bayes:

In the multivariate Bernoulli event model, features are independent booleans (binary variables) describing inputs. Like the multinomial model, this model is popular for document classification tasks, where binary term-occurrence features (i.e. whether a word occurs in a document or not) are used rather than term frequencies (i.e. how often a word occurs in the document).

The decision rule for Bernoulli naive Bayes is based on:

P(xi | y) = P(i | y) · xi + (1 − P(i | y)) · (1 − xi)

which differs from the multinomial rule in that it explicitly penalizes the non-occurrence of a feature i that is an indicator for class y.
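
A minimal sketch assuming scikit-learn’s BernoulliNB, reusing the same invented toy documents but with binary (0/1) term-occurrence features:

```python
# Minimal Bernoulli Naive Bayes sketch on binary presence/absence features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

docs = ["free prize money", "meeting at noon", "win free cash", "project status update"]
labels = ["spam", "ham", "spam", "ham"]

vectorizer = CountVectorizer(binary=True)   # 1 if the word occurs in the document, else 0
X = vectorizer.fit_transform(docs)

model = BernoulliNB()
model.fit(X, labels)
print(model.predict(vectorizer.transform(["win a free prize"])))  # likely 'spam'
```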

You can access the Python implementation of this algorithm here!

Advantage: This algorithm can give better results on small datasets than many other algorithms.

Hope you guys got an idea of Naive Bayes and how it works. Happy learning!
