A thorough explanation of Naive Bayes with an example

Naive Bayes. What may seem like a very confusing algorithm is actually one of the simplest once understood. Part of what makes it so simple to understand and implement is the set of assumptions it makes. Those strong assumptions do not make it a poor algorithm, though: Naive Bayes is widely used in the data science world and has many real-life applications.

In this article, we’ll look at what Naive Bayes is, how it works with an example to make it easy to understand, the different types of Naive Bayes, the pros and cons, and some real-life applications of it.

To understand Naive Bayes and get the most out of this article, you should have a basic understanding of the following concepts:

Conditional probability: the probability of event A occurring given that another event B has occurred. For example, “what is the probability that it will rain given that it is cloudy?”

Joint probability: the probability of two or more events occurring at the same time.

Proportionality: the relationship between two quantities whose ratio is a constant, or in simpler terms, one quantity is always a fixed multiple of the other.

Bayes’ Theorem: according to Wikipedia, Bayes’ Theorem describes the probability of an event (the posterior) based on prior knowledge of conditions that might be related to the event.
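To make these four concepts concrete, here is a minimal Python sketch built on the rain-and-cloud example above. The day counts are made up purely for illustration, and the helper names (`joint`, `p_sky`, `p_weather`) are hypothetical, not part of any library.

```python
from fractions import Fraction

# Hypothetical observations over 100 days (made-up numbers for illustration).
days = {
    ("cloudy", "rain"): 24,
    ("cloudy", "dry"): 16,
    ("clear", "rain"): 6,
    ("clear", "dry"): 54,
}
total = sum(days.values())

def joint(sky, weather):
    """Joint probability: both events occur on the same day."""
    return Fraction(days[(sky, weather)], total)

def p_sky(sky):
    """Marginal probability of the sky condition."""
    return sum(joint(sky, w) for w in ("rain", "dry"))

def p_weather(weather):
    """Marginal probability of the weather outcome."""
    return sum(joint(s, weather) for s in ("cloudy", "clear"))

# Conditional probability: P(rain | cloudy) = P(cloudy and rain) / P(cloudy)
p_rain_given_cloudy = joint("cloudy", "rain") / p_sky("cloudy")

# Bayes' Theorem reaches the same value from the reverse conditional:
# P(rain | cloudy) = P(cloudy | rain) * P(rain) / P(cloudy)
p_cloudy_given_rain = joint("cloudy", "rain") / p_weather("rain")
bayes = p_cloudy_given_rain * p_weather("rain") / p_sky("cloudy")

# Proportionality: dropping the shared denominator P(cloudy) leaves scores
# that are a constant multiple of the true posteriors.
score_rain = p_cloudy_given_rain * p_weather("rain")
score_dry = (joint("cloudy", "dry") / p_weather("dry")) * p_weather("dry")

print(joint("cloudy", "rain"))   # 6/25
print(p_rain_given_cloudy)       # 3/5
print(bayes)                     # 3/5
print(score_rain / (score_rain + score_dry))  # 3/5 after renormalizing
```

Exact fractions are used instead of floats so that the equality between the direct calculation and the Bayes' Theorem route is visible without rounding noise.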

Naive Bayes is a machine learning algorithm, and more specifically, a classification technique. This means that Naive Bayes is used when the output variable is discrete. The underlying mechanics of the algorithm are driven by Bayes’ Theorem, which you’ll see in the next section.

First, I’m going to walk through the theory behind Naive Bayes, and then solidify these concepts with an example to make it easier to understand.

The Naive Bayes classifier is inspired by Bayes’ Theorem, which states the following:
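In its standard form, for two events A and B with P(B) > 0 (the symbols A and B are chosen here for illustration), the theorem reads:

$$
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
$$

Here P(A | B) is the posterior, P(B | A) is the likelihood, P(A) is the prior, and P(B) is the probability of the observed evidence.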