## Naive Bayes starts with just counting and then moving to probability

Naive Bayes classification is one of the most simple and popular algorithms in data mining or machine learning (Listed in the top 10 popular algorithms by CRC Press Reference [1]). The basic idea of the Naive Bayes classification is very simple.

(In case you think video format is more suitable for you, you can jump here you can also go to the notebook.)

Let’s say, we have books of two categories. One category is Sports and the other is Machine Learning. I count the frequency of the words of “Match” (Attribute 1) and Count of the word “Algorithm” (Attribute 2). Let’s assume, I have a total of 6 books from each of these two categories and the count of words across the six books looks like the below figure.

We see that clearly that the word ‘algorithm’ appears more in Machine Learning books and the word ‘match’ appears more in Sports. Powered with this knowledge, Let’s say if I have a book whose category is unknown. I know Attribute 1 has a value 2 and Attribute 2 has a value 10, we can say the book belongs to Sports Category.

Basically we want to find out which category is more likely, given attribute 1 and attribute 2 values.

This count-based approach works fine for a small number of categories and a small number of words. The same intuition is followed more elegantly using **conditional probability.**

Conditional Probability is again best understood with an example

Let’s assume

Event A: The face value is odd | Event B: The face value is less than 4

P(A) = 3/6 (Favourable cases 1,3,5 Total Cases 1,2,3,4,5,6) similarly P(B) is also 3/6 (Favourable cases 1,2,3 Total Cases 1,2,3,4,5,6). An example of conditional probability is what is the probability of getting an odd number (A)given the number is less than 4(B). For finding this first we find the intersection of events A and B and then we divide by the number of cases in case B. More formally this is given by the equation

P(A|B) is the conditional probability and is read as the probability of A Given B. This equation forms the central tenet. Let’s now go back again to our book category problem, we want to find the category of the book more formally.

Let’s use the following notation Book=ML is Event A, book=Sports is Event B, and “Attribute 1 = 2 and Attribute 2 = 10” is Event C. The event C is a joint event and we will come to this in a short while.

Hence the problem becomes like this we calculate P(A|C) and P(B|C). Let’s say the first one has a value 0.01 and the second one 0.05. Then our conclusion will be the book belongs to the second class. This is a Bayesian Classifier, **naive Bayes assumes the attributes are independent. Hence:**

P(Attribute 1 = 2 and Attribute 2 = 10) = P(Attribute 1 = 2) * P(Attribute = 10). Let’s call these conditions as x1 and x2 respectively.

Hence, using the likelihood and Prior we calculate the Posterior Probability. And then we assume that the attributes are independent hence likelihood is expanded as

The above equation is shown for two attributes, however, can be extended for more. So for our specific scenario, the equation get’s changed to the following. It is shown only for Book=’ML’, it will be done similarly for Book =’Sports’.

Let’s use the famous Flu dataset for naive Bayes and import it, you can change the path. You can download the data from here.

**Importing Data:**

`nbflu=pd.read_csv('/kaggle/input/naivebayes.csv')`

**Encoding the Data:**

We store the columns in different variables and encode the same

# Collecting the Variables

x1= nbflu.iloc[:,0]

x2= nbflu.iloc[:,1]

x3= nbflu.iloc[:,2]

x4= nbflu.iloc[:,3]

y=nbflu.iloc[:,4]#Encoding the categorical variables

le = preprocessing.LabelEncoder()

x1= le.fit_transform(x1)

x2= le.fit_transform(x2)

x3= le.fit_transform(x3)

x4= le.fit_transform(x4)

y=le.fit_transform(y)# Getting the Encoded in Data Frame

X = pd.DataFrame(list(zip(x1,x2,x3,x4)))

**Fitting the Model:**

In this step, we are going to first train the model, then predict for a patient

`model = CategoricalNB()`*# Train the model using the training sets*

model.fit(X,y)

*#Predict Output*

*#['Y','N','Mild','Y']*

predicted = model.predict([[1,0,0,1]])

print("Predicted Value:",model.predict([[1,0,0,1]]))

print(model.predict_proba([[1,0,0,1]]))

**Output:**

`Predicted Value: [1]`

[[0.30509228 0.69490772]]

The output tells the probability of not Flu is 0.31 and Flu is 0.69, hence the conclusion will be Flu.

**Conclusion**:

Naive Bayes works very well as a baseline classifier, it’s fast, can work on less number of training examples, can work on noisy data. One of the challenges is it assumes the attributes to be independent.

**Reference:**

[1] Wu X, Kumar V, editors. The top ten algorithms in data mining. CRC Press; 2009 Apr 9.

[2] https://towardsdatascience.com/all-about-naive-bayes-8e13cef044cf