A Naive Bayes approach towards creating closed domain Chatbots!

Machine learning | Natural language processing

A chatbot is a piece of software designed to carry out a conversation or discussion, as the name implies. They are found in a wide range of industries to fulfill a variety of functions, ranging from offering customer service to assisting in treatment to being simply a source of fun.

Dhruvil Shah
Photo by Alex Knight on Unsplash

Have you ever sought customer support to make an inquiry, pay a bill, or order food online? The chances are good that your first interaction was with a chatbot. Many companies use rule-based or retrieval-based chatbots that are trained for a specific subject; these are called closed-domain chatbots. For example, if you want to check the status of your flight, you may have a chat like this:

In the conversation above, the chatbot gave the reply that made the most sense to it, i.e., the one most similar to the question asked. This chatbot is specifically trained for handling customer reports or inquiries about flights.

A rule-based chatbot uses regular expression (RE) patterns to match the user's input against the responses it was trained on, while a retrieval-based chatbot uses intent classification, entity recognition, and response selection to carry out a real conversation.
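To make the rule-based idea concrete, here is a minimal sketch of RE pattern matching; the patterns and canned replies are illustrative assumptions, not part of this article's dataset:

```python
import re

# A minimal rule-based matcher: each pattern maps to a canned reply.
# Patterns and replies here are made up for illustration.
rules = [
    (re.compile(r"\b(hi|hello|hey)\b", re.IGNORECASE), "Hi there, how can I help?"),
    (re.compile(r"\bflight\s+status\b", re.IGNORECASE), "Please share your flight number."),
]

def rule_based_reply(user_input):
    # Return the reply for the first pattern that matches the input.
    for pattern, reply in rules:
        if pattern.search(user_input):
            return reply
    return "Sorry, I didn't understand that."

print(rule_based_reply("Hello!"))                    # Hi there, how can I help?
print(rule_based_reply("What's my flight status?"))  # Please share your flight number.
```

The weakness of this style is visible immediately: any phrasing the patterns don't anticipate falls through to the fallback reply, which is exactly what the classifier-based approach below avoids.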

There is an easy and efficient approach to creating a closed-domain chatbot that uses the Naive Bayes classifier. In this approach, we build a closed-domain dataset containing questions/user responses and their corresponding answers, in which each question is given a label that relates it to its answer. Because multiple questions can share the same response, several questions may carry the same label. To make this concrete, let's look at an example.

hi there 1
hello how are you 1
what is your name 2
who are you 2
you are who 2
my name is 2
how old are you 3
what is your age 3
are you getting older 3
what about your age 3

The labels/digits at the end are nothing but the index of answers in our answers dataset.

Hi there, how are you !?
My name is etcetera, but you can call me etc.
I'm 22 years old

The first two questions contain the label ‘1’ so they refer to the response ‘Hi there, how are you !?’. Similarly, the last four questions refer to ‘I’m 22 years old’.

The notion here is that the Naive Bayes classifier will predict the label based on the input we give it. So when you say ‘hi’ our classifier will predict the label ‘1’, which in return we can use to find a suitable answer. When the input is ‘what’s your age?’ classifier will predict the label ‘3’, which is an index of the answer ‘I’m 22 years old’.

The figure below should clear up any remaining confusion about this idea.

Figure explaining the flow of the system

Preparing training data

If we have our questions formatted as above in a file called ‘que.txt’ and answers in a file ‘ans.txt’, we can prepare separate lists containing questions, labels, and answers with the help of the code below:

labels = []
questions = []
for line in open('que.txt', encoding="utf8"):
    # the label is the last space-separated token on each line
    labels.append(line.strip().split(" ")[-1])
    # everything before the label is the question text
    questions.append(" ".join(line.strip().split(" ")[:-1]))

answers = []
for line in open('ans.txt', encoding="utf8"):
    answers.append(line.strip())

Each label is at the end of the question and we can get it as shown above. After this, we will have all the questions for training in a list called ‘questions’ and the labels of those questions in a list ‘labels’. Remember, the labels of questions relate them to their answers. In the ‘answers’ list all the possible answers will be stored.

Note: Here, the label numbers at the end of each question start from 1, but to map them to answers in the ‘answers’ list we must subtract 1 from the label, because Python list indexing starts from 0.
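The mapping from label to answer is a one-liner; this minimal sketch reuses the three example answers shown earlier:

```python
# Label-to-index mapping: labels start at 1, list indices at 0.
# These three answers mirror the example dataset from the article.
answers = [
    "Hi there, how are you !?",
    "My name is etcetera, but you can call me etc.",
    "I'm 22 years old",
]

predicted_label = "3"                       # labels come out of the file as strings
answer = answers[int(predicted_label) - 1]  # subtract 1 for 0-based indexing
print(answer)                               # I'm 22 years old
```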

Now, we need to convert our training questions into vectors to feed them to our classifier. For this purpose, we are going to use CountVectorizer from the machine learning library scikit-learn. Below is the code snippet for doing this:

from sklearn.feature_extraction.text import CountVectorizer
bow_vectorizer = CountVectorizer()
training_vectors = bow_vectorizer.fit_transform(questions)

‘Bow’ in bow_vectorizer stands for Bag-of-words. The .fit_transform() method of CountVectorizer does two things: 1) it builds the features dictionary, a dictionary of all the unique words in the training corpus, and 2) it transforms each question into a vector the size of the features dictionary, containing zeros everywhere except at the indices of the words used in that question.

If you didn’t understand the above explanation, let’s take an example.

# our sentence
sentence = "Help my fly fish fly away"
# example features dictionary based on the training corpus
features_dictionary = {'all': 0, 'my': 1, 'fish': 2, 'fly': 3, 'away': 4, 'help': 5, 'me': 6, 'open': 7}
# our vector for the sentence above will be
vector = [0, 1, 1, 2, 1, 1, 0, 0]
# size of vector = size of features dictionary

The features dictionary contains all the unique words from the training corpus as keys and their indices as values. The word ‘fly’ appears twice in our sentence. If we look up ‘fly’ in the features dictionary, we find that its index is 3, so in our vector we expect the number at index 3 to be 2. This is exactly how CountVectorizer produces vectors for the questions we feed it.
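We can verify this with scikit-learn itself. One caveat: CountVectorizer assigns vocabulary indices alphabetically, so the indices differ from the hand-built dictionary above, but the counting works the same way. The two-sentence corpus here is an assumption chosen to yield the same eight words:

```python
from sklearn.feature_extraction.text import CountVectorizer

# An illustrative corpus containing the same eight words as the worked example.
corpus = ["all my fish fly away", "help me open"]
vectorizer = CountVectorizer()
vectorizer.fit(corpus)

# scikit-learn orders the vocabulary alphabetically:
# all=0, away=1, fish=2, fly=3, help=4, me=5, my=6, open=7
print(vectorizer.vocabulary_["fly"])  # 3

vector = vectorizer.transform(["Help my fly fish fly away"]).toarray()[0]
print(vector)  # [0 1 1 2 1 0 1 0] -- 'fly' counted twice at index 3
```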

Note: Here, we are training our classifier on a very small dataset containing 10 answers and 37 questions only to keep things simple. The size may differ according to various applications.

Getting to the training part

Now that we have our training vectors, it's time to create our Naive Bayes classifier. For this, we will use scikit-learn's MultinomialNB.

from sklearn.naive_bayes import MultinomialNB
classifier = MultinomialNB()
classifier.fit(training_vectors, labels)

We train our classifier using the .fit() method, passing in the training vectors and their corresponding labels.

Now, our classifier is ready to make predictions. We can take input from the user, convert it into a vector with the vectorizer we created earlier, and get a label prediction from our classifier.
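Putting the pieces together, here is a self-contained sketch of the whole train-and-predict loop on a hypothetical slice of the example dataset (the questions and answers below mirror the earlier examples; the dataset itself is assumed):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# A tiny slice of the example dataset, rebuilt here so the snippet runs on its own.
questions = ["hi there", "hello how are you",
             "what is your name", "who are you",
             "how old are you", "what is your age"]
labels = ["1", "1", "2", "2", "3", "3"]
answers = ["Hi there, how are you !?",
           "My name is etcetera, but you can call me etc.",
           "I'm 22 years old"]

# Vectorize the questions and train the classifier.
bow_vectorizer = CountVectorizer()
training_vectors = bow_vectorizer.fit_transform(questions)
classifier = MultinomialNB()
classifier.fit(training_vectors, labels)

# Predict a label for unseen input and map it back to an answer.
user_input = "what's your age"
input_vector = bow_vectorizer.transform([user_input])
predicted = classifier.predict(input_vector)[0]
print(predicted, "->", answers[int(predicted) - 1])  # 3 -> I'm 22 years old
```

Note that the input phrasing never appears verbatim in the training data; the classifier still lands on label ‘3’ because the word counts overlap most with the age questions.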

Creating a Chatbot class

It is time to create a chatbot class that comprises all the methods needed to carry out a conversation. The start_chat() method starts the conversation. The chat() method checks for exit commands like ‘quit’ or ‘bye’ and stops the conversation if it finds any of those words in the user's response.

class ChatBot:
    exit_commands = ("quit", "pause", "exit", "goodbye", "bye", "later", "stop")

    def start_chat(self):
        user_response = input("Hi, I'm a chatbot trained on random dialogs!!\n")
        self.chat(user_response)

    def chat(self, reply):
        while not self.make_exit(reply):
            reply = input(self.generate_response(reply) + "\n")
        return

    def generate_response(self, sentence):
        input_vector = bow_vectorizer.transform([sentence])
        predict = classifier.predict(input_vector)
        index = int(predict[0])
        # predict_proba columns follow classifier.classes_, so take the
        # probability of the predicted class rather than indexing by label
        confidence = classifier.predict_proba(input_vector)[0].max()
        print("Confidence:", str(confidence * 100)[:5] + "%")
        return answers[index - 1]

    def make_exit(self, reply):
        for exit_command in self.exit_commands:
            if exit_command in reply:
                print("Ok, have a great day!")
                return True
        return False

Let’s create an instance of our ChatBot class and test our work.

etcetera = ChatBot()
etcetera.start_chat()

When we call .start_chat(), the call eventually reaches .chat() to check for exit words, and then .generate_response(). The .generate_response() method takes the user's response as input, converts it into a vector based on the training corpus, and feeds it to our classifier to get a label prediction, which in turn gives us the index of the answer to return to the user. Remember, here we have to access the answers list at the index minus one.

Conversation with the chatbot

The above GIF shows how the chatbot responds to a weather inquiry. In the training data, there is a user response/question — “tell me today’s weather report 8” — which carries the label ‘8’. On line 8 of the answers dataset is the response “Today temperature will be 37 degrees Celsius and sky will be clear all day”. Obviously, this response is hardcoded here; in practice, functions could perform entity recognition on the user response (‘weather’) and fetch real-time data from the web using available APIs.

Our chatbot is ready. The output above also shows how confident each prediction is; because a minimal dataset was used for this project, the accuracy is relatively low. Still, it works well enough. Companies can adopt this system simply by substituting their own, larger closed-domain datasets.

GitHub’s link for this project is this. You can get all of the above code from there and you can find me here on LinkedIn.

Intent classification and entity recognition can be integrated into this approach to serve applications like paying bills, suggesting songs by genre, or providing the status of a flight specified by the user. Part-of-speech (POS) tagging could be used for this.

This approach cannot be applied to open-domain applications, where user responses can vary without limit. It is strictly for closed-domain applications, where the responses can be determined beforehand and we can map the questions to them.

To conclude, chatbots can be built using the Naive Bayes approach, in which we simply give each training question the label number of its corresponding answer and let the classifier predict the label for the user's input. This label number is the index of the corresponding response in our answers dataset. This way, we do not need to worry much about regular expression pattern matching as in a typical rule-based chatbot. The chatbot built above is trained on a very small dataset, as this is meant to be a simple explanation of the approach. Companies may create their own datasets according to their specific applications.

For a production system, you will want to consider using one of the existing bot frameworks and integrating this approach into it; that way, you do not have to start from scratch. The Internet is full of resources, and after reading this article you may want to create a chatbot of your own.