Importance and Functions of Kernel in Machine Learning

What is the first thing that comes to mind when you read or hear the word kernel? For me, it is either a post in the army or, as a computer scientist, the operating system kernel, which runs the system and manages the hardware according to given instructions. A kernel in machine learning is something else, although it is loosely similar to an operating system kernel: it works underneath the function that a model (trainer) learns from experience, that is, from examples or data points. This raises a few questions: what actually is machine learning? What is the model/trainer? And what are experience, examples, and data points? Here we will try to answer all of these questions.

Basic concepts related to machine learning:

Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed.

Figure 1 : Machine Learning Vs Traditional Programming

If we go for the basic, formal definition of machine learning, it is:

"A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E." (Tom Mitchell, Carnegie Mellon University)

So, if you want to train your program to predict gender, e.g. male/female (task T), you can train it by giving it different examples that distinguish male from female (experience E). If your model learns from the given examples, its accuracy at predicting gender for new examples (performance measure P) improves.

Kernel in Machine Learning:

The kernel of the Support Vector Machine (SVM) is the best way to explain kernels and their functionality. In simple words, a kernel is a similarity function. Given two objects, it returns a similarity score, and a classifier such as an SVM uses those scores to classify them. The objects can be anything: two simple integers, any kind of text, integer vectors, images, or any entity in the real world. The kernel defines how two such objects relate to each other, and the classifier learns from experience (the training examples) how to use that relationship to classify them. The simplest example of a kernel in machine learning is the linear kernel of an SVM, which is simply the dot product. The linear kernel relates two vectors through projection length: the dot product of X and Y equals the length of X's projection onto Y multiplied by the length of Y. Another example is the Gaussian kernel, which uses a radius (bandwidth) parameter to reweight the distance between two vectors X and Y. The farther apart the vectors are relative to that radius, the smaller their similarity.
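The two similarity functions above can be sketched in a few lines of NumPy. This is an illustrative sketch, not SVM library code. The helper names `linear_kernel` and `gaussian_kernel` are hypothetical, and the Gaussian form assumed here is exp(-||x - y||² / (2σ²)), with `sigma` playing the role of the radius parameter.

```python
import numpy as np

def linear_kernel(x, y):
    # Plain dot product: |x| * |y| * cos(angle), a projection-based similarity.
    return float(np.dot(x, y))

def gaussian_kernel(x, y, sigma=1.0):
    # Similarity decays with squared Euclidean distance; sigma (the "radius") sets how fast.
    sq_dist = float(np.sum((np.asarray(x) - np.asarray(y)) ** 2))
    return float(np.exp(-sq_dist / (2.0 * sigma ** 2)))

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
print(linear_kernel(x, y))               # 32.0
print(gaussian_kernel(x, x))             # identical points -> 1.0
print(gaussian_kernel(x, y, sigma=1.0))  # far apart relative to sigma: close to 0
print(gaussian_kernel(x, y, sigma=5.0))  # wider radius: the same pair looks more similar
```

A larger `sigma` treats points that are farther apart as more similar. This is the reweighting of distance described above.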

Importance of Kernel in Machine Learning:

The decision to classify an example depends on the decision function, and the kernel by itself is not the decision function. The decision function uses the kernel internally. It compares the example with each support vector through the kernel and weights those comparisons with the learned parameters α, giving $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$. So we can say that the kernel acts as a weighting factor that assigns weights to the examples/data points. Depending on the example being classified, the kernel can give more weight to one support vector and less to another. Another function of the kernel is to change the dimension of the data, implicitly mapping it into a higher-dimensional space when the situation requires it. Some kernels also map one piece of data onto another in a one-to-one manner according to given criteria, such as handling missing or reordered data. Such a kernel may crop, stretch, expand, bend, or shrink one data sequence so that it maps one-to-one onto another.
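The weighting role of the kernel inside the decision function can be checked numerically. The sketch below assumes scikit-learn and an RBF kernel with an arbitrarily chosen `gamma=0.5`. It rebuilds an SVM's decision function f(x) = sum over support vectors of alpha_i * y_i * K(x_i, x) + b from the fitted model and compares the result with scikit-learn's own `decision_function`. In scikit-learn's binary `SVC`, `dual_coef_` stores the products alpha_i * y_i.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, random_state=20)
gamma = 0.5  # assumed kernel width, chosen only for illustration
clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)

def manual_decision(X_new):
    # K[i, j] = kernel similarity between support vector i and new example j
    K = rbf_kernel(clf.support_vectors_, X_new, gamma=gamma)
    # dual_coef_ has shape (1, n_support) in the binary case and holds alpha_i * y_i,
    # so each support vector's vote is weighted by its similarity to the example
    return (clf.dual_coef_ @ K).ravel() + clf.intercept_[0]

manual = manual_decision(X[:5])
builtin = clf.decision_function(X[:5])
print(np.allclose(manual, builtin))  # True
```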

Example:

$a = (a_1, a_2, a_3)$; $b = (b_1, b_2, b_3)$.

Then for the function $f(a) = (a_1a_1, a_1a_2, a_1a_3, a_2a_1, a_2a_2, a_2a_3, a_3a_1, a_3a_2, a_3a_3)$,

the kernel is $K(a, b) = \langle f(a), f(b) \rangle = \langle a, b \rangle^2$.

Let’s work through a numeric example to make this clearer:

Suppose a = (1, 2, 3) and b = (4, 5, 6). Then:
f(a) = (1, 2, 3, 2, 4, 6, 3, 6, 9)
f(b) = (16, 20, 24, 20, 25, 30, 24, 30, 36)
<f(a), f(b)> = 16 + 40 + 72 + 40 + 100 + 180 + 72 + 180 + 324 = 1024

Computing the result this way takes a lot of algebra. This is mainly because f maps three-dimensional vectors into nine-dimensional space, so we must build both nine-dimensional vectors first and only then take their dot product.

Now let’s see the magic of the kernel:
$K(a, b) = (4 + 10 + 18)^2 = 32^2 = 1024$
Using the kernel, we get the same result, but the calculation is much faster and easier, because it never leaves three dimensions.
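The arithmetic above can be checked in a few lines, assuming NumPy. The hypothetical helper `feature_map` builds the explicit nine-dimensional vector f(a) from all pairwise products. The hypothetical `poly2_kernel` computes K(a, b) = ⟨a, b⟩² directly in three dimensions.

```python
import numpy as np

def feature_map(v):
    # Explicit map f: all pairwise products v_i * v_j (3-D input -> 9-D output)
    return np.outer(v, v).ravel()

def poly2_kernel(u, v):
    # Kernel shortcut: square of the ordinary 3-D dot product
    return float(np.dot(u, v)) ** 2

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(feature_map(a))                               # [1 2 3 2 4 6 3 6 9]
print(feature_map(b))                               # [16 20 24 20 25 30 24 30 36]
print(int(np.dot(feature_map(a), feature_map(b))))  # 1024, computed in 9-D
print(poly2_kernel(a, b))                           # 1024.0, computed in 3-D
```

Both routes give the same number. The kernel route needs only three multiplications and a square instead of building two nine-dimensional vectors.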

This is how the kernel makes our life easier. Sometimes there is a single input for each output, and it is easy to learn the relationship between input and output, as in Figure 2.

Figure 2 : Function Learning on single input

But sometimes there is more than one input vector, and learning the function in that case is more difficult. An example is shown in Figure 3.

Figure 3 : Function Learning on multiple input values

Amazing functionalities of the kernel:

The beauty of the kernel is that it allows classification in very high, even infinite, dimensions without ever computing coordinates in that space explicitly. Keep in mind, however, that this does not always work. Higher-dimensional data can be difficult to classify, and sometimes the kernel cannot form rules or learn a function for such big data. In machine learning, the drop in performance as the number of dimensions grows is called the curse of dimensionality. A mapping F(x) of high-dimensional data into infinite dimensions is useful only when it makes sense for the data and we have an idea of how to deal with it. In such cases, the kernel provides an amazing shortcut for working with the data.
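One way to see the "infinite dimensions" claim is the Gaussian (RBF) kernel on one-dimensional inputs. It expands as exp(-γ(x - y)²) = Σₙ φₙ(x) φₙ(y), with φₙ(x) = exp(-γx²) · sqrt((2γ)ⁿ / n!) · xⁿ for n = 0, 1, 2, …, so its feature space has infinitely many coordinates. The sketch below uses an assumed γ = 0.5 and hypothetical helper names. It truncates that feature map after a finite number of coordinates and shows the dot product approaching the exact kernel value, which the kernel itself returns in one step.

```python
import math

GAMMA = 0.5  # assumed kernel parameter

def rbf(x, y, gamma=GAMMA):
    # Exact Gaussian/RBF kernel on scalar inputs
    return math.exp(-gamma * (x - y) ** 2)

def truncated_features(x, n_terms, gamma=GAMMA):
    # First n_terms coordinates of the (infinite) RBF feature map for a scalar x
    scale = math.exp(-gamma * x ** 2)
    return [scale * math.sqrt((2 * gamma) ** n / math.factorial(n)) * x ** n
            for n in range(n_terms)]

def approx_rbf(x, y, n_terms, gamma=GAMMA):
    # Ordinary dot product in the truncated feature space
    fx = truncated_features(x, n_terms, gamma)
    fy = truncated_features(y, n_terms, gamma)
    return sum(p * q for p, q in zip(fx, fy))

x, y = 0.8, -0.3
for n in (1, 2, 5, 10, 20):
    print(f"{n:2d} features: {approx_rbf(x, y, n):.10f}  exact: {rbf(x, y):.10f}")
```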

SVM kernels and their suitability:

Several types of kernel can be used with an SVM, and each performs well on a particular kind of data. The kernel types are:

  • Linear kernel: suitable for large sparse data.
  • Non-linear kernel: suitable for mapping data that is not linearly
    separable into a higher-dimensional space where it becomes linearly separable.
  • Polynomial kernel: popular in Digital Image
    Processing (DIP).
  • Radial basis function (RBF) kernel: suitable
    when there is no prior knowledge about the data.
  • Sigmoid kernel: used in Artificial Neural
    Networks (ANN).
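The formulas behind the four named kernels can be written out directly and checked against scikit-learn's `sklearn.metrics.pairwise` functions. Those functions use the same definitions as `SVC(kernel=...)`: linear ⟨x, y⟩, polynomial (γ⟨x, y⟩ + r)^d, RBF exp(-γ‖x - y‖²), and sigmoid tanh(γ⟨x, y⟩ + r). The random data and the values γ = 0.5, r = 1, d = 3 are arbitrary assumptions for this sketch. The generic "non-linear kernel" entry has no single formula, so it is left out.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel,
)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))   # 4 example vectors
Y = rng.normal(size=(5, 3))   # 5 example vectors
gamma, coef0, degree = 0.5, 1.0, 3  # assumed hyperparameters

dot = X @ Y.T                                                  # all pairwise dot products
sq_dist = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)  # all pairwise squared distances

manual = {
    "linear": dot,
    "polynomial": (gamma * dot + coef0) ** degree,
    "rbf": np.exp(-gamma * sq_dist),
    "sigmoid": np.tanh(gamma * dot + coef0),
}
library = {
    "linear": linear_kernel(X, Y),
    "polynomial": polynomial_kernel(X, Y, degree=degree, gamma=gamma, coef0=coef0),
    "rbf": rbf_kernel(X, Y, gamma=gamma),
    "sigmoid": sigmoid_kernel(X, Y, gamma=gamma, coef0=coef0),
}
for name in manual:
    print(name, np.allclose(manual[name], library[name]))
```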

Here we use the linear kernel as an example, with code that plots how linearly separable data is classified behind the scenes. Linearly separable data is easy for the kernel to separate and classify. The plot of the linear data is shown in Figure 4.

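```python
# Fit a linear-kernel SVM on two well-separated blobs and plot the data.
from sklearn.datasets import make_blobs  # samples_generator was removed in newer scikit-learn
from sklearn import svm
import matplotlib.pyplot as plt

A, b = make_blobs(n_samples=60, centers=2, random_state=20)
clf = svm.SVC(kernel="linear", C=1)  # a new name, so the `svm` module is not shadowed
clf.fit(A, b)

# Color each point by its class label.
plt.scatter(A[:, 0], A[:, 1], c=b, s=30, cmap=plt.cm.Paired)
plt.show()
```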

Figure 4 : Linearly Separable Data
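Because the linear kernel is just a dot product, the support-vector sum collapses into one weight vector w = Σ αᵢ yᵢ xᵢ, and the decision function becomes ⟨w, x⟩ + b. The sketch below assumes scikit-learn and reuses the `make_blobs` settings from the plotting code. It builds w from the fitted `dual_coef_` and `support_vectors_` and checks the result against `decision_function` and `predict`.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

A, b = make_blobs(n_samples=60, centers=2, random_state=20)
clf = SVC(kernel="linear", C=1).fit(A, b)

# dual_coef_ holds alpha_i * y_i, so this is w = sum_i alpha_i * y_i * x_i
w = (clf.dual_coef_ @ clf.support_vectors_).ravel()
bias = clf.intercept_[0]
manual = A @ w + bias  # decision value <w, x> + b for every point

print(np.allclose(manual, clf.decision_function(A)))  # True
# A positive decision value corresponds to the second class in clf.classes_
print(np.array_equal(clf.classes_[(manual > 0).astype(int)], clf.predict(A)))
```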

Non-linear data is harder to classify. As discussed above, the kernel's job here is to map the data into a higher dimension, for example from two dimensions into three, so that the non-linear data can be separated and classified easily.

```python
# Generate two concentric circles, which no straight line can separate in 2-D.
from sklearn.datasets import make_circles
import matplotlib.pyplot as plt

A, b = make_circles(n_samples=500, noise=0.02)
plt.scatter(A[:, 0], A[:, 1], c=b, marker=".")
plt.show()
```

Figure 5 : Non-Linear Data Distribution

In Figure 5 the data is distributed non-linearly, and it is difficult for the kernel to separate it or draw a boundary in two dimensions. In 3-D, however, separation becomes easy, so the kernel maps the data into 3-D form, which looks like Figure 6.

Figure 6 : Non-linear Data in 3-D View

In three dimensions the non-linear data is easy to separate. The job of a non-linear kernel is to turn such data into a linearly separable distribution.
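The 2-D to 3-D idea can be tried explicitly. The sketch below assumes scikit-learn and a hypothetical lift z = x² + y² (the squared distance from the origin). This is one simple map that puts the inner circle lower than the outer one. It is not necessarily the mapping behind Figure 6. The sketch compares a linear SVM on the lifted 3-D data with a linear SVM on the raw 2-D data and with an RBF-kernel SVM that works on the 2-D data directly (`gamma=10.0` is an assumed setting). All scores are training accuracy.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

A, b = make_circles(n_samples=500, noise=0.02, random_state=0)

# Hypothetical explicit lift: append z = x^2 + y^2 as a third coordinate.
A3 = np.column_stack([A, (A ** 2).sum(axis=1)])

linear_2d = SVC(kernel="linear").fit(A, b).score(A, b)     # a flat line in 2-D struggles
linear_3d = SVC(kernel="linear").fit(A3, b).score(A3, b)   # a flat plane in 3-D suffices
rbf_2d = SVC(kernel="rbf", gamma=10.0).fit(A, b).score(A, b)  # kernel does the lift implicitly
print(f"linear 2-D: {linear_2d:.3f}  linear 3-D: {linear_3d:.3f}  RBF 2-D: {rbf_2d:.3f}")
```

The explicit lift and the RBF kernel reach the same goal. The kernel does it without ever building the extra coordinate.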

Successful classification depends on choosing the right kernel for the right scenario. No single kernel performs well on, or fits, every kind of data. The choice is problem-specific, so you have to choose the kernel wisely according to your needs and the nature of the data.
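One practical way to choose a kernel is to compare cross-validated accuracy on your own data. This sketch assumes scikit-learn. The blob and circle datasets and all settings (sample counts, `noise=0.05`, `factor=0.5`, 5-fold cross-validation, default `SVC` hyperparameters) are illustrative choices, not a recommendation.

```python
from sklearn.datasets import make_blobs, make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

datasets = {
    "blobs": make_blobs(n_samples=300, centers=2, random_state=20),
    "circles": make_circles(n_samples=300, noise=0.05, factor=0.5, random_state=0),
}
kernels = ["linear", "poly", "rbf", "sigmoid"]

results = {}
for data_name, (X, y) in datasets.items():
    for kernel in kernels:
        # Mean accuracy over 5 train/test splits for this kernel on this dataset
        scores = cross_val_score(SVC(kernel=kernel), X, y, cv=5)
        results[(data_name, kernel)] = scores.mean()
        print(f"{data_name:8s} {kernel:8s} {scores.mean():.3f}")

# Best-scoring kernel per dataset
best = {d: max(kernels, key=lambda k: results[(d, k)]) for d in datasets}
print(best)
```

On concentric circles the RBF kernel should clearly beat the linear kernel. On other data the ranking may differ, which is why the choice has to be made per problem.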

Description:

The kernel is a central part of an operating system, and of many machine learning models. In machine learning, the kernel is used inside the decision function of models such as SVMs. The decision function compares an example with each support vector through the kernel and weights the results by the learned parameters α. SVMs support different kinds of kernels, such as the linear, non-linear, RBF, and sigmoid kernels. Every kernel has its own functionality, pros and cons, and way of working.