Automating Data Science Projects With Jenkins | by Aminu Israel | Aug, 2020

Learn how to automate Data Science code using Jenkins

Aminu Israel

Let’s paint a scenario. You’re working on a Data Science project, and your first model reaches an accuracy of 80%, so you deploy it to production as an API using Flask. A few days later you pick the project up again, and after tuning some of the parameters and adding more data, you get better accuracy than the previous model. Now you want to deploy the new model, which means going through the trouble of building, testing, and deploying to production all over again, and that is a lot of work. In this article, I will show you how to use a powerful tool called Jenkins to automate this process.

Jenkins is a free and open-source automation server. It helps automate the parts of software development related to building, testing, and deploying, facilitating continuous integration and continuous delivery — Wikipedia

With Jenkins, you can automate and accelerate software delivery throughout the entire lifecycle using its wide range of plugins. For example, you can set up Jenkins to detect a code commit in a repository and automatically trigger commands such as building a Docker image from a Dockerfile, running unit tests, pushing an image to a container registry, or deploying it to the production server, all without doing anything manually. I’ll explain some basic concepts we need to know in order to automate our Data Science project.

  1. It is Open Source
  2. Easy to use and install
  3. A large number of plugins that fit into a DevOps environment
  4. Spend more time on your code and less time on deployment
  5. Massive community

Jenkins can be installed on all major platforms, whether you’re a Windows, Linux, or Mac user. You can also install it on a cloud server, whether it is a Windows (PowerShell) or Linux instance. To install Jenkins, you can refer to the documentation here.

Jenkins has a lot of amazing features, and some are beyond the scope of this article. To get the hang of Jenkins, you can check the documentation.

Before we jump into the practical side of things, there are some important terms I want to explain:

Jenkins Job

A Jenkins job is a runnable task that is controlled by Jenkins. For instance, you can give Jenkins a job that prints “Hello World” or one that runs unit and integration tests. Creating a job is very easy in Jenkins, but in a real software environment you will rarely build a single job; instead, you’ll build what is referred to as a pipeline.

Jenkins Pipeline

A pipeline runs a collection of jobs in a particular order or sequence. Let me explain this with an example. Suppose I am developing an application and I want Jenkins to pull the code from a code repository, build the application, test it, and deploy it to a server. To do this, I will create four jobs, one for each of those steps. The first job (Job 1) pulls the code from the repository, the second job (Job 2) builds the application, the third job (Job 3) performs unit and integration tests, and the fourth job (Job 4) deploys the code to production. I can use the Jenkins Build Pipeline plugin to perform this task. After I create the jobs and chain them in a sequence, the plugin runs them as a pipeline.
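To make the idea of chaining concrete, here is a minimal Python sketch of the four jobs above. This is not Jenkins code and not how Jenkins is implemented; the `run_pipeline` function and the stage names are my own stand-ins, used only to show that the stages run in order and that a failing stage stops everything after it.

```python
from typing import Callable, List, Tuple

# A stage is a (name, action) pair; the action returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed = []
    for name, action in stages:
        if not action():
            print(f"Stage '{name}' failed; skipping the remaining stages")
            break
        completed.append(name)
    return completed


# Hypothetical stand-ins for Job 1 to Job 4.
stages = [
    ("Pull code", lambda: True),
    ("Build", lambda: True),
    ("Test", lambda: False),  # simulate a failing test run
    ("Deploy", lambda: True),
]

result = run_pipeline(stages)
print(result)  # ['Pull code', 'Build']
```

Because the simulated test stage fails, the deploy stage never runs. That is the behaviour you want from a real pipeline as well: code that fails its tests should not reach production.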

Types of Jenkins Pipeline:

  1. Declarative pipeline: This is a feature that supports the pipeline-as-code concept. It makes the pipeline code easier to read and write. The code is written in a Jenkinsfile, which can be checked into a source control management system such as Git.
  2. Scripted pipeline: This is the older way of writing pipeline code. It can be typed directly into the job configuration in the Jenkins user interface or kept in a Jenkinsfile. Both types of pipeline perform the same function, and both are written in a Groovy-based syntax.

After talking about the major concepts, let’s build a simple mini project and automate it with Jenkins.

This project contains a trained Machine Learning model that detects suicidal sentiment in tweets from Twitter, which I deployed as an API using Flask. I structured my Jenkins pipeline to:

Pull changes from the repository when a commit is made >>> Build Docker Image >>> Push Built Image to DockerHub >>> Remove Unused Docker Images.
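As a rough illustration of what each of those stages boils down to, the Python sketch below maps every stage to the shell command it would typically run. The repository URL, image name, and tag are hypothetical placeholders, and `pipeline_commands` is a helper I made up for this sketch. It only builds and prints the commands; it does not execute anything.

```python
from typing import Dict, List


def pipeline_commands(repo_url: str, image: str, tag: str) -> Dict[str, List[str]]:
    """Map each pipeline stage to the shell command it would run."""
    full_name = f"{image}:{tag}"
    return {
        "Pull changes": ["git", "clone", repo_url],
        "Build Docker image": ["docker", "build", "-t", full_name, "."],
        "Push image to DockerHub": ["docker", "push", full_name],
        "Remove unused image": ["docker", "rmi", full_name],
    }


# Hypothetical values; replace them with your own repository and image name.
commands = pipeline_commands(
    repo_url="https://github.com/example-user/example-repo.git",
    image="example-user/sentiment-api",
    tag="7",
)

for stage, command in commands.items():
    print(f"{stage}: {' '.join(command)}")
```

Tagging the image with something that changes on every run, such as a build number, keeps each pushed image distinguishable from the previous one.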

Start up a Jenkins server and install the Git, Docker, Pipeline, and build plugins. Also install Git and Docker on the instance itself. For this article, I used Jenkins on an AWS EC2 instance.

Push the code to a repository; in this article I used GitHub. You can find the code for this article here.

My working directory:

Then we need to tell Jenkins to start building the pipeline whenever a change is made in the code repository. For that, you need to add the Jenkins webhook to the GitHub repository so that GitHub can notify Jenkins when the code changes. To do this:

Click on Settings in your code repository