Algorithms from Scratch: Logistic Regression | by Kurtis Pykes | Jul, 2020

Chunking the Algorithm

  1. Initialize the parameters of the hypothesis function (the code below initializes them to zero)
  2. Apply the logistic (sigmoid) function to the linear hypothesis function
  3. Calculate the partial derivatives of the cost function (Saket Thavanani wrote a good post on this titled The derivative of Cost function for Logistic Regression)
  4. Update the parameters
  5. Repeat steps 2–4 for n iterations, or until the cost function is minimized (the key formulas are summarized just below this list)
  6. Inference
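For reference, here is a compact summary of the quantities these steps use, written with the same symbols as the code below (W and b are the coefficients, g the sigmoid output, m the number of training instances, lr the learning rate):

z = X·W + b
g = sigmoid(z) = 1 / (1 + exp(-z))
cost: J = -(1/m) * sum(y * log(g) + (1 - y) * log(1 - g))
gradients: dW = (1/m) * X.T·(g - y),  db = (1/m) * sum(g - y)
update: W = W - lr * dW,  b = b - lr * db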

Implementation

For this section, I leverage three Python frameworks: NumPy for linear algebra, Pandas for data manipulation, and Scikit-Learn for machine learning tools.

import numpy as np 
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

First, we need a dataset. I use sklearn.datasets.load_breast_cancer, which is a classic binary classification dataset (see the documentation).

# loading the dataset
dataset = load_breast_cancer(as_frame=True)
df = pd.DataFrame(data=dataset.data)
df["target"] = dataset.target

df.head()

Figure 5: Output of the code cell above. Note: the DataFrame has 31 columns, which is too wide to display in full, hence the ellipses (some columns are still not visible).
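If the truncated view makes the dimensions hard to read, checking the shape directly is simpler; for this dataset that is 569 rows and 31 columns (30 features plus the target):

df.shape  # (569, 31)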

Next, we separate the predictors from the response variable, then create a training and a test set.

# Separating into X and y
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

# splitting into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, shuffle=True, random_state=24)
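A small practical note: the raw features span very different scales, so np.exp inside the sigmoid can emit overflow warnings during gradient descent. If that happens, an optional remedy (not applied in the code that follows, which runs on the unscaled data) is to standardize the features first:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit the scaler on the training data only
X_test_scaled = scaler.transform(X_test)        # apply the same transformation to the test set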

Much of the work we did to build Linear Regression from scratch (see link below) can be borrowed with a few slight changes to adjust our model for classification using Logistic Regression.

def param_init(X):
    """
    Initialize parameters
    __________________
    Input(s)
    X: Training data
    __________________
    Output(s)
    params: Dictionary containing coefficients
    """
    params = {}  # initialize dictionary
    _, n_features = X.shape  # shape of training data

    # initializing coefficients to 0
    params["W"] = np.zeros(n_features)
    params["b"] = 0
    return params

def get_z(X, W, b):
    """
    Calculates Linear Function
    __________________
    Input(s)
    X: Training data
    W: Weight coefficients
    b: bias coefficients
    __________________
    Output(s)
    z: a Linear function
    """
    z = np.dot(X, W) + b
    return z

def sigmoid(z):
    """
    Sigmoid (logistic) function
    _________________
    Input(s)
    z: Linear model
    _________________
    Output(s)
    g: Sigmoid function applied to the linear model
    """
    g = 1 / (1 + np.exp(-z))
    return g

def gradient_descent(X, y, params, lr, n_iter):
    """
    Gradient descent to minimize cost function
    __________________
    Input(s)
    X: Training data
    y: Labels
    params: Dictionary containing coefficients
    lr: learning rate
    n_iter: Number of iterations
    __________________
    Output(s)
    params: Dictionary containing optimized coefficients
    """
    W = params["W"]
    b = params["b"]
    m = X.shape[0]  # number of training instances

    for _ in range(n_iter):
        # prediction with the current weights
        g = sigmoid(get_z(X, W, b))
        # calculate the loss (tracked for monitoring; not used in the update)
        loss = -1/m * np.sum(y * np.log(g) + (1 - y) * np.log(1 - g))
        # partial derivatives of the cost w.r.t. the coefficients
        dW = 1/m * np.dot(X.T, (g - y))
        db = 1/m * np.sum(g - y)
        # updates to coefficients
        W -= lr * dW
        b -= lr * db

    params["W"] = W
    params["b"] = b
    return params

def train(X, y, lr=0.01, n_iter=1000):
    """
    Train Logistic Regression model with Gradient Descent
    __________________
    Input(s)
    X: Training data
    y: Labels
    lr: learning rate
    n_iter: Number of iterations
    __________________
    Output(s)
    params: Dictionary containing optimized coefficients
    """
    init_params = param_init(X)
    params = gradient_descent(X, y, init_params, lr, n_iter)
    return params

def predict(X_test, params):
    """
    Inference on unseen data
    __________________
    Input(s)
    X_test: Unseen data
    params: Dictionary containing optimized weights from training
    __________________
    Output(s)
    prediction of model
    """
    z = np.dot(X_test, params["W"]) + params["b"]
    y_pred = sigmoid(z) >= 0.5
    return y_pred.astype("int")

The notable differences are that we now apply the sigmoid function to our linear model; at inference, every output of 0.5 or greater from the sigmoid is classified as class 1 (class 0 otherwise); and we use a different cost function suited to classification, since MSE would make our loss function non-convex. To learn more about the cost function used, you should definitely read The derivative of Cost function for Logistic Regression.
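To make the thresholding step concrete, here is a tiny illustration (with made-up sigmoid outputs) of how probabilities are mapped to class labels:

probs = np.array([0.2, 0.51, 0.97])    # hypothetical sigmoid outputs
labels = (probs >= 0.5).astype("int")  # array([0, 1, 1])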

params = train(X_train, y_train)  # train model
y_pred = predict(X_test, params)  # inference

lr = LogisticRegression(C=0.01)
lr.fit(X_train, y_train)
sklearn_y_pred = lr.predict(X_test)

print(f"My Implementation: {accuracy_score(y_test, y_pred)}\nSklearn Implementation: {accuracy_score(y_test, sklearn_y_pred)}")

>>>> My Implementation: 0.9300699300699301
Sklearn Implementation: 0.9300699300699301

Great, we obtain the same accuracy as the Scikit-Learn implementation.
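Beyond the accuracy scores, a quick extra check (not part of the original comparison) is to look at how often the two models assign the same label to a test instance:

# fraction of test instances where both implementations predict the same label
print((y_pred == sklearn_y_pred).mean())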

Now, we will repeat this using object-oriented programming, which is generally considered better suited for collaboration.

class LogReg():
    """
    Custom made Logistic Regression class
    """
    def __init__(self, lr=0.01, n_iter=1000):
        self.lr = lr
        self.n_iter = n_iter
        self.params = {}

    def param_init(self, X_train):
        """
        Initialize parameters
        __________________
        Input(s)
        X_train: Training data
        """
        _, n_features = X_train.shape  # shape of training data

        # initializing coefficients to 0
        self.params["W"] = np.zeros(n_features)
        self.params["b"] = 0
        return self

    def get_z(self, X, W, b):
        """
        Calculates Linear Function
        __________________
        Input(s)
        X: Training data
        W: Weight coefficients
        b: bias coefficients
        __________________
        Output(s)
        z: a Linear function
        """
        z = np.dot(X, W) + b
        return z

    def sigmoid(self, z):
        """
        Sigmoid (logistic) function
        _________________
        Input(s)
        z: Linear model
        _________________
        Output(s)
        g: Sigmoid function applied to the linear model
        """
        g = 1 / (1 + np.exp(-z))
        return g

    def gradient_descent(self, X_train, y_train):
        """
        Gradient descent to minimize cost function
        __________________
        Input(s)
        X_train: Training data
        y_train: Labels
        __________________
        Output(s)
        self: Model with optimized coefficients stored in self.params
        """
        W = self.params["W"]
        b = self.params["b"]
        m = X_train.shape[0]  # number of training instances

        for _ in range(self.n_iter):
            # prediction with the current weights
            g = self.sigmoid(self.get_z(X_train, W, b))
            # calculate the loss (tracked for monitoring; not used in the update)
            loss = -1/m * np.sum(y_train * np.log(g) + (1 - y_train) * np.log(1 - g))
            # partial derivatives of the cost w.r.t. the coefficients
            dW = 1/m * np.dot(X_train.T, (g - y_train))
            db = 1/m * np.sum(g - y_train)
            # updates to coefficients
            W -= self.lr * dW
            b -= self.lr * db

        self.params["W"] = W
        self.params["b"] = b
        return self

    def train(self, X_train, y_train):
        """
        Train model with Gradient Descent
        __________________
        Input(s)
        X_train: Training data
        y_train: Labels
        __________________
        Output(s)
        self: Trained model with optimized coefficients
        """
        self.param_init(X_train)
        self.gradient_descent(X_train, y_train)
        return self

    def predict(self, X_test):
        """
        Inference
        __________________
        Input(s)
        X_test: Unseen data
        __________________
        Output(s)
        y_preds: Predictions of model
        """
        g = self.sigmoid(np.dot(X_test, self.params["W"]) + self.params["b"])
        y_preds = (g >= 0.5).astype("int")
        return y_preds

To check that we implemented it correctly, we can see whether the predictions match those of our procedural implementation, since we already know that one is approximately equal to Scikit-learn’s implementation.

logreg = LogReg()
logreg.train(X_train, y_train)
oop_y_pred = logreg.predict(X_test)
oop_y_pred == y_pred

This returns an array that is True for each value.
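If you prefer a single boolean over an element-wise array, np.array_equal performs the same check:

np.array_equal(oop_y_pred, y_pred)  # True when both implementations produce identical predictions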