
# Algorithms from Scratch: Logistic Regression | by Kurtis Pykes | Jul, 2020

## Chunking the Algorithm

1. Randomly initialize parameters for the hypothesis function
2. Apply the logistic (sigmoid) function to the linear hypothesis function
3. Calculate the partial derivatives of the cost function (Saket Thavanani wrote a good post on this titled The derivative of Cost function for Logistic Regression; the key equations are summarised right after this list)
4. Update the parameters
5. Repeat steps 2–4 for n iterations (or until the cost function is minimized)
6. Inference
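
For reference, and to make steps 2–4 concrete, here is a compact summary of the standard logistic regression quantities these steps refer to, writing m for the number of training examples and α for the learning rate (`lr` in the code that follows):

$$z = Xw + b, \qquad g(z) = \frac{1}{1 + e^{-z}}$$

$$J(w, b) = -\frac{1}{m}\sum_{i=1}^{m}\Big[y_i \log g(z_i) + (1 - y_i)\log\big(1 - g(z_i)\big)\Big]$$

$$\frac{\partial J}{\partial w} = \frac{1}{m} X^{\top}(g - y), \qquad \frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\big(g(z_i) - y_i\big)$$

$$w \leftarrow w - \alpha \frac{\partial J}{\partial w}, \qquad b \leftarrow b - \alpha \frac{\partial J}{\partial b}$$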

## Implementation

For this section I leverage three Python libraries: NumPy for linear algebra, Pandas for data manipulation, and Scikit-Learn for machine learning tools.

```python
import numpy as np
import pandas as pd

from sklearn.metrics import accuracy_score
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
```

First, we need a dataset. I use `sklearn.datasets.load_breast_cancer`, a classic binary classification dataset (see the documentation).

```python
# loading the dataset
dataset = load_breast_cancer(as_frame=True)

df = pd.DataFrame(data=dataset.data)
df["target"] = dataset.target
df.head()
```

Next, we split the predictors and the response variables then create a training and test set.

```python
# separating into X and y
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

# splitting into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, shuffle=True, random_state=24
)
```

Plenty of the work we did to build Linear Regression from scratch (see the link below) can be borrowed, with a few slight changes to adapt our model for classification using Logistic Regression.

```python
def param_init(X):
    """
    Initialize parameters
    __________________
    Input(s)
    X: Training data
    __________________
    Output(s)
    params: Dictionary containing coefficients
    """
    params = {}  # initialize dictionary
    _, n_features = X.shape  # shape of training data

    # initializing coefficients to 0
    params["W"] = np.zeros(n_features)
    params["b"] = 0
    return params


def get_z(X, W, b):
    """
    Calculates Linear Function
    __________________
    Input(s)
    X: Training data
    W: Weight coefficients
    b: bias coefficient
    __________________
    Output(s)
    z: a Linear function
    """
    z = np.dot(X, W) + b
    return z


def sigmoid(z):
    """
    Logistic (sigmoid) function
    _________________
    Input(s)
    z: Linear model
    _________________
    Output(s)
    g: Sigmoid function applied to the linear model
    """
    g = 1 / (1 + np.exp(-z))
    return g


def gradient_descent(X, y, params, lr, n_iter):
    """
    Gradient descent to minimize cost function
    __________________
    Input(s)
    X: Training data
    y: Labels
    params: Dictionary containing coefficients
    lr: learning rate
    n_iter: Number of iterations
    __________________
    Output(s)
    params: Dictionary containing optimized coefficients
    """
    W = params["W"]
    b = params["b"]
    m = X.shape[0]  # number of training instances

    for _ in range(n_iter):
        # prediction with current weights
        g = sigmoid(get_z(X, W, b))

        # calculate the loss (binary cross-entropy)
        loss = -1 / m * np.sum(y * np.log(g) + (1 - y) * np.log(1 - g))

        # partial derivatives of the cost function
        dW = 1 / m * np.dot(X.T, (g - y))
        db = 1 / m * np.sum(g - y)

        # updates to coefficients
        W -= lr * dW
        b -= lr * db

    params["W"] = W
    params["b"] = b
    return params


def train(X, y, lr=0.01, n_iter=1000):
    """
    Train Logistic Regression model with Gradient descent
    __________________
    Input(s)
    X: Training data
    y: Labels
    lr: learning rate
    n_iter: Number of iterations
    __________________
    Output(s)
    params: Dictionary containing optimized coefficients
    """
    init_params = param_init(X)
    params = gradient_descent(X, y, init_params, lr, n_iter)
    return params


def predict(X_test, params):
    """
    Inference
    __________________
    Input(s)
    X_test: Unseen data
    params: Dictionary containing optimized weights from training
    __________________
    Output(s)
    y_pred: Predictions of the model
    """
    z = np.dot(X_test, params["W"]) + params["b"]
    y_pred = sigmoid(z) >= 0.5
    return y_pred.astype("int")
```

Notable differences are that we now apply the sigmoid (logistic) function to our linear model; at inference time, every output from the sigmoid greater than or equal to 0.5 is classified as class 1 (class 0 otherwise); and we use a different cost function suited to classification, since MSE would make our loss function non-convex. To learn more about the cost function used, you should definitely read The derivative of Cost function for Logistic Regression.
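
To get a feel for why this cost suits classification, here is a quick illustrative check of the binary cross-entropy on a single prediction (the `single_log_loss` helper below is just for demonstration, not part of the model code): a confident correct prediction is barely penalised, while a confident wrong one is penalised heavily.

```python
import numpy as np

def single_log_loss(y_true, y_prob):
    # binary cross-entropy for one prediction
    return -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

print(single_log_loss(1, 0.95))  # confident and correct -> small loss (~0.05)
print(single_log_loss(1, 0.05))  # confident and wrong -> large loss (~3.0)
```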

```python
params = train(X_train, y_train)  # train model
y_pred = predict(X_test, params)  # inference

lr = LogisticRegression(C=0.01)
lr.fit(X_train, y_train)
sklearn_y_pred = lr.predict(X_test)

print(f"My Implementation: {accuracy_score(y_test, y_pred)}\nSklearn Implementation: {accuracy_score(y_test, sklearn_y_pred)}")

>>>> My Implementation: 0.9300699300699301
Sklearn Implementation: 0.9300699300699301
```

Great, we obtain the same accuracy as the Scikit-Learn implementation.

Now, we will repeat this with object-oriented programming, which is generally considered much better for collaboration.

```python
class LogReg():
    """
    Custom made Logistic Regression class
    """
    def __init__(self, lr=0.01, n_iter=1000):
        self.lr = lr
        self.n_iter = n_iter
        self.params = {}

    def param_init(self, X_train):
        """
        Initialize parameters
        __________________
        Input(s)
        X_train: Training data
        """
        _, n_features = X_train.shape  # shape of training data

        # initializing coefficients to 0
        self.params["W"] = np.zeros(n_features)
        self.params["b"] = 0
        return self

    @staticmethod
    def get_z(X, W, b):
        """
        Calculates Linear Function
        __________________
        Input(s)
        X: Training data
        W: Weight coefficients
        b: bias coefficient
        __________________
        Output(s)
        z: a Linear function
        """
        z = np.dot(X, W) + b
        return z

    @staticmethod
    def sigmoid(z):
        """
        Logistic (sigmoid) function
        _________________
        Input(s)
        z: Linear model
        _________________
        Output(s)
        g: Sigmoid function applied to the linear model
        """
        g = 1 / (1 + np.exp(-z))
        return g

    def gradient_descent(self, X_train, y_train):
        """
        Gradient descent to minimize cost function
        __________________
        Input(s)
        X_train: Training data
        y_train: Labels
        __________________
        Output(s)
        self: Model with optimized coefficients
        """
        W = self.params["W"]
        b = self.params["b"]
        m = X_train.shape[0]  # number of training instances

        for _ in range(self.n_iter):
            # prediction with current weights
            g = self.sigmoid(self.get_z(X_train, W, b))

            # calculate the loss (binary cross-entropy)
            loss = -1 / m * np.sum(y_train * np.log(g) + (1 - y_train) * np.log(1 - g))

            # partial derivatives of the cost function
            dW = 1 / m * np.dot(X_train.T, (g - y_train))
            db = 1 / m * np.sum(g - y_train)

            # updates to coefficients
            W -= self.lr * dW
            b -= self.lr * db

        self.params["W"] = W
        self.params["b"] = b
        return self

    def train(self, X_train, y_train):
        """
        Train model with Gradient descent
        __________________
        Input(s)
        X_train: Training data
        y_train: Labels
        __________________
        Output(s)
        self: Trained model
        """
        self.param_init(X_train)
        self.gradient_descent(X_train, y_train)
        return self

    def predict(self, X_test):
        """
        Inference
        __________________
        Input(s)
        X_test: Unseen data
        __________________
        Output(s)
        y_pred: Predictions of the model
        """
        g = self.sigmoid(np.dot(X_test, self.params["W"]) + self.params["b"])
        y_pred = (g >= 0.5).astype("int")
        return y_pred
```

To check that we implemented it correctly, we can see whether the predictions match those of our procedural implementation, which we already know is approximately equal to Scikit-learn's implementation.

```python
logreg = LogReg()
logreg.train(X_train, y_train)
oop_y_pred = logreg.predict(X_test)

oop_y_pred == y_pred
```

This returns an array that is True for each value.
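
If you prefer a single boolean rather than an element-wise comparison, an equivalent check (just a convenience, not part of the original code) is:

```python
import numpy as np

# True only if every OOP prediction matches the procedural prediction
print(np.array_equal(oop_y_pred, y_pred))
```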