Logistic Regression Model Training

In this module, we will train a logistic regression model using the scikit-learn library. We will use a sample dataset called 'pima_indian_diabetes.csv' to build our model.

Load the diabetes dataset.

import pandas as pd
import numpy as np

pima = pd.read_csv('pima_indian_diabetes.csv')

Verify the dataset.

pima.head()

Output:

   No_Times_Pregnant  Plasma_Glucose  Diastolic_BP  Triceps  Insulin   BMI  Age  Diabetes
0                  1              89            66       23       94  28.1   21         0
1                  0             137            40       35      168  43.1   33         1
2                  3              78            50       32       88  31.0   26         1
3                  2             197            70       45      543  30.5   53         1
4                  1             189            60       23      846  30.1   59         1
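Before modeling, it is worth checking the shape, class balance, and missing values of the data. Here is a minimal sketch of those checks; it builds a small stand-in DataFrame from the head() rows above (the real CSV has many more rows), so the printed numbers are illustrative only.

```python
import pandas as pd

# Stand-in frame mirroring the head() output above (the real CSV is larger)
pima = pd.DataFrame({
    'No_Times_Pregnant': [1, 0, 3, 2, 1],
    'Plasma_Glucose': [89, 137, 78, 197, 189],
    'Diastolic_BP': [66, 40, 50, 70, 60],
    'Triceps': [23, 35, 32, 45, 23],
    'Insulin': [94, 168, 88, 543, 846],
    'BMI': [28.1, 43.1, 31.0, 30.5, 30.1],
    'Age': [21, 33, 26, 53, 59],
    'Diabetes': [0, 1, 1, 1, 1],
})

print(pima.shape)                       # (rows, columns)
print(pima['Diabetes'].value_counts())  # class balance of the target
print(pima.isnull().sum().sum())        # total missing values
```

With the real file loaded via pd.read_csv, the same three lines reveal whether the classes are imbalanced and whether any cleaning is needed before training.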

Normalize the continuous features:

Normalization keeps all features on a comparable scale, so no single feature dominates the model simply because its raw values are larger. Here each column is standardized to zero mean and unit standard deviation (z-scoring).

df = pima[['No_Times_Pregnant', 'Plasma_Glucose', 'Diastolic_BP', 'Triceps', 'Insulin', 'BMI', 'Age']]
normalized_df = (df - df.mean()) / df.std()
pima = pima.drop(['No_Times_Pregnant', 'Plasma_Glucose', 'Diastolic_BP', 'Triceps', 'Insulin', 'BMI', 'Age'], axis=1)
pima = pd.concat([pima, normalized_df], axis=1)
pima.head()

Output:

   Diabetes  No_Times_Pregnant  Plasma_Glucose  Diastolic_BP   Triceps   Insulin       BMI       Age
0         0          -0.716511       -1.089653     -0.373178 -0.584363 -0.522175 -0.709514 -0.967063
1         1          -1.027899        0.465719     -2.453828  0.556709  0.100502  1.424909  0.209318
2         1          -0.093734       -1.446093     -1.653578  0.271441 -0.572662 -0.296859 -0.476904
3         1          -0.405123        2.409934     -0.053078  1.507603  3.255961 -0.368007  2.169953
4         1          -0.716511        2.150705     -0.853328 -0.584363  5.805571 -0.424924  2.758143
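A quick sanity check on z-scoring: after the transformation each column should have mean approximately 0 and standard deviation approximately 1. A minimal sketch on one column's worth of sample values (the Plasma_Glucose numbers from the head() output above):

```python
import pandas as pd

# Sample Plasma_Glucose values from the head() output above
s = pd.Series([89.0, 137.0, 78.0, 197.0, 189.0])
z = (s - s.mean()) / s.std()

print(round(z.mean(), 10))  # approximately 0
print(round(z.std(), 10))   # approximately 1
```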

Split the data into training and test sets.

from sklearn.model_selection import train_test_split

# Put the feature variables into X
X = pima.drop(['Diabetes'], axis=1)

# Put the response variable into y
y = pima['Diabetes']

# Split the data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, test_size=0.3, random_state=100)
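With train_size=0.7 and test_size=0.3, the split should hand 70% of the rows to training and 30% to testing. A minimal sketch with synthetic data standing in for the pima features (100 rows, 7 columns are assumptions for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the pima features and labels
rng = np.random.RandomState(0)
X = rng.randn(100, 7)
y = rng.randint(0, 2, 100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, test_size=0.3, random_state=100)

print(len(X_train), len(X_test))  # 70 30
```

Fixing random_state makes the split reproducible, so anyone rerunning the notebook gets the same train/test partition.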

Train the model.

Using the scikit-learn library:

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
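The same fit/predict pattern can be run end to end on synthetic data to see a first accuracy number. This is a self-contained sketch; make_classification stands in for the pima dataset, so the score it prints is illustrative, not the score you will get on the real data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data standing in for the pima set
X, y = make_classification(n_samples=200, n_features=7, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=100)

model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(accuracy_score(y_test, y_pred))  # fraction of correct test predictions
```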

Using the statsmodels library:

import statsmodels.api as sm

logm1 = sm.GLM(y_train, sm.add_constant(X_train), family=sm.families.Binomial())
logm1.fit().summary()

Validate the model:

The example below uses the confusion matrix metric to validate the model. In scikit-learn's convention, rows are actual classes and columns are predicted classes.
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)

Output:

[[68 12]
 [16 22]]
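The four cells of this matrix can be read off directly: 68 true negatives, 12 false positives, 16 false negatives, and 22 true positives. A minimal sketch of computing overall accuracy from those counts (sensitivity and specificity come from the same cells, covered in the next blog):

```python
import numpy as np

# Confusion matrix from the output above:
# rows = actual (0, 1), columns = predicted (0, 1)
cm = np.array([[68, 12],
               [16, 22]])

tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / cm.sum()
print(round(accuracy, 3))  # 0.763
```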

Conclusion:

In this blog, you learned how to train a basic logistic regression model. In the next blog, you will learn more about the confusion matrix, sensitivity, and specificity.
