Logistic Regression Model Training
In this module, we will see how to train a Logistic Regression model using the Scikit-learn library. We will use a sample dataset called 'pima_indian_diabetes.csv' to run our model.
Load the diabetes dataset.
import pandas as pd
import numpy as np

pima = pd.read_csv('pima_indian_diabetes.csv')
Verify the dataset.
pima.head()
No_Times_Pregnant | Plasma_Glucose | Diastolic_BP | Triceps | Insulin | BMI | Age | Diabetes |
---|---|---|---|---|---|---|---|
1 | 89 | 66 | 23 | 94 | 28.1 | 21 | 0 |
0 | 137 | 40 | 35 | 168 | 43.1 | 33 | 1 |
3 | 78 | 50 | 32 | 88 | 31.0 | 26 | 1 |
2 | 197 | 70 | 45 | 543 | 30.5 | 53 | 1 |
1 | 189 | 60 | 23 | 846 | 30.1 | 59 | 1 |
Normalizing continuous features:
Normalization is required to bring all the features onto the same scale, so that no single feature dominates the model because of its units.
df = pima[['No_Times_Pregnant', 'Plasma_Glucose', 'Diastolic_BP', 'Triceps',
           'Insulin', 'BMI', 'Age']]

# Standardize each continuous feature (z-score)
normalized_df = (df - df.mean()) / df.std()

# Replace the original columns with the normalized ones
pima = pima.drop(['No_Times_Pregnant', 'Plasma_Glucose', 'Diastolic_BP', 'Triceps',
                  'Insulin', 'BMI', 'Age'], axis=1)
pima = pd.concat([pima, normalized_df], axis=1)
pima.head()
Output:
Diabetes | No_Times_Pregnant | Plasma_Glucose | Diastolic_BP | Triceps | Insulin | BMI | Age |
---|---|---|---|---|---|---|---|
0 | -0.716511 | -1.089653 | -0.373178 | -0.584363 | -0.522175 | -0.709514 | -0.967063 |
1 | -1.027899 | 0.465719 | -2.453828 | 0.556709 | 0.100502 | 1.424909 | 0.209318 |
1 | -0.093734 | -1.446093 | -1.653578 | 0.271441 | -0.572662 | -0.296859 | -0.476904 |
1 | -0.405123 | 2.409934 | -0.053078 | 1.507603 | 3.255961 | -0.368007 | 2.169953 |
1 | -0.716511 | 2.150705 | -0.853328 | -0.584363 | 5.805571 | -0.424924 | 2.758143 |
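The normalization step can be sanity-checked: after z-scoring, each column should have a mean of approximately 0 and a standard deviation of approximately 1. A minimal sketch using a small made-up frame (the values stand in for the real dataset):

```python
import pandas as pd

# Made-up sample of two continuous features, standing in for the full dataset
df = pd.DataFrame({'Plasma_Glucose': [89, 137, 78, 197, 189],
                   'BMI': [28.1, 43.1, 31.0, 30.5, 30.1]})

# Same z-score formula used above
normalized_df = (df - df.mean()) / df.std()

print(normalized_df.mean().round(6))  # each column ≈ 0
print(normalized_df.std().round(6))   # each column ≈ 1
```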
Split the data into train and test sets.
from sklearn.model_selection import train_test_split

# Putting feature variables into X
X = pima.drop(['Diabetes'], axis=1)

# Putting the response variable into y
y = pima['Diabetes']

# Splitting the data into train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, test_size=0.3, random_state=100)
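With `train_size=0.7` and `test_size=0.3`, roughly 70% of the rows go to training and 30% to testing. A quick sketch with a made-up 10-row frame shows the resulting split sizes:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up feature frame and labels, just to illustrate the split proportions
X = pd.DataFrame({'f1': range(10)})
y = pd.Series([0, 1] * 5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, test_size=0.3, random_state=100)

print(len(X_train), len(X_test))  # 7 3
```

The `random_state` argument fixes the shuffle, so the same rows land in the same split every run.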
Train the model.
Using the Scikit-learn library.
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
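After fitting, `model.score` gives the mean accuracy on held-out data. A minimal end-to-end sketch, using synthetic data from `make_classification` in place of the diabetes features:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data standing in for the diabetes dataset
X, y = make_classification(n_samples=200, n_features=7, random_state=100)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=100)

model = LogisticRegression()
model.fit(X_train, y_train)

# Mean accuracy on the held-out test set, between 0 and 1
print(model.score(X_test, y_test))
```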
Using the statsmodels library.
import statsmodels.api as sm

# Fit a binomial GLM (logistic regression) with an intercept term
logm1 = sm.GLM(y_train, sm.add_constant(X_train), family=sm.families.Binomial())
logm1.fit().summary()
Validate the model:
The example below uses the confusion matrix metric to validate the model.
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)
print(cm)
Output:
[[68 12]
[16 22]]
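The matrix above reads as [[TN, FP], [FN, TP]]: 68 true negatives, 12 false positives, 16 false negatives, and 22 true positives. Overall accuracy is (TN + TP) divided by the total count, which can be checked directly from those numbers:

```python
import numpy as np

# Confusion matrix values copied from the output above
cm = np.array([[68, 12],
               [16, 22]])

tn, fp = cm[0]
fn, tp = cm[1]

accuracy = (tn + tp) / cm.sum()
print(round(accuracy, 3))  # 0.763
```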
Conclusion:
In this blog, you learned how to train a basic Logistic Regression model. In the next blog, you will learn more about the confusion matrix, sensitivity, and specificity.