Variance Inflation Factor (VIF)


In this blog, we will discuss the Variance Inflation Factor (VIF), explain why it is needed, and implement it in Python.

Multicollinearity is a state in which several independent attributes are correlated with each other. In other words, one attribute is directly or indirectly related to other attributes, so they provide similar predictive power to the model. The variance inflation factor is a method for detecting multicollinearity among attributes.

A correlation coefficient can also reveal multicollinearity, but it only measures the relationship between two variables at a time. The Variance Inflation Factor (VIF) is therefore used to measure collinearity among multiple variables.
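To illustrate why pairwise correlation can miss the problem, here is a minimal sketch on synthetic data. The variable names x1 to x4, the random seed, and the noise level are assumptions made for this example only. x4 is almost exactly the sum of three independent variables, so no single pairwise correlation looks alarming, yet x4 is nearly fully explained by the other three together.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily for reproducibility
n = 1000

# Three independent predictors and a fourth that is (almost) their sum.
x1, x2, x3 = rng.normal(size=(3, n))
x4 = x1 + x2 + x3 + rng.normal(scale=0.1, size=n)
X = np.column_stack([x1, x2, x3, x4])

# Pairwise view: largest absolute off-diagonal correlation.
corr = np.corrcoef(X, rowvar=False)
max_pairwise = np.abs(corr - np.eye(4)).max()

# Multi-variable view: regress x4 on x1, x2, x3 (with an intercept) and compute R^2.
A = np.column_stack([np.ones(n), x1, x2, x3])
coef, *_ = np.linalg.lstsq(A, x4, rcond=None)
residuals = x4 - A @ coef
r_squared = 1 - residuals.var() / x4.var()
vif_x4 = 1 / (1 - r_squared)

print(f"largest pairwise correlation: {max_pairwise:.2f}")
print(f"R^2 of x4 on x1, x2, x3:      {r_squared:.4f}")
print(f"VIF of x4:                    {vif_x4:.1f}")
```

In this setup the largest pairwise correlation stays moderate, while the VIF of x4 ends up far above the usual rule-of-thumb threshold.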

The formula for finding the Variance Inflation Factor (VIF) of any attribute is below:

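$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$$

where R_i^2 is the R-squared value obtained by regressing attribute i on all the other attributes.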
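To get a feel for the scale, the relationship VIF = 1 / (1 - R^2) can be evaluated for a few R-squared values. The helper name `vif_from_r_squared` is hypothetical and used only for this illustration.

```python
def vif_from_r_squared(r_squared: float) -> float:
    """Return 1 / (1 - R^2), the VIF implied by an auxiliary-regression R^2."""
    if not 0 <= r_squared < 1:
        raise ValueError("R^2 must be in [0, 1); R^2 = 1 means perfect collinearity")
    return 1.0 / (1.0 - r_squared)

for r2 in (0.0, 0.5, 0.8, 0.9):
    print(f"R^2 = {r2:.1f}  ->  VIF = {vif_from_r_squared(r2):.2f}")
# R^2 = 0.0  ->  VIF = 1.00
# R^2 = 0.5  ->  VIF = 2.00
# R^2 = 0.8  ->  VIF = 5.00
# R^2 = 0.9  ->  VIF = 10.00
```

A VIF of 10 therefore corresponds to an attribute whose variation is 90% explained by the other attributes.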

The VIF of each attribute is calculated by fitting a multiple regression model in which that attribute is the dependent variable and the other attributes are the independent variables.

For example, to find the VIF of x1 in the set of independent variables {x1, x2, x3}, a multiple linear regression model is built in which x1 acts as the dependent variable and x2 and x3 act as the independent variables. The VIF of each variable is then determined from the R-squared value of its regression.
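The x1, x2, x3 example can be sketched with plain NumPy as follows. The data is synthetic, and the function name `auxiliary_r_squared`, the seed, and the coefficients are assumptions for illustration. Each regression here includes an intercept, which is the conventional setup for VIF.

```python
import numpy as np

def auxiliary_r_squared(X: np.ndarray, j: int) -> float:
    """R^2 from regressing column j of X on the remaining columns plus an intercept."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    ss_res = np.sum((y - A @ coef) ** 2)      # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)      # total variation around the mean
    return 1 - ss_res / ss_tot

rng = np.random.default_rng(1)
x2 = rng.normal(size=500)
x3 = rng.normal(size=500)
x1 = 2 * x2 - x3 + rng.normal(scale=0.5, size=500)  # x1 depends on x2 and x3
X = np.column_stack([x1, x2, x3])

for j, name in enumerate(["x1", "x2", "x3"]):
    r2 = auxiliary_r_squared(X, j)
    print(f"{name}: R^2 = {r2:.3f}, VIF = {1 / (1 - r2):.2f}")
```

Because x1 is built from x2 and x3, its auxiliary regression has a high R-squared and hence a large VIF.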

The higher the VIF of an attribute, the stronger its multicollinearity with the other attributes. As a rule of thumb, you should remove attributes whose VIF is greater than 10 to avoid multicollinearity issues in the model.
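One possible way to apply this rule of thumb is to drop a single attribute at a time and recompute, since removing one attribute changes the VIF of the rest. The sketch below is an assumption-laden illustration, not a prescribed procedure: the function names, the synthetic data, and the seed are hypothetical. Instead of fitting separate regressions, it reads each VIF off the diagonal of the inverse correlation matrix, which gives the same value as a regression with an intercept.

```python
import numpy as np

def vif_values(X: np.ndarray) -> np.ndarray:
    """VIF of every column, read off the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

def drop_high_vif(X: np.ndarray, names, threshold: float = 10.0):
    """Repeatedly drop the column with the largest VIF until all are <= threshold."""
    names = list(names)
    while X.shape[1] > 1:
        vifs = vif_values(X)
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return X, names

rng = np.random.default_rng(2)
a = rng.normal(size=300)
b = rng.normal(size=300)
c = a + b + rng.normal(scale=0.05, size=300)  # nearly a linear combination of a and b
X = np.column_stack([a, b, c])

X_reduced, kept = drop_high_vif(X, ["a", "b", "c"])
print(kept)
```

After one attribute is removed, the remaining two no longer form a near-exact linear combination, so the loop stops.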

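import pandas as pd
import statsmodels.api as sm

def vif(input_data, dependent_col):
    """Return a DataFrame of VIF values for every predictor in input_data."""
    vif_df = pd.DataFrame(columns=['Var', 'Vif'])
    # Keep only the predictors; the target column plays no role in VIF.
    x_vars = input_data.drop([dependent_col], axis=1)
    xvar_names = x_vars.columns
    for i in range(xvar_names.shape[0]):
        # Regress predictor i on all the other predictors.
        y = x_vars[xvar_names[i]]
        x = x_vars[xvar_names.drop(xvar_names[i])]
        # Note: no intercept is added here, so this R-squared is uncentered;
        # wrap x in sm.add_constant(x) for the conventional definition.
        rsq = sm.OLS(y, x).fit().rsquared
        vif_value = round(1 / (1 - rsq), 2)
        vif_df.loc[i] = [xvar_names[i], vif_value]
    return vif_df.sort_values(by='Vif', ascending=False)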

# Calculating Vif value
vif(input_data=advertising_multi, dependent_col="price") 

Output:

	
         Var   Vif
1      Radio  3.29
2  Newspaper  3.06
0         TV  2.49

Follow this link to find detailed usage of VIF.

Conclusion:

VIF is very useful when you need to eliminate redundant variables from a long list of variables. Note that VIF is not the same as PCA, and PCA is not the most effective way to detect multicollinearity.
