# Variance Inflation Factor (VIF)


In this blog, we will discuss the Variance Inflation Factor (VIF), why VIF is needed, and how to implement it in Python.

Multicollinearity is a state in which multiple independent (predictor) attributes are correlated with each other. In other words, one attribute is directly or indirectly related to other attributes, so they provide similar predictive power to the model. The Variance Inflation Factor is a method for detecting multicollinearity among attributes.

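The correlation coefficient is one measure of multicollinearity, but it only captures the relationship between two variables at a time. A variable can be almost a linear combination of several others even when none of its pairwise correlations is very high. Therefore, the Variance Inflation Factor (VIF) is used to measure collinearity among multiple variables.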
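The short sketch below illustrates why pairwise correlations are not enough. It uses synthetic data (an assumption made purely for illustration) in which `x3` is nearly `x1 + x2`: no pair of columns is very strongly correlated, yet regressing `x3` on the other two gives a very large VIF. The helper `vif_of` is hypothetical and uses plain NumPy least squares with an intercept rather than any particular statistics library.

```python
import numpy as np

def vif_of(X, j):
    # Hypothetical helper: regress column j on all other columns (plus an
    # intercept) and return 1 / (1 - R^2) for that auxiliary regression
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1 / (1 - r2)

# Illustrative data: x3 is almost an exact linear combination of x1 and x2
rng = np.random.default_rng(42)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Largest absolute pairwise correlation (upper triangle, diagonal excluded)
corr = np.corrcoef(X, rowvar=False)
max_pairwise = np.max(np.abs(corr[np.triu_indices(3, k=1)]))

print(f"largest pairwise |r|: {max_pairwise:.2f}")
print(f"VIF of x3: {vif_of(X, 2):.0f}")
```

With this construction the theoretical correlation between `x1` and `x3` is only about 0.71, while `x1` and `x2` together explain over 99% of the variance of `x3`, which is exactly the situation a pairwise check misses.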

The VIF of each attribute is calculated by running a multiple regression in which that attribute is the dependent variable and the other attributes are the independent variables. If $R_i^2$ is the coefficient of determination of the regression for attribute $i$, its VIF is:

$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$$
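Since the VIF of attribute $i$ is $1/(1 - R_i^2)$, it is easy to see how quickly it grows as $R_i^2$ approaches 1. The tiny sketch below just evaluates that formula for a few $R^2$ values; the function name `vif_from_r2` is a hypothetical helper used only for illustration. Note that an $R^2$ of 0.9 corresponds to a VIF of 10, and the VIF is undefined when $R^2 = 1$ (perfect collinearity).

```python
def vif_from_r2(r2):
    # Hypothetical helper: VIF = 1 / (1 - R^2); undefined when R^2 == 1
    if not 0 <= r2 < 1:
        raise ValueError("R^2 must be in [0, 1)")
    return 1 / (1 - r2)

# Print the VIF for a few R^2 values of the auxiliary regression
for r2 in (0.0, 0.5, 0.8, 0.9, 0.99):
    print(f"R^2 = {r2:<4}  ->  VIF = {vif_from_r2(r2):.1f}")
```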

For example, to find the VIF of x1 in the set of independent variables {x1, x2, x3}, a multiple linear regression model is built in which x1 acts as the dependent variable and x2 and x3 act as the independent variables. The VIF of x1 is then determined from the R² value of that regression, and the same process is repeated for every variable.
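The x1, x2, x3 example can be worked through step by step. The sketch below assumes synthetic data (x1 is built from x2 and x3 plus noise) and uses NumPy least squares with an intercept. As a cross-check, it also uses the standard identity that, for regressions with an intercept, each variable's VIF equals the matching diagonal entry of the inverse of the variables' correlation matrix.

```python
import numpy as np

# Illustrative data: x1 depends partly on x2 and x3
rng = np.random.default_rng(0)
n = 500
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 0.8 * x2 + 0.5 * x3 + 0.3 * rng.normal(size=n)

# Step 1: regress x1 (treated as the response) on x2 and x3, with an intercept
A = np.column_stack([np.ones(n), x2, x3])
coef, *_ = np.linalg.lstsq(A, x1, rcond=None)
resid = x1 - A @ coef

# Step 2: R^2 of that auxiliary regression
r2 = 1 - (resid @ resid) / np.sum((x1 - x1.mean()) ** 2)

# Step 3: VIF of x1
vif_x1 = 1 / (1 - r2)

# Cross-check: diagonal of the inverse correlation matrix gives all VIFs at once
corr = np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False)
vif_diag = np.diag(np.linalg.inv(corr))

print(f"R^2 = {r2:.3f}, VIF(x1) = {vif_x1:.2f}, from inverse correlation = {vif_diag[0]:.2f}")
```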

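The higher the VIF, the stronger the multicollinearity between that variable and the others; a VIF of 1 means the other variables do not linearly explain it at all. As a common rule of thumb, variables with a VIF above 10 should be removed to avoid multicollinearity issues in the model.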
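One way to apply the VIF > 10 rule is sketched below. Because dropping one variable changes the VIFs of all the others, this sketch removes only the single worst variable at a time and then recomputes; that one-at-a-time strategy is an assumption here, not something prescribed above. The helper names `vifs` and `drop_high_vif` and the synthetic data are hypothetical, and the VIFs are computed via the inverse-correlation-matrix identity (valid for regressions with an intercept) instead of fitting one regression per variable.

```python
import numpy as np

def vifs(X):
    # Each predictor's VIF equals the matching diagonal entry of the
    # inverse of the predictors' correlation matrix
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

def drop_high_vif(X, names, threshold=10.0):
    # Hypothetical helper: repeatedly drop the single predictor with the
    # highest VIF until every remaining VIF is at or below the threshold
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        v = vifs(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names

# Illustrative data: x3 is nearly x1 + x2, x4 is unrelated to the others
rng = np.random.default_rng(0)
n = 1000
x1, x2, x4 = rng.normal(size=(3, n))
x3 = x1 + x2 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3, x4])

X_kept, kept = drop_high_vif(X, ["x1", "x2", "x3", "x4"])
print("kept:", kept)
print("VIFs:", np.round(vifs(X_kept), 2))
```

In this data x1, x2 and x3 all start with VIFs above 10, but removing only x3 (the largest) is enough to bring the rest close to 1, which is why recomputing after each removal tends to keep more variables than dropping everything above the threshold at once.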

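```python
import pandas as pd
import statsmodels.api as sm

def vif(input_data, dependent_col):
    # VIF of every predictor, i.e. every column except the target column
    vif_df = pd.DataFrame(columns=['Var', 'Vif'])
    x_vars = input_data.drop([dependent_col], axis=1)
    xvar_names = x_vars.columns
    for i in range(len(xvar_names)):  # .shape is a tuple, so iterate over len()
        # Regress predictor i on all other predictors; add_constant adds an
        # intercept so that .rsquared is the usual (centered) R^2
        y = x_vars[xvar_names[i]]
        x = sm.add_constant(x_vars[xvar_names.drop(xvar_names[i])])
        rsq = sm.OLS(y, x).fit().rsquared
        vif = round(1 / (1 - rsq), 2)
        vif_df.loc[i] = [xvar_names[i], vif]
    return vif_df.sort_values(by='Vif', axis=0, ascending=False, inplace=False)

# Calculating VIF values
```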

Output:

``````
Var 	   Vif