Best-fit line in Linear Regression

In this blog, let’s understand Simple Linear Regression and how to find the best-fit line for it.

Simple Linear Regression models the dependent (output) variable as a straight-line function of only one independent variable.

For example:

y=β0+β1x

In the above example, the dependent variable (y) depends on only one independent variable (x).

β0 and β1 are the coefficients: β0 is the intercept and β1 is the slope of the line.

Let’s take a simple example where the manager of an online shopping company wants to predict the number of sales in the current month based on the money spent on advertising. To make this prediction, the manager needs the previous months' data on the number of sales and the money spent.

ad_spending (USD)    Sales (in thousands)
1000                 139
1500                 155
1800                 160
2000                 210
800                  120
1200                 150
1400                 145
1700                 164
1100                 125

Upon plotting the data, we can observe the trend: sales on the online platform tend to increase with the amount spent on advertising.

Clearly, no single straight line can pass through every data point. Instead, we try to fit the line that is the best fit for these data points.

[Figure: best-fit line]

Now we can use this line to predict the sales volume for an advertising budget of 1600 USD. The sales number will be around 166.
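To make the prediction step concrete, here is a minimal sketch that fits a straight line to the table above and evaluates it at 1600 USD. It assumes NumPy is available and uses `np.polyfit` with degree 1 as the fitting routine; the names `ad_spending`, `sales`, and `predict_sales` are illustrative and not part of the original example.

```python
import numpy as np

# Data from the table above: ad spending (USD) and sales (in thousands).
ad_spending = np.array([1000, 1500, 1800, 2000, 800, 1200, 1400, 1700, 1100], dtype=float)
sales = np.array([139, 155, 160, 210, 120, 150, 145, 164, 125], dtype=float)

# Fit a degree-1 polynomial (a straight line) by least squares.
# np.polyfit returns coefficients from highest degree to lowest: [slope, intercept].
beta1, beta0 = np.polyfit(ad_spending, sales, deg=1)


def predict_sales(budget):
    """Predicted sales (in thousands) for an ad budget in USD (illustrative helper)."""
    return beta0 + beta1 * budget


print(f"intercept (beta0): {beta0:.3f}")
print(f"slope (beta1):     {beta1:.5f}")
print(f"predicted sales at 1600 USD: {predict_sales(1600):.1f}")
```

On these nine points the exact least-squares line gives roughly 164 at 1600 USD, which is in the same range as the value read off the plot.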

Best-fit line in Linear Regression:

There could be multiple straight lines that fit these data points. So the important question is: which one is the best-fit line, and how do we find it? Before finding the best-fit line, let’s understand a few other important concepts.

Residual:

The residual is the difference between the actual value and the predicted value. In other words, it is the error between the predicted value and the actual value. If you observe the diagram below, you will notice that most points do not lie exactly on the line. At each data point, the best-fit line gives the predicted value, and the vertical gap between the point and the line is the residual.

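So, we can write the residual for the i-th data point as below:

e_i = y_i - ŷ_i

Here y_i is the actual value and ŷ_i is the value predicted by the line. The residual should be calculated for each data point. Since ŷ_i = β0 + β1x_i, we can rewrite the error term as below:

e_i = y_i - (β0 + β1x_i)

[Figure: predicted vs. actual values]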
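The residual calculation can be sketched in a few lines of Python. The candidate coefficients below (β0 = 70, β1 = 0.06) are hypothetical values picked by eye for illustration, not the fitted line, and the sign convention (actual minus predicted) is the usual one but is an assumption here.

```python
# Data from the table above: ad spending (USD) and sales (in thousands).
ad_spending = [1000, 1500, 1800, 2000, 800, 1200, 1400, 1700, 1100]
sales = [139, 155, 160, 210, 120, 150, 145, 164, 125]

# Hypothetical candidate line, chosen by eye for illustration only.
beta0, beta1 = 70.0, 0.06


def residuals(beta0, beta1, xs, ys):
    """Residual for each point: actual value minus the value predicted by the line."""
    return [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]


res = residuals(beta0, beta1, ad_spending, sales)
for x, y, e in zip(ad_spending, sales, res):
    print(f"x={x:5d}  actual={y:4d}  predicted={y - e:7.2f}  residual={e:+7.2f}")
```

A positive residual means the point lies above the line (the line under-predicts), and a negative one means the point lies below it.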

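To find the best-fit line we have to calculate the residual sum of squares (RSS), which is the sum of the squared residuals over all n data points:

RSS = e_1² + e_2² + ... + e_n²

This can be rewritten as below:

RSS = Σ (y_i - β0 - β1x_i)², summed over i = 1, ..., n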
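As a rough illustration of how RSS lets us compare lines, the sketch below scores three candidate lines on the data from the table. The candidate coefficients and the names `rss`, `candidates`, and `scores` are made up for this example.

```python
# Data from the table above: ad spending (USD) and sales (in thousands).
xs = [1000, 1500, 1800, 2000, 800, 1200, 1400, 1700, 1100]
ys = [139, 155, 160, 210, 120, 150, 145, 164, 125]


def rss(beta0, beta1, xs, ys):
    """Residual sum of squares of the line y = beta0 + beta1 * x on the data."""
    return sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical candidate lines as (beta0, beta1) pairs, for illustration only.
candidates = {
    "line A": (70.0, 0.06),
    "line B": (100.0, 0.04),
    "line C": (50.0, 0.08),
}

scores = {name: rss(b0, b1, xs, ys) for name, (b0, b1) in candidates.items()}
for name, score in scores.items():
    print(f"{name}: RSS = {score:.1f}")

# Among these candidates, the one with the smallest RSS fits the data best.
best = min(scores, key=scores.get)
print("lowest RSS among the candidates:", best)
```

All three lines look plausible on a plot, but their RSS values differ, which gives us a single number to rank them by.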

To identify the best-fit line in linear regression, we have to find the line with the minimum Residual Sum of Squares (RSS). In other words, we minimize the RSS function to find the optimal β0 and β1.
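As a quick preview, for a single independent variable the RSS-minimizing coefficients have a well-known closed form: β1 is the sum of (x_i - x̄)(y_i - ȳ) divided by the sum of (x_i - x̄)², and β0 = ȳ - β1x̄. The sketch below applies these formulas to the table data and checks that nudging the line in any direction increases RSS. The function names `fit_line` and `rss` are illustrative.

```python
# Data from the table above: ad spending (USD) and sales (in thousands).
xs = [1000, 1500, 1800, 2000, 800, 1200, 1400, 1700, 1100]
ys = [139, 155, 160, 210, 120, 150, 145, 164, 125]


def rss(beta0, beta1, xs, ys):
    """Residual sum of squares of the line y = beta0 + beta1 * x on the data."""
    return sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))


def fit_line(xs, ys):
    """Closed-form least-squares estimates (beta0, beta1) for one predictor."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    beta1 = sxy / sxx
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1


beta0, beta1 = fit_line(xs, ys)
best_rss = rss(beta0, beta1, xs, ys)
print(f"beta0 = {beta0:.3f}, beta1 = {beta1:.5f}, RSS = {best_rss:.1f}")

# Nudging either coefficient away from the fitted value should raise RSS.
nudged = [
    rss(beta0 + 1.0, beta1, xs, ys),
    rss(beta0 - 1.0, beta1, xs, ys),
    rss(beta0, beta1 + 0.001, xs, ys),
    rss(beta0, beta1 - 0.001, xs, ys),
]
print("RSS after nudging:", [round(v, 1) for v in nudged])
```

The nudges are only a sanity check on this data set, not a proof that the formulas give the minimum.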

Before understanding how to find β0 and β1, let’s understand the concept of the Cost Function in the next chapter.
