visit
“Generalization is the ultimate goal of any machine learning algorithm”
Two major types of problems that machine learning algorithms try to solve are:
To generalize, you draw a straight line such that it crosses through the maximum points. Once you get that line, for house of any size you just project that data point over the line which gives you the house price. That is it. You are done. But wait!
The same old line equation is used: y = mx +c with a little statistical look and some below terminologies are added specific to Linear Regression modeling.
We will go over the elements of the equation one by one:
Seems like we can easily come to best parameter values by hit and trial method and thus finding the best fit line. But things are not that easy when the problem you are solving has more dimensions which is also known as the The Curse of Dimensionality.
Multiple Linear Regression The Multiple Linear Regression will have n number of features and we have to find a line which fits all the data points for all the dimensions. You have started to realize that it is no longer a hit-and-trial solution.
We can draw infinite number of lines but never be sure that the line is the best fit or not, so here enters our savior: The Cost Function
“Mathematics is the queen of Science”I have been running from mathematics my whole life, but you see, you just cannot escape from it. It’s really intuitive and gives you the ability to see what’s going on under the hood. The Cost Function is a mathematical construct which is calculated by adding up the squared error terms
Mean Squared Error By tweaking the above equation a little we get the below equation:
where J is cost notation and θ is the parameter
Our goal is to reduce the Cost function, which in turns improves the accuracy. We can take the parameter values by hit and trial method and calculate MSE for each combination of parameters, but that is not an efficient technique. Seems like there is a calculus solution to this problem.Gradient Descent You can visualize the function as a valley, and you are standing on some random point. Your aim is to reach the bottom most point of the valley and Gradient Descent help you in reaching to the bottom most point of the valley.
Here is how you will do it:
You will see and make out where the slope of the valley is going down, then you start taking steps towards the down going slope. Once you see that every contour around you is higher than where you are standing you claim that you have reached the bottom most point.Here is how Gradient Descent will do it:
It has to know where the slope of the valley is going (and it doesn’t have eyes as you do) so it takes the help of mathematics here. To know the slope of a function at any point you differentiate that point with respect to its parameters, thus Gradient Descent differentiates the above Cost function and comes to know the slope of that point. To go to the bottom most point it has to go in the opposite direction of the slope, i.e. where the slope is decreasing.It has to take small steps to move towards the bottom point and thus learning rate decides the length of step that gradient descent will take.
After every move it validates that the current position is global minima or not. This is validated by the slope of that point, if the slope is zero then the algorithm has reached the bottom most point. After every step, it updates the parameter (or weights) and by doing the above step repeatedly it reaches to the bottom most point.You can now use your Linear Regression model to predict the dependent variable of any unforeseen data point with very high accuracy.
If you are still here me with then you have got the concepts of Linear Regression and you are ready to apply it to solve real world problems which matter.