A linear model is a sum of weighted variables that predict a target output value given an input data instance.
For example: car prices.
A car has different features: year built, horsepower, trunk capacity, etc.
With linear regression the idea is to find the linear formula that takes those features into account and predicts the car price:
E.g.:
Y(price) = 10000 + (Current Year - Year Built)*108 + 23*Trunk Capacity - NrOfAccidents
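As a sketch, the formula above can be written as a plain Python function. The current year and all the weights below are the illustrative numbers from the formula, not learned values:

```python
CURRENT_YEAR = 2024  # hypothetical reference year for this example

def predict_price(year_built, trunk_capacity, nr_of_accidents):
    # Evaluates the illustrative linear formula above:
    # base price + weighted age + weighted trunk capacity - accidents
    return (10000
            + (CURRENT_YEAR - year_built) * 108
            + 23 * trunk_capacity
            - nr_of_accidents)

print(predict_price(year_built=2020, trunk_capacity=400, nr_of_accidents=2))
```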
I try to avoid advanced mathematics and stick to plain English, and I hope the explanation above is sufficient for most readers.
Show me the code
As before, let's use scikit-learn to generate a synthetic dataset.
# synthetic dataset for simple regression
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
X_R1, y_R1 = make_regression(n_samples=1000, n_features=1,
                             n_informative=1, bias=150.0,
                             noise=30, random_state=0)
plt.figure()
plt.title('Sample regression problem with one input variable')
plt.scatter(X_R1, y_R1, marker='o', s=50)
plt.show()
The code above will also plot the values generated.
Now, if we want to create the ML Model to predict future values, then we use the simple code below.
As always, we split the data into train and test sets, then use LinearRegression from sklearn.linear_model; calling its fit method gives us the trained model, from which we can extract some useful information further down.
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_R1, y_R1,
                                                    random_state=0)
linreg = LinearRegression().fit(X_train, y_train)
print('linear model coeff (w): {}'
.format(linreg.coef_))
print('linear model intercept (b): {:.3f}'
.format(linreg.intercept_))
print('R-squared score (training): {:.3f}'
.format(linreg.score(X_train, y_train)))
print('R-squared score (test): {:.3f}'
.format(linreg.score(X_test, y_test)))
One widely used method for estimating w and b in linear regression problems is called least-squares linear regression, also known as ordinary least squares.
Least-squares linear regression finds the line through this cloud of points that minimizes what is called the mean squared error of the model.
The mean squared error of the model is the average of the squared differences between the predicted target value and the actual target value over all the points in the training set.
Visually speaking, each difference is the distance between the predicted value and the true value. For each training point we square this difference, add them all up, and divide by the number of training points; that average is the mean squared error of the model. The sum of the squared differences over all the training points is the total squared error, and this is what the least-squares solution minimizes.
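A minimal sketch of this computation with toy numbers (the arrays below are made up for illustration); scikit-learn's mean_squared_error computes the same quantity:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0])   # actual target values (toy numbers)
y_pred = np.array([2.0, 7.0])   # predicted target values

# average of the squared differences over the training points
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # 2.5

# scikit-learn computes the same quantity
print(mean_squared_error(y_true, y_pred))  # 2.5
```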
The code above prints the following output, including the R-squared score we just discussed.
Results
linear model coeff (w): [45.71]
linear model intercept (b): 148.446
R-squared score (training): 0.679
R-squared score (test): 0.492
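The score method returns the R-squared value, which is 1 minus the ratio of the residual sum of squares to the total sum of squares; a minimal sketch with toy numbers (the arrays below are made up for illustration):

```python
import numpy as np

y_true = np.array([2.0, 4.0, 6.0, 8.0])  # actual target values (toy numbers)
y_pred = np.array([2.5, 3.5, 6.5, 7.5])  # predicted target values

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1 - ss_res / ss_tot
print(r2)  # 0.95
```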
And now let's plot the linear regression line:
plt.figure(figsize=(5,4))
plt.scatter(X_R1, y_R1, marker= 'o', s=50, alpha=0.8)
plt.plot(X_R1, linreg.coef_ * X_R1 + linreg.intercept_, 'r-')
plt.title('Least-squares linear regression')
plt.xlabel('Feature value (x)')
plt.ylabel('Target value (y)')
plt.show()
And the result is below
The result is a straight line which corresponds to a linear formula, where we can enter future input values, and the line will give us the predicted value.
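With the coefficient and intercept printed in the results above, predicting for a new input is just evaluating that line (x_new below is a hypothetical input value):

```python
w = 45.71     # linreg.coef_[0] from the results above
b = 148.446   # linreg.intercept_ from the results above

x_new = 2.0   # hypothetical future input value
y_pred = w * x_new + b
print(y_pred)
```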
Side notes:
Any questions? Just write them in the comments section below.