Linear Regression
Introduction to linear regression and the normal equation
It's best to know calculus and linear algebra.
This very well may be riddled with tiny mistakes.
Introduction
Linear regression is a simple machine learning algorithm where the machine learns to find a function that can map an input to an output. This is especially useful for predicting outcomes.
Thus, the goal of linear regression is to find a function (we call it a hypothesis, $h$) such that, given inputs (or features) $x$ and outputs $y$, $h(x) \approx y$.

To find an equation for $h$, recall that the standard equation of a line is defined as

$$y = mx + b$$

where $m$ is the slope of the line and $b$ is the intercept. However, datasets in ML tend to not have a singular $x$, but rather many features mapping to the output(s):

$$(x_1, x_2, \ldots, x_n) \mapsto y$$

where there are $n$ features. Additionally, rather than having a single slope $m$, we have a parameter $\theta_j$ for each feature $x_j$ (and $\theta_0$ for the intercept).

Thus, in linear regression, we get the following equation for our hypothesis:

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$

The learning algorithm is trying to minimize the loss, which we define as

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

where $m$ is the number of training examples, $x^{(i)}$ is the $i$-th input, and $y^{(i)}$ is the $i$-th output.
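To make these definitions concrete, here is a minimal NumPy sketch (the numbers and parameter values below are made up purely for illustration) that evaluates the hypothesis and the loss on a tiny dataset:

import numpy as np

# toy dataset: 3 examples with 2 features each, and their outputs (illustrative values)
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5]])
y = np.array([5.0, 4.0, 7.5])

# illustrative parameters: theta_0 (intercept), theta_1, theta_2
theta = np.array([1.0, 1.5, 0.5])

# prepend a column of 1s so the dot product includes the intercept term
X_b = np.c_[np.ones((X.shape[0], 1)), X]

# h_theta(x) = theta_0 + theta_1*x_1 + theta_2*x_2, computed for every example at once
predictions = X_b.dot(theta)

# J(theta) = 1/2 * sum of squared errors
loss = 0.5 * np.sum((predictions - y) ** 2)
print(predictions, loss)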
To find the $\theta$ that minimizes this loss, we can use gradient descent (I'll cover this in a separate entry) or the normal equation, which I will cover here.
Normal Equation
We define the design matrix $X$ as the matrix containing all of the training inputs, one example per row (each row starting with a $1$ for the intercept term), and $\vec{y}$ as the vector of all training outputs:

$$X = \begin{bmatrix} 1 & x_1^{(1)} & \cdots & x_n^{(1)} \\ 1 & x_1^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(m)} & \cdots & x_n^{(m)} \end{bmatrix}, \qquad \vec{y} = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$

Notice how the vector of predictions, $X\theta$, is the design matrix, $X$, multiplied by the parameters, $\theta$: its $i$-th entry is exactly $h_\theta(x^{(i)})$.

Because $z^T z = \sum_i z_i^2$ for any vector $z$ (definition of dot product), we obtain

$$J(\theta) = \frac{1}{2} (X\theta - \vec{y})^T (X\theta - \vec{y})$$

In order to minimize $J(\theta)$, we must have $\nabla_\theta J(\theta) = 0$. This is the gradient of $J$ with respect to $\theta$; at the minimum the function is no longer decreasing in any direction, so this gradient must be zero.

Given the definition of $J(\theta)$, we compute the gradient as follows:

$$\nabla_\theta J(\theta) = \nabla_\theta \, \frac{1}{2} (X\theta - \vec{y})^T (X\theta - \vec{y}) = X^T X \theta - X^T \vec{y}$$

Setting this to zero gives $X^T X \theta = X^T \vec{y}$, and thus we achieve the normal equation:

$$\theta = (X^T X)^{-1} X^T \vec{y}$$
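As a quick sanity check (this is my addition, not part of the derivation), we can verify on a small synthetic problem that the closed-form solution matches NumPy's built-in least-squares solver:

import numpy as np

rng = np.random.default_rng(0)

# small synthetic problem (made-up coefficients): y = 2 + 3*x1 - 1*x2 plus a little noise
m = 50
X = np.c_[np.ones((m, 1)), rng.normal(size=(m, 2))]
true_theta = np.array([2.0, 3.0, -1.0])
y = X.dot(true_theta) + 0.01 * rng.normal(size=m)

# normal equation: solve (X^T X) theta = X^T y
theta_closed_form = np.linalg.solve(X.T.dot(X), X.T.dot(y))

# reference answer from NumPy's least-squares routine
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(theta_closed_form, theta_lstsq))  # should print True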
Implementation
Since the normal equation is so simple, the implementation is too.
One thing to note is that we concatenate a column of $1$s to the training data. This column accounts for the intercept term, $\theta_0$.
Remember that

$$h_\theta(x) = \theta_0 \cdot 1 + \theta_1 x_1 + \cdots + \theta_n x_n$$

The column of $1$s is the coefficient on the $\theta_0$ parameter.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
import time

def normal_equation(X, y):
    # theta = (X^T X)^+ X^T y; pinv (the pseudoinverse) still works if X^T X is singular
    X_t = np.transpose(X)
    return np.dot(np.linalg.pinv(np.dot(X_t, X)), np.dot(X_t, y))

# generate a synthetic regression dataset and hold out 20% of it for testing
X, y = make_regression(n_samples=10000, n_features=100, noise=1.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)
np.save("X_test.npy", X_test)
np.save("y_test.npy", y_test)

# prepend the column of 1s for the intercept term
X_train_np = np.c_[np.ones((X_train.shape[0], 1)), X_train]

start = time.time()
theta = normal_equation(X_train_np, y_train)
np.save("theta_normal.npy", theta)
print(f"Normal equation complete in {time.time() - start}s")
We store the testing data to use in the testing code.
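One design note on the implementation above: np.linalg.pinv computes the pseudoinverse, so the code still returns a least-squares solution even if $X^T X$ happens to be singular (a plain matrix inverse would fail in that case). And if you want to sanity-check the learned parameters (an optional check I'm adding, appended to the end of the same training script so X_train, y_train, and theta are in scope), scikit-learn's LinearRegression fits the same least-squares model, so its intercept and coefficients should match $\theta_0$ and $\theta_1, \ldots, \theta_n$:

from sklearn.linear_model import LinearRegression

# fit the same model with scikit-learn for comparison
lr = LinearRegression().fit(X_train, y_train)

# theta[0] is our intercept; theta[1:] are the feature coefficients
print(np.allclose(theta[0], lr.intercept_, atol=1e-6))  # should print True (up to numerical tolerance)
print(np.allclose(theta[1:], lr.coef_, atol=1e-6))      # should print True (up to numerical tolerance)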
Testing
We test our model with the previously stored testing data.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# load the held-out test data saved by the training script
X_test = np.load("X_test.npy")
y_test = np.load("y_test.npy")

# prepend the same column of 1s used during training
X_test_np = np.c_[np.ones((X_test.shape[0], 1)), X_test]

# load the learned parameters
theta = np.load("theta_normal.npy")

# prediction
pred = np.dot(X_test_np, theta)

# check accuracy
mae = mean_absolute_error(y_test, pred)
mse = mean_squared_error(y_test, pred)
r2 = r2_score(y_test, pred)
print("MAE: ", mae)
print("MSE: ", mse)
print("R^2: ", r2)
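For reference (these definitions aren't spelled out in the original code, but they are the standard ones the scikit-learn functions implement), the three metrics over $m$ test examples with predictions $\hat{y}^{(i)}$ are

$$\text{MAE} = \frac{1}{m} \sum_{i=1}^{m} \left| y^{(i)} - \hat{y}^{(i)} \right|, \qquad \text{MSE} = \frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - \hat{y}^{(i)} \right)^2, \qquad R^2 = 1 - \frac{\sum_{i=1}^{m} \left( y^{(i)} - \hat{y}^{(i)} \right)^2}{\sum_{i=1}^{m} \left( y^{(i)} - \bar{y} \right)^2}$$

where $\bar{y}$ is the mean of the test outputs. An $R^2$ close to $1$ means the predictions explain nearly all of the variance in the outputs.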
Conclusion
Linear regression is a rather straightforward machine learning algorithm and can be easily implemented with the normal equation.
The normal equation is able to take an input $X$ and output $\vec{y}$ and find the parameters $\theta$ with some simple linear algebra, demonstrating linear algebra's vast use in the field of machine learning.