This post contains my notes on error metrics.
In linear regression:
y' = b + w1x1

where:
y' is the predicted label (the desired output)
b is the bias (the y-intercept)
w1 is the weight of feature x1
x1 is a feature (a known input)
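As a tiny illustration (the numbers below are made up, just to show the formula in action), a single prediction is the bias plus the weighted feature:

# Hypothetical values, only to illustrate y' = b + w1*x1.
b = 2.0    # bias (y-intercept)
w1 = 0.5   # weight of feature x1
x1 = 8.0   # feature x1 (a known input)
y_pred = b + w1 * x1
print(y_pred)  # 2.0 + 0.5 * 8.0 = 6.0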
1. Normal distribution of residuals
The residuals of the model should be normally distributed.
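One way to check this (a sketch I'm adding here, with an assumed residuals array standing in for your model's actual residuals) is a Q-Q plot via statsmodels plus a Shapiro-Wilk normality test from scipy:

import numpy as np
import statsmodels.api as sm
from scipy import stats

# residuals: assumed here to be an array of (observed - predicted) values.
residuals = np.random.normal(size=200)

# Q-Q plot: points close to the 45-degree line suggest normality.
sm.qqplot(residuals, line="45")

# Shapiro-Wilk test: a large p-value gives no evidence against normality.
stat, p_value = stats.shapiro(residuals)
print("Shapiro-Wilk p-value:", p_value)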
2. Linearity of residuals
The regression model is linear in its parameters, and the mean of the residuals is zero.

Independence of residuals. There are basically two classes of dependencies:
- Residuals correlate with another variable. Multicollinearity is a fancy way of saying that your independent variables are highly correlated with each other.
- Residuals correlate with other (close) residuals (autocorrelation). There should be no autocorrelation of residuals. This is especially applicable to time series data. Autocorrelation is the correlation of a time series with lags of itself.

A quick way to check for both kinds of dependence is sketched below.
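This is a rough sketch (the residuals and X below are placeholder data, not from any model in this post): the Durbin-Watson statistic can flag autocorrelation, and the variance inflation factor (VIF) can flag multicollinearity.

import numpy as np
import pandas as pd
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Placeholder residuals and feature matrix; in practice use your model's output.
residuals = np.random.normal(size=100)
X = pd.DataFrame(np.random.normal(size=(100, 3)), columns=["x1", "x2", "x3"])

# Durbin-Watson: values near 2 suggest no autocorrelation.
print("Durbin-Watson:", durbin_watson(residuals))

# VIF: values well above 5-10 suggest multicollinearity.
for i, col in enumerate(X.columns):
    print(col, variance_inflation_factor(X.values, i))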
3. Equal variance of residuals
Homoscedasticity is present when the noise of your model can be described as random and the same throughout all independent variables. Again, the mean of the residuals is zero.

Steps of MAE:
1. Calculate the difference between each prediction and the corresponding true value.
2. Take the absolute value of each difference.
3. Sum the absolute differences and divide by the number of samples.

Steps of MSE:
1. Calculate the difference between each prediction and the corresponding true value.
2. Square each difference.
3. Sum the squared differences and divide by the number of samples.

MAE:
MAE = (1/n) * Σ |y_i - y'_i|

MSE:
MSE = (1/n) * Σ (y_i - y'_i)^2

where y_i is the true value, y'_i is the predicted value, and n is the number of samples.
# Code Comparison
import numpy as np

# true: array of true target values
# pred: array of predictions
def calculateMAE(true, pred):
    # Mean of the absolute differences.
    return np.mean(np.abs(true - pred))

def calculateMSE(true, pred):
    # Mean of the squared differences.
    return np.mean((true - pred)**2)
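As a quick sanity check (the small true/pred arrays below are made up), these helpers should match scikit-learn's built-in metrics:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

true = np.array([3.0, 5.0, 2.5, 7.0])
pred = np.array([2.5, 5.0, 4.0, 8.0])

print(calculateMAE(true, pred), mean_absolute_error(true, pred))  # 0.75 0.75
print(calculateMSE(true, pred), mean_squared_error(true, pred))   # 0.875 0.875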
MAE and MSE with Different Models
We can look at these examples to compare models. However, keep in mind that comparing metric values across different models only makes sense when they are evaluated on the same data:
Regularization is a technique used to reduce the complexity of a model. It does this by adding a penalty term to the loss function.
L1 or Manhattan Norm:
A type of regularization that penalizes weights in proportion to the sum of the absolute values of the weights. In models relying on sparse features, L1 regularization helps drive the weights of irrelevant or barely relevant features to exactly 0, which removes those features from the model. L1 loss is less sensitive to outliers than L2 loss.
L2 or Euclidean Norm:
A type of regularization that penalizes weights in proportion to the sum of the squares of the weights. L2 regularization helps drive outlier weights (those with high positive or low negative values) closer to 0 but not quite to 0. L2 regularization always improves generalization in linear models.
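As a minimal sketch (not part of my notes above; the toy data and the alpha value are arbitrary), scikit-learn's Lasso and Ridge estimators apply L1 and L2 penalties to a linear model:

import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Toy data: y depends only on the first feature; the second is irrelevant noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: can push weights to exactly 0
ridge = Ridge(alpha=0.1).fit(X, y)   # L2 penalty: shrinks weights toward 0

print("Lasso coefficients:", lasso.coef_)
print("Ridge coefficients:", ridge.coef_)

With this setup, the Lasso tends to zero out the irrelevant second weight, while the Ridge only shrinks it.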
# Imports.
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
import statsmodels.api as sm
# First, the actual values.
actual = np.random.randint(low=50, high=101, size=(50))
# Second, my random prediction data.
pred = np.random.randint(low=50, high=101, size=(50))
print("Actual data (Random):", actual)
print("Pred data (Random):", pred)
Out[]:
Actual data (Random): [ 53 95 63 78 88 59 96 86 52 71 78 89 77 60 97 79 71 87
55 92 69 76 80 66 80 88 89 68 69 98 100 57 83 72 82 72
52 78 94 76 69 59 73 70 99 97 100 63 73 94]
Pred data (Random): [ 66 69 65 75 99 100 88 92 83 77 80 58 85 91 78 80 63 100
55 84 64 85 67 87 79 83 59 81 76 85 96 86 87 99 91 84
81 50 96 98 76 99 55 63 67 74 51 100 55 75]
import matplotlib.pyplot as plt
# create scatter plot
plt.plot(actual, pred, 'o')
# m = slope, b=intercept
m, b = np.polyfit(actual, pred, 1)
plt.plot(actual, m*actual + b)
Out[]: scatter plot of actual vs. predicted values with the fitted line
mae = mean_absolute_error(actual, pred)
print("MAE without outliers:", mae)
mse = mean_squared_error(actual, pred)
print("MSE without outliers:", mse)
Out[]:
MAE without outliers: 16.02
MSE without outliers: 408.1
# Add outliers: increase four of the predictions by 50.
pred[[4,8,15,45]] = pred[[4,8,15,45]] + 50
# create scatter plot
plt.plot(actual, pred, 'o')
# m = slope, b=intercept
m, b = np.polyfit(actual, pred, 1)
plt.plot(actual, m*actual + b)
Out[]: scatter plot of actual vs. predicted values with the fitted line, after adding the outliers
mae = mean_absolute_error(actual, pred)
print("MAE with outliers:", mae)
mse = mean_squared_error(actual, pred)
print("MSE with outliers:", mse)
Out[]:
MAE with outliers: 19.1
MSE with outliers: 648.1
RMSE (Root Mean Squared Error): the square root of MSE. It represents the sample standard deviation of the differences between predicted values and observed values (the residuals).
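As a small sketch (reusing the actual and pred arrays from above), RMSE is just the square root of MSE:

import numpy as np
from sklearn.metrics import mean_squared_error

# Square root of the MSE computed above.
rmse = np.sqrt(mean_squared_error(actual, pred))
print("RMSE with outliers:", rmse)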
Contact me if you have questions or feedback: [email protected] 👩💻