My Notes on MAE vs MSE Error Metrics 🚀

by Sengul Karaderili, March 11th, 2022

Too Long; Didn't Read

We will focus on MSE and MAE, two metrics that are frequently used to evaluate regression models. MAE is the average absolute distance between the real data and the predicted data; it weights every error equally, so it does not single out large errors in prediction. MSE measures the average squared difference between the estimated values and the actual values. L1 and L2 regularization are techniques used to reduce the complexity of a model. They do this by adding a penalty on the model weights to the loss function.

This post contains my notes on error metrics.

Contents:

Linear Regression Summary
MAE
MSE
Compare MAE vs MSE
Bonus: L1 and L2 Regularization
Experiment Lab
Bonus! If we want to compare MAE and RMSE
Sources

Linear Regression Summary

In linear regression:

Assumptions of Linear Regression 💫

1. Normal distribution of residuals

The residuals should be normally distributed.

2. Linearity of residuals

There are basically 2 classes of dependencies

3. Equal variance of residuals

Mean Absolute Error (MAE)

Steps of MAE:

Find all of your absolute errors, |xi – x| (the absolute difference between each predicted value and its true value).
Add them all up.
Divide by the number of errors. For example, if you had 10 measurements, divide by 10.
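
The three steps above can be sketched in NumPy as follows. This is a minimal illustration, not a library implementation: the helper name mae_by_steps and the toy numbers are made up for this example.

```python
import numpy as np

def mae_by_steps(actual, predicted):
    """Hypothetical helper: MAE computed one step at a time."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    abs_errors = np.abs(predicted - actual)  # step 1: absolute errors
    total = abs_errors.sum()                 # step 2: add them all up
    return total / abs_errors.size           # step 3: divide by the number of errors

# Toy data, made up for illustration: the absolute errors are 1, 2 and 0.
print(mae_by_steps([3, 5, 8], [2, 7, 8]))  # 1.0
```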

Mean Square Error (MSE)

Steps of MSE:

Calculate the residuals for every data point.
Calculate the squared value of the residuals.
Calculate the average of the squared residuals from step 2.
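
The same steps can be written out in NumPy. Again, this is only a sketch: mse_by_steps is a hypothetical helper name and the data is made up.

```python
import numpy as np

def mse_by_steps(actual, predicted):
    """Hypothetical helper: MSE computed one step at a time."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = actual - predicted  # step 1: residual for every data point
    squared = residuals ** 2        # step 2: square each residual
    return squared.mean()           # step 3: average the squared residuals

# Toy data, made up for illustration: residuals are 1, -2 and 0,
# so the squared residuals are 1, 4 and 0 and their average is 5/3.
print(mse_by_steps([3, 5, 8], [2, 7, 8]))
```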

Compare Them

MAE:

The idea behind the absolute error is to avoid mutual cancellation of the positive and negative errors.
An absolute error has only non-negative values.
By the same token, avoiding the potential of mutual cancellation has its price: skewness (bias) cannot be determined.
Absolute error preserves the same units of measurement as the data under analysis and gives all individual errors the same weights (as compared to squared error).
This distance is easily interpretable and when aggregated over a dataset using arithmetic mean has a meaning of the average error.
The absolute value is not differentiable at zero, which might present difficulties in the gradient calculation of model parameters. This distance is used in such popular metrics as MAE, MdAE, etc.
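
To make the cancellation and bias points concrete, here is a small sketch with made-up numbers. The helpers mean_signed_error and mae are hypothetical names used only in this example. Both sets of predictions get the same MAE, but only the signed mean shows that one of them is consistently too high.

```python
import numpy as np

def mean_signed_error(actual, predicted):
    # Positive and negative errors are allowed to cancel each other.
    return np.mean(predicted - actual)

def mae(actual, predicted):
    # Absolute values prevent cancellation but hide the direction of the error.
    return np.mean(np.abs(predicted - actual))

actual = np.array([10.0, 10.0, 10.0, 10.0])
always_high = np.array([12.0, 12.0, 12.0, 12.0])   # every prediction is 2 too high
high_and_low = np.array([12.0, 8.0, 12.0, 8.0])    # errors of +2 and -2 alternate

print(mean_signed_error(actual, always_high), mae(actual, always_high))    # 2.0 2.0
print(mean_signed_error(actual, high_and_low), mae(actual, high_and_low))  # 0.0 2.0
```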

MSE:

The squared error follows the same idea as the absolute error – avoid negative error values and mutual cancellation of errors.
Due to the square, large errors are emphasized and have a relatively greater effect on the value of the performance metric. At the same time, the effect of relatively small errors will be even smaller. Sometimes this property of the squared error is referred to as penalizing extreme errors or being susceptible to outliers. Based on the application, this property may be considered positive or negative. For example, emphasizing large errors may be a desirable discriminating measure in evaluating models.
In case of data outliers, MSE will become much larger compared to MAE.
In MSE, error increases in a quadratic fashion while the error increases in a proportional fashion in MAE.
In MSE, since the error is squared, large prediction errors are heavily penalized.
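
A quick way to see the quadratic versus proportional growth is to double an error repeatedly and look at what each metric adds for it. The numbers below are made up for illustration.

```python
import numpy as np

# Each error is double the previous one.
errors = np.array([1.0, 2.0, 4.0, 8.0])

abs_contrib = np.abs(errors)  # what each error contributes to MAE (before averaging)
sq_contrib = errors ** 2      # what each error contributes to MSE (before averaging)

# Ratio between consecutive contributions.
print(abs_contrib[1:] / abs_contrib[:-1])  # [2. 2. 2.]
print(sq_contrib[1:] / sq_contrib[:-1])    # [4. 4. 4.]

# An error smaller than 1 shrinks when squared.
print(0.5 ** 2)  # 0.25
```

Doubling an error doubles its absolute contribution but quadruples its squared contribution, which is why a few large errors can dominate MSE.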


# Code Comparison

import numpy as np

# true: array of true target values
# pred: array of predictions

def calculateMAE(true, pred):
  # Mean (not sum) of the absolute errors.
  return np.mean(np.abs(true - pred))

def calculateMSE(true, pred):
  # Mean (not sum) of the squared errors.
  return np.mean((true - pred)**2)

MAE and MSE with Different Models

We can look at these examples to compare models. However, it may not make sense to compare metrics in different models at this time:

Bonus: L1 and L2 Regularization

Regularization is a technique used to reduce the complexity of the model. It does this by adding a penalty term to the loss function.

L1 or Manhattan Norm:

A type of regularization that penalizes weights in proportion to the sum of the absolute values of the weights. In models relying on sparse features, L1 regularization helps drive the weights of irrelevant or barely relevant features to exactly 0, which removes those features from the model. L1 loss is less sensitive to outliers than L2 loss.

L2 or Euclidean Norm:

A type of regularization that penalizes weights in proportion to the sum of the squares of the weights. L2 regularization helps drive outlier weights (those with high positive or low negative values) closer to 0 but not quite to 0. L2 regularization often improves generalization in linear models.

L2 and L1 penalize weights differently:
L2 penalizes weight². L1 penalizes |weight|.
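
As a rough sketch of the two penalty terms, the snippet below computes them for a made-up weight vector. The function names l1_penalty and l2_penalty, the weights, and the strength lam are all assumptions for illustration; real libraries differ in how they scale the penalty.

```python
import numpy as np

def l1_penalty(weights, lam):
    # Sum of absolute values of the weights, scaled by the strength lam.
    return lam * np.sum(np.abs(weights))

def l2_penalty(weights, lam):
    # Sum of squared weights, scaled by the strength lam.
    return lam * np.sum(np.square(weights))

weights = np.array([0.5, -2.0, 0.0, 3.0])  # made-up weights
lam = 0.1                                   # hypothetical regularization strength

print(l1_penalty(weights, lam))  # 0.1 * (0.5 + 2.0 + 0.0 + 3.0)
print(l2_penalty(weights, lam))  # 0.1 * (0.25 + 4.0 + 0.0 + 9.0)
```

In this example the largest weight (3.0) accounts for 9 of the 13.25 in the L2 sum but only 3 of the 5.5 in the L1 sum, which is the sense in which L2 pushes hardest on large weights.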

Experiment Lab ⚗️🧪🌡📊📉📈🔍

Let's see how metrics work on outlier and non-outlier data.

# Imports.
import numpy as np

from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error

# First, the actual values (random integers from 50 to 100).
actual  = np.random.randint(low=50, high=101, size=(50))
# Second, my random prediction data.
pred = np.random.randint(low=50, high=101, size=(50))
print("Actual data (Random):", actual)
print("Pred data (Random):", pred)

Out[]:
Actual data (Random): [ 53  95  63  78  88  59  96  86  52  71  78  89  77  60  97  79  71  87
  55  92  69  76  80  66  80  88  89  68  69  98 100  57  83  72  82  72
  52  78  94  76  69  59  73  70  99  97 100  63  73  94]
Pred data (Random): [ 66  69  65  75  99 100  88  92  83  77  80  58  85  91  78  80  63 100
  55  84  64  85  67  87  79  83  59  81  76  85  96  86  87  99  91  84
  81  50  96  98  76  99  55  63  67  74  51 100  55  75]

import matplotlib.pyplot as plt

# create scatter plot
plt.plot(actual, pred, 'o')
# m = slope, b=intercept

m, b = np.polyfit(actual, pred, 1)

plt.plot(actual, m*actual + b)

Out[]:

mae = mean_absolute_error(actual, pred)
print("MAE without outliers:", mae)
mse = mean_squared_error(actual, pred)
print("MSE without outliers:", mse)

Out[]:
MAE without outliers: 16.02
MSE without outliers: 408.1

# Add 50 to four of the predictions to create outliers.
pred[[4,8,15,45]] = pred[[4,8,15,45]] + 50

# create scatter plot
plt.plot(actual, pred, 'o')
# m = slope, b=intercept

m, b = np.polyfit(actual, pred, 1)

plt.plot(actual, m*actual + b)

Out[]:

mae = mean_absolute_error(actual, pred)
print("MAE with outliers:", mae)
mse = mean_squared_error(actual, pred)
print("MSE with outliers:", mse)

Out[]:
MAE with outliers: 19.1
MSE with outliers: 648.1
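
To put the two outputs side by side, the sketch below takes the four values printed above and computes the relative increase of each metric. The helper relative_increase is a hypothetical name; the numbers are copied from the outputs in this post.

```python
def relative_increase(before, after):
    # Fractional change from before to after, e.g. 0.25 means +25%.
    return (after - before) / before

# Values printed in the experiment above.
mae_before, mae_after = 16.02, 19.1
mse_before, mse_after = 408.1, 648.1

mae_change = relative_increase(mae_before, mae_after)
mse_change = relative_increase(mse_before, mse_after)

print(f"MAE increased by {mae_change:.1%}")
print(f"MSE increased by {mse_change:.1%}")
```

With only four of the fifty predictions changed, MAE grows by roughly 19% while MSE grows by roughly 59%.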

Bonus! If we want to compare MAE and RMSE

RMSE: The square root of MSE. It represents the sample standard deviation of the differences between predicted values and observed values (called residuals).

Case 1: Actual Values = [2,4,6,8] , Predicted Values = [4,6,8,10]
MAE for case 1 = 2.0, RMSE for case 1 = 2.0
Case 2: Actual Values = [2,4,6,8] , Predicted Values = [4,6,8,12]
MAE for case 2 = 2.5, RMSE for case 2 = 2.65
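
The two cases can be checked with a few lines of NumPy. The helper names mae and rmse are just local definitions for this sketch.

```python
import numpy as np

def mae(actual, predicted):
    return np.mean(np.abs(predicted - actual))

def rmse(actual, predicted):
    # Square root of the mean squared error.
    return np.sqrt(np.mean((predicted - actual) ** 2))

actual = np.array([2, 4, 6, 8])
case1 = np.array([4, 6, 8, 10])  # every prediction is off by 2
case2 = np.array([4, 6, 8, 12])  # the last prediction is off by 4

print(mae(actual, case1), rmse(actual, case1))            # 2.0 2.0
print(mae(actual, case2), round(rmse(actual, case2), 2))  # 2.5 2.65
```

When all errors are equal, MAE and RMSE agree. As soon as one error is larger than the rest, RMSE rises above MAE.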

Sources

statisticshowto:
statisticshowto:
MLCC:
Choosing the right metric:
Stack exchange:
Makine Öğrenmesi Algoritmaları ile Hava Kirliliği Tahmini Üzerine Karşılaştırmalı Bir Değerlendirme:
Multiple LR:
MLCC: