Leave-one-out Cross-validation (LOOCV) is one of the most accurate ways to estimate how well a model will perform on out-of-sample data. Unfortunately, it can be expensive, requiring a separate model to be fit for each point in the training data set. For the specialized cases of ridge regression, logistic regression, Poisson regression, and other generalized linear models, though, Approximate Leave-one-out Cross-validation (ALOOCV) gives us a much more efficient estimate of out-of-sample error that’s nearly as good as LOOCV.
Suppose we're fitting a model to a data set of $n$ feature vectors

$$x_1, x_2, \ldots, x_n$$

and $n$ associated target values

$$y_1, y_2, \ldots, y_n.$$
Let X denote the matrix of feature vectors and y denote the vector of target values. With leave-one-out cross-validation, we fit n different models.
For each data entry $i$ we form a new data set

$$\mathcal{D}_{-i} = \{(x_j, y_j) : j \neq i\},$$

consisting of the original data set with the $i$th entry removed. Then we fit our model to $\mathcal{D}_{-i}$ and measure how well it predicts the $i$th target value. This gives us a leave-one-out error for the $i$th entry:

$$\mathrm{err}_i = \ell\bigl(y_i, \hat{y}_{-i}\bigr),$$

where $\hat{y}_{-i}$ is the prediction for $x_i$ from the model fit without the $i$th entry and $\ell$ is our loss (for logistic regression, the negative log-likelihood of the held-out point).
Averaging these errors across all data points then gives us an estimate of the out-of-sample error.
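In symbols, the estimate is just the mean of the per-point errors:

$$\mathrm{LOO} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{err}_i.$$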
Research has repeatedly shown LOOCV to be more accurate than other forms of k-fold cross-validation for estimating out-of-sample error [1]. But LOOCV is expensive: it requires us to fit many more models than 3- or 10-fold cross-validation.
Whenever we fit a logistic regression model, we have a regularization parameter C that we need to tune.
C acts as a dial that controls the complexity of the model: If we set C too low, our model won’t take full advantage of the training data; but if we set C too high, it will overfit the training data and perform poorly on out-of-sample data.
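For reference, and assuming bbai follows scikit-learn's convention where C is the inverse of the regularization strength, the fitted weights minimize a penalized negative log-likelihood of roughly this form:

$$\hat{\beta} = \operatorname*{arg\,min}_{\beta} \; -\sum_{i=1}^{n} \log p\bigl(y_i \mid x_i; \beta\bigr) + \frac{1}{2C} \lVert \beta \rVert^2.$$

A smaller C means a heavier penalty (a simpler model); a larger C means a lighter penalty (a more flexible model).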
Using ALOOCV, we can estimate how well logistic regression will perform for any given value of C.
Let's plot ALOOCV across a range of C values. The Iris data set is small enough that we can also compute LOOCV by brute force, so we'll plot that as well to see how accurate ALOOCV is.
pip install bbai
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)
X = np.hstack((X, np.ones((X.shape[0], 1)))) # add a bias column
Here's how we can compute ALOOCV for a given value of C:
import bbai.glm
def compute_aloocv(C):
    # Fit on the full data set; bbai computes ALOOCV as part of the fit.
    model = bbai.glm.LogisticRegression(C=C, fit_intercept=False)
    model.fit(X, y)
    return model.aloocv_
And here's how to compute LOOCV by brute force:

from sklearn.model_selection import LeaveOneOut

def compute_loocv(C):
    ll_sum = 0
    for train_indexes, test_indexes in LeaveOneOut().split(X):
        X_train = X[train_indexes]
        y_train = y[train_indexes]
        X_test = X[test_indexes]
        y_test = y[test_indexes]
        model = bbai.glm.LogisticRegression(C=C, fit_intercept=False)
        model.fit(X_train, y_train)
        pred = model.predict_log_proba(X_test)
        # Accumulate the log-likelihood of the held-out point.
        ll_sum += pred[0][y_test[0]]
    return -ll_sum / len(y)
import matplotlib.pyplot as plt
Cs = np.logspace(0.5, 2.5, num=100)
aloocvs = [compute_aloocv(C) for C in Cs]
loocvs = [compute_loocv(C) for C in Cs]
plt.plot(Cs, aloocvs, label='ALOOCV')
plt.plot(Cs, loocvs, label='LOOCV')
plt.xlabel('C')
plt.xscale('log')
plt.ylabel('Cross-Validation Error')
plt.legend()
plt.savefig('iris_cv.svg')
To select a value of C, we could quickly test a range of candidates and pick the one with the best ALOOCV. But we can do much better.
ALOOCV isn’t just efficient to compute for a hyperparameter; it’s also possible to efficiently compute the first and second derivatives of ALOOCV with respect to hyperparameters [3]. Thus, we can apply a second-order optimizer to very quickly dial into the exact value of C that optimizes ALOOCV.
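To give a rough sense of what a second-order optimizer does here, the sketch below takes Newton steps on log C using the first and second derivatives of an objective. This is only an illustration, not bbai's internal algorithm, and the helper aloocv_value_grad_hess is hypothetical; bbai computes the required derivatives itself [3].

import numpy as np

def newton_optimize_C(aloocv_value_grad_hess, log_C0=0.0, tol=1e-6, max_iter=50):
    # aloocv_value_grad_hess is a hypothetical callable returning
    # (value, first derivative, second derivative) of ALOOCV with
    # respect to log C, evaluated at log_C.
    log_C = log_C0
    for _ in range(max_iter):
        _, grad, hess = aloocv_value_grad_hess(log_C)
        # Take a Newton step when the curvature is positive;
        # otherwise fall back to a plain gradient step.
        step = -grad / hess if hess > 0 else -grad
        log_C += step
        if abs(step) < tol:
            break
    return np.exp(log_C)

In practice we don't have to write any of this ourselves: leaving out C tells bbai to run the optimization for us.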
model = bbai.glm.LogisticRegression(fit_intercept=False)
# Note: when we don't provide a value for C, bbai.glm.LogisticRegression
# will apply an optimizer to find the value of C with the best ALOOCV
model.fit(X, y)
print("C_opt = ", model.C_)
C_opt = 67.38021801069182
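As a quick sanity check (an aside, not part of the main workflow), we can compare ALOOCV at the optimized C against a couple of nearby values using the compute_aloocv helper defined earlier; the optimized value should score at least as well as its neighbors.

# ALOOCV at the optimized C should be at least as good as at nearby values.
for C in [model.C_ / 2, model.C_, 2 * model.C_]:
    print(C, compute_aloocv(C))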
ALOOCV also gives us a leave-one-out error estimate for each individual data point. Let's compare those per-point estimates against the brute-force values:

def compute_loocvs(C):
    cvs = []
    for train_indexes, test_indexes in LeaveOneOut().split(X):
        X_train = X[train_indexes]
        y_train = y[train_indexes]
        X_test = X[test_indexes]
        y_test = y[test_indexes]
        model = bbai.glm.LogisticRegression(C=C, fit_intercept=False)
        model.fit(X_train, y_train)
        pred = model.predict_log_proba(X_test)
        # Record the negative log-likelihood of each held-out point.
        cvs.append(-pred[0][y_test[0]])
    return cvs
n = len(y)
aloocvs = model.aloocvs_
loocvs = compute_loocvs(model.C_)
indexes = list(range(n))
indexes = sorted(indexes, key=lambda i: -aloocvs[i])
aloocvs = [aloocvs[i] for i in indexes]
loocvs = [loocvs[i] for i in indexes]
ix = list(range(n))
plt.plot(ix, aloocvs, marker='x', label='ALOO', linestyle='None')
plt.plot(ix, loocvs, marker='+', label='LOO', linestyle='None')
plt.ylabel('Leave-one-out Error')
plt.xlabel('Data Point')
plt.legend()
plt.savefig('iris_loo.svg')
Looking at this graph, we could choose a cutoff and select the highest-error points to examine further.
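For example, here's one way we might pull out the original indexes of the points with the largest leave-one-out errors; the cutoff of five is an arbitrary choice for illustration.

# `indexes` was sorted by descending ALOOCV above, so its first entries
# are the data points with the largest leave-one-out errors.
cutoff = 5  # arbitrary cutoff
worst_points = indexes[:cutoff]
print(worst_points)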
Complete code from this blog can be found at .